DOE Drivers - Running a Design of Experiments (DOE)

Drivers that run a design of experiments (DOE) all inherit from the PredeterminedRunsDriver class. Some examples are FullFactorialDriver, UniformDriver, and the subject of this tutorial, OptimizedLatinHypercubeDriver.

The Latin Hypercube Driver

The Latin Hypercube (LHC) method of randomization generates samples that span the entire range of each design variable. The Optimized Latin Hypercube (OLHC) improves on the LHC method by adjusting the samples that the LHC generates so that they are spread more widely across the design space.
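To make the idea concrete, here is a minimal pure-Python sketch of plain (non-optimized) Latin hypercube sampling. It is not the code used by OptimizedLatinHypercubeDriver; the function name latin_hypercube and its arguments are hypothetical and chosen only for illustration. Each variable's range is split into num_samples equal intervals, one value is drawn inside each interval, and the values are shuffled so that the pairing between variables is random.

```python
import random


def latin_hypercube(num_samples, bounds, seed=None):
    """Return num_samples points; each variable gets one value per interval.

    bounds is a list of (lower, upper) pairs, one per design variable.
    Hypothetical helper for illustration only, not part of OpenMDAO.
    """
    rng = random.Random(seed)
    columns = []
    for lower, upper in bounds:
        width = (upper - lower) / num_samples
        # Draw one random value inside each of the num_samples equal intervals.
        values = [lower + (i + rng.random()) * width for i in range(num_samples)]
        # Shuffle so the pairing of intervals across variables is random.
        rng.shuffle(values)
        columns.append(values)
    # Transpose the per-variable columns into a list of sample points.
    return [tuple(point) for point in zip(*columns)]


samples = latin_hypercube(4, [(-50.0, 50.0), (-50.0, 50.0)], seed=0)
for x, y in samples:
    print(x, y)
```

An optimized variant would go one step further and rearrange these samples to spread them out; that step is omitted here.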

Below is an example using the component from the Paraboloid Tutorial - Simple Optimization Problem.

from openmdao.api import IndepVarComp, Group, Problem, DumpRecorder
from openmdao.test.paraboloid import Paraboloid

from openmdao.drivers.latinhypercube_driver import OptimizedLatinHypercubeDriver

top = Problem()
root = top.root = Group()

root.add('p1', IndepVarComp('x', 50.0), promotes=['x'])
root.add('p2', IndepVarComp('y', 50.0), promotes=['y'])
root.add('comp', Paraboloid(), promotes=['x', 'y', 'f_xy'])

top.driver = OptimizedLatinHypercubeDriver(num_samples=4, seed=0, population=20, generations=4, norm_method=2)
top.driver.add_desvar('x', lower=-50.0, upper=50.0)
top.driver.add_desvar('y', lower=-50.0, upper=50.0)

top.driver.add_objective('f_xy')

recorder = DumpRecorder('paraboloid')
recorder.options['record_params'] = True
recorder.options['record_unknowns'] = False
recorder.options['record_resids'] = False
top.driver.add_recorder(recorder)

top.setup()
top.run()

top.cleanup()

Here we will explain the code statements that pertain to using an Optimized Latin Hypercube (OLHC).

from openmdao.drivers.latinhypercube_driver import OptimizedLatinHypercubeDriver

To set up a model that uses the OLHC, we need to import OptimizedLatinHypercubeDriver.

root.add('p1', IndepVarComp('x', 50.0), promotes=['x'])
root.add('p2', IndepVarComp('y', 50.0), promotes=['y'])
root.add('comp', Paraboloid(), promotes=['x', 'y', 'f_xy'])

Because the variables x and y are promoted to the group level, no connect statement is needed for the paraboloid component to receive them as inputs. Similarly, when we define the OLHC design variables, we can refer to x and y by their promoted names without any additional code.

The next three lines initialize the OLHC driver and add design variables to it.

top.driver = OptimizedLatinHypercubeDriver(num_samples=4, seed=0, population=20, generations=4, norm_method=2)
top.driver.add_desvar('x', lower=-50.0, upper=50.0)
top.driver.add_desvar('y', lower=-50.0, upper=50.0)

The first line initializes the driver. The ‘num_samples’ argument determines how many samples each design variable will cycle through. By default, the ‘seed’ is itself a random value, so each run produces a different sample set. By choosing a fixed seed value, we can easily reproduce a sample set for a repeatable testing environment.
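The effect of a fixed seed can be sketched with Python's standard random module. This is only an illustration of the general principle, not the driver's own random number handling; the helper draw is hypothetical.

```python
import random


def draw(seed=None, count=4):
    """Draw count values in [-50, 50] from a generator seeded with seed."""
    rng = random.Random(seed)  # seed=None falls back to a system-dependent seed
    return [rng.uniform(-50.0, 50.0) for _ in range(count)]


first = draw(seed=0)
second = draw(seed=0)
print(first == second)  # True: the same seed reproduces the same values
```

Calling draw() without a seed will, in practice, give different values on each call, which mirrors the default behavior described above.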

The next two lines define the ranges of the design variables on which the OLHC will operate.

Now we are ready to record data. With num_samples=4, the range [-50, 50) of each design variable is divided into four equal intervals: [-50,-25), [-25,0), [0,25), [25,50). The LHC will ensure that there is an x value and a y value in each interval.

The recorded output is shown below. It’s been filtered to show only the generated input values.

Timestamp: 1446745425.935
Iteration Coordinate: Driver/0
Params:
  comp.x: -34.5655676503
  comp.y: -37.7665306197
Timestamp: 1446745425.936
Iteration Coordinate: Driver/0
Params:
  comp.x: 47.7779475273
  comp.y: 14.7245403548
Timestamp: 1446745425.936
Iteration Coordinate: Driver/0
Params:
  comp.x: -7.77803623101
  comp.y: 34.1373674836
Timestamp: 1446745425.936
Iteration Coordinate: Driver/0
Params:
  comp.x: 1.49637359158
  comp.y: -14.9422886231

As you can see, there is an ‘x’ and ‘y’ value in each interval. In 3 of the 4 samples, the OLHC also placed x and y in different intervals from each other, ensuring better coverage of the parameter space.
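This can be checked directly from the recorded values. The short script below is a sketch: the helper interval_index is hypothetical, and the numbers are copied from the recorder output above. It maps each value to the index of its interval and counts how many samples have x and y in different intervals.

```python
# Recorded (x, y) pairs copied from the recorder output above.
recorded = [
    (-34.5655676503, -37.7665306197),
    (47.7779475273, 14.7245403548),
    (-7.77803623101, 34.1373674836),
    (1.49637359158, -14.9422886231),
]

LOWER, UPPER, NUM_SAMPLES = -50.0, 50.0, 4
WIDTH = (UPPER - LOWER) / NUM_SAMPLES


def interval_index(value):
    """Return 0 for [-50,-25), 1 for [-25,0), 2 for [0,25), 3 for [25,50)."""
    return int((value - LOWER) // WIDTH)


x_bins = [interval_index(x) for x, _ in recorded]
y_bins = [interval_index(y) for _, y in recorded]
different = sum(1 for xb, yb in zip(x_bins, y_bins) if xb != yb)

print(x_bins)     # [0, 3, 1, 2]
print(y_bins)     # [0, 2, 3, 1]
print(different)  # 3
```

Each list is a permutation of 0 through 3, so every interval holds exactly one value per variable, and only the first sample has x and y in the same interval.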

Running a DOE Driver in Parallel

All drivers inheriting from PredeterminedRunsDriver take an initialization argument named num_par_doe. This is used to specify the desired number of cases to be performed concurrently. The default value is 1, but you can set it to a higher value and run your model in parallel.
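As a rough mental model, running cases concurrently means that the list of DOE cases is divided among num_par_doe workers. The sketch below shows one plausible way to do that, a round-robin split; it is an assumption made for illustration and not necessarily how the driver assigns cases internally. The function split_cases is hypothetical.

```python
def split_cases(cases, num_par_doe):
    """Assign cases to workers round-robin: worker k gets cases k, k+n, k+2n, ..."""
    return [cases[worker::num_par_doe] for worker in range(num_par_doe)]


cases = ['case0', 'case1', 'case2', 'case3', 'case4']
for worker, assigned in enumerate(split_cases(cases, 2)):
    print(worker, assigned)
# 0 ['case0', 'case2', 'case4']
# 1 ['case1', 'case3']
```

With num_par_doe=1, the single worker receives every case, which matches the serial default.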

If you have mpi4py and petsc4py installed, you can run your model using mpirun and your DOE cases will be run in parallel using MPI. To learn how to properly install all of the dependencies needed to run in parallel, see MPI on Linux or MPI on Windows.

If you don’t have mpi4py or petsc4py, your cases will be run concurrently using the multiprocessing library.

When running parallel DOEs, it’s important to be aware of which variables you are saving in your recorders. Parallel DOE cases run in separate processes and the recorder variables have to be transferred back to the master process. By default, recorders record every parameter and unknown, so if you don’t actually need to know every variable value, you can specify which variables you want as follows:

recorder.options['includes'] = ['x', 'y', 'f_xy']
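
The benefit of restricting the recorded variables can be sketched in plain Python. The example below assumes glob-style pattern matching on variable names and uses made-up variable names and values; it illustrates the filtering idea only and is not the recorder's implementation.

```python
from fnmatch import fnmatchcase


def filter_variables(values, includes):
    """Keep only the entries whose name matches at least one include pattern."""
    return {
        name: value
        for name, value in values.items()
        if any(fnmatchcase(name, pattern) for pattern in includes)
    }


# Hypothetical values produced by one DOE case.
case_values = {'x': 1.0, 'y': 2.0, 'f_xy': 39.0, 'other_var': 7.5}

kept = filter_variables(case_values, ['x', 'y', 'f_xy'])
print(sorted(kept))  # ['f_xy', 'x', 'y']
```

Only the kept entries would need to be sent from a worker process back to the master process, so a shorter include list means less data to transfer per case.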

Also, when doing parallel DOEs with multiprocessing, you should avoid using recorders anywhere other than in the top level driver. Recorders in solvers, even at the top level, will not function properly. The reason for this is that when running under multiprocessing, there is only one transfer of data from a worker process back to the master process, and that happens only at the top level after the call to root.solve_nonlinear() completes.

Finally, when using multiprocessing on a Windows machine, your entire model must be picklable, because multiprocessing on Windows uses pickle to create a copy of your model in each new process. On Linux and OS X, pickling isn’t necessary because fork() is used to duplicate the parent process.
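
A quick way to find out whether an object can be pickled is simply to try it. The sketch below uses only the standard pickle module; the helper is_picklable and the two stand-in "models" (plain dictionaries) are hypothetical and serve only to show a typical failure, here caused by a lambda function.

```python
import pickle


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError, TypeError):
        return False
    return True


# Stand-ins for model data: plain values pickle fine, a lambda does not.
plain_model = {'scale': 2.0, 'names': ['x', 'y']}
lambda_model = {'func': lambda x: 2.0 * x}

print(is_picklable(plain_model))   # True
print(is_picklable(lambda_model))  # False
```

Running a check like this on your own model objects before launching a parallel DOE on Windows can help locate the attribute that prevents pickling.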