Improving Performance with active_tolΒΆ

Note

This is an experimental tweak to performance. Using it requires a high level of understanding of optimization and the structure of your problem.

Some optimizers use an active set method, whereby their constraints are marked as active or inactive depending on their proximity to the feasible region. If a constraint is far enough from the feasible region, it is essentially redundant to the set, so an optimizer can ignore it and mark it as inactive. When this occurs, it no longer needs functional or gradient evaluations for that constraint.

Since gradient calculation can be a major source of computation, some performance can be gained if we can omit calculating the derivatives for inactive constraints. The ideal way to do this would be to gain access to the optimizer internals and promote that information to OpenMDAO. This was not conveniently available, so we have instead provided an argument to add_constraint that lets you specify how far from your constraint boundary you need to go before you consider it to be inactive. This is most easily used on geometric problems where you can clearly visualize when a constraint is completely occluded by other constraints.

The following restrictions apply to using the active tolerance.

  • Optimizer must support active set methods (only SNOPT in pyoptsparse at present)
  • Only works for adjoint mode, so mode in root linear solver must be set to “rev”
  • Relevance reduction must be enabled (“single_voi_relevance_reduction” set to True in root linear solver)

Let’s consider a problem where we have 7 discs with a 1 cm diameter, and we would like to arrange them on a line as closely together as possible without overlapping. We can do this by minimizing the sum of the distances between each disc and its 6 neighbors. Now, we don’t want any of our discs to overlap, so we need to constrain each of them so that the distance to every other disc is greater than 1 diameter.

The code for this is below. We used an ExecComp because the equation for distance is simple to write.

To make a point about derivative calculation, we use OpenMDAO’s built-in profiling. This time, when we set up the profiler, we tell it to only count every time apply_linear (which is the workhorse derivatives function) is called on Component. The output will be placed in a file that we can process later.

from __future__ import print_function
from six.moves import range

import numpy as np

from openmdao.api import Problem, Group, pyOptSparseDriver, ExecComp, IndepVarComp

if __name__ == '__main__':

    # So we compare the same starting locations.
    np.random.seed(123)

    diam = 1.0
    pin = 15.0
    n_disc = 7

    prob = Problem()
    prob.root = root = Group()

    driver = prob.driver = pyOptSparseDriver()
    driver.options['optimizer'] = 'SNOPT'
    driver.options['print_results'] = False

    # Note, active tolerance requires relevance reduction to work.
    root.ln_solver.options['single_voi_relevance_reduction'] = True

    # Also, need to be in adjoint
    root.ln_solver.options['mode'] = 'rev'

    obj_expr = 'obj = '
    sep = ''
    for i in range(n_disc):

        dist = "dist_%d" % i
        x1var = 'x_%d' % i

        # First disc is pinned
        if i == 0:
            root.add('p_%d' % i, IndepVarComp(x1var, pin), promotes=(x1var, ))

        # The rest are design variables for the optimizer.
        else:
            init_val = 5.0*np.random.random() - 5.0 + pin
            root.add('p_%d' % i, IndepVarComp(x1var, init_val), promotes=(x1var, ))
            driver.add_desvar(x1var)

        for j in range(i):

            x2var = 'x_%d' % j
            yvar = 'y_%d_%d' % (i, j)
            name = dist + "_%d" % j
            expr = '%s= (%s - %s)**2' % (yvar, x1var, x2var)
            root.add(name, ExecComp(expr), promotes = (x1var, x2var, yvar))

            # Constraint (you can experiment with turning on/off the active_tol)
            #driver.add_constraint(yvar, lower=diam)
            driver.add_constraint(yvar, lower=diam, active_tol=diam*2.0)

            # This pair's contribution to objective
            obj_expr += sep + yvar
            sep = ' + '

    root.add('sum_dist', ExecComp(obj_expr), promotes=('*', ))
    driver.add_objective('obj')

    prob.setup()

    print("Initial Locations")
    for i in range(n_disc):
        xvar = 'x_%d' % i
        print(prob[xvar])


    # Run with profiling turned on so that we can count the total derivative
    # component calls.
    from openmdao.api import profile, Component
    profile.setup(prob, methods={'apply_linear' : (Component, )})
    profile.start()
    prob.run()
    profile.stop()

    print("\nFinal Locations")
    for i in range(n_disc):
        xvar = 'x_%d' % i
        print(prob[xvar])

    total_apply = 0
    for syst in root.subsystems(recurse=True):
        if 'dist_' in syst.name:
            total_apply += syst.total_calls
    print("\ntotal apply_linear calls:", total_apply)

Note that we defined the variable “n_disc” for the number of discs, so component and variable names such as “dist_1_2” and “y_2” had to be created with some string operations.

Initial Locations
15.0
13.482345928
11.4306966748
11.1342572678
12.7565738454
13.5973448489
12.1155323006

Final Locations
15.0
12.9999999413
9.99999993369
8.99999991376
11.9999999405
13.9999999687
10.9999999358

Note that this lines our discs up neatly so that they are touching each other with their centers ranging from 9 to 15. Note that we chose a distance of 2.0 times the disc diameter as our “active_tol”. When we do this, and have 3 discs in a row, then the distance constraint between disc1 and disc3 is inactive, so its gradient is not calculated.

We can look at the profiling output by issuing the following command at your operating system command line:

proftotals prof_raw.0

We want the grand totals, which are printed last:

Grand Totals
-------------
Function Name, Total Time, Calls
apply_linear, 0.00373601913452, 183

So, did our active tolerance really do anything? If we turn it off and compare the number of apply_linear calls by running and postprocessing a second time:

Initial Locations
15.0
13.482345928
11.4306966748
11.1342572678
12.7565738454
13.5973448489
12.1155323006

Final Locations
15.0
12.9999998135
9.99999980586
8.99999978593
11.9999998126
13.9999999687
10.999999808

The optimium is essentially the same. The time:

Grand Totals
-------------
Function Name, Total Time, Calls
apply_linear, 0.00686693191528, 344

So almost half of the apply_linear calls turn out to be unneeded.

This would normally be a pretty bad case to run in adjoint mode because the number of constraints varies with n_disc by (n_disc**2)/2 - n_disc, while the number of design variables only varies by n_disc. However, a good choice for “active_tol” cuts out a significant number of the extra gradient calculations.