Development Guide

Note

This Development Guide page is still being actively updated. We wish to make adding new black-box optimizers as easy as possible. Because black-box optimizers take a relatively long time to run on high-dimensional problems, at least two core developers of this library will review the source code and manually run the test code whenever a new optimizer is added or an existing optimizer is significantly modified, in order to check its correctness.

Before reading this page, please first read the User Guide for some basic information about PyPop7, this open-source Python library. Since this page is mainly intended for advanced developers, end-users can freely skip it.

Docstring Conventions

For docstring conventions, this library first follows PEP 257. Since the library is built on the NumPy ecosystem, we further adopt the docstring conventions of numpydoc.
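As a minimal sketch of these conventions, the following hypothetical helper function (the name sphere is chosen here only for illustration and is not taken from the library's source code) carries a numpydoc-style docstring: a one-line summary as PEP 257 suggests, followed by Parameters and Returns sections.

```python
import numpy as np


def sphere(x):
    """Compute the sphere function, a common convex test function.

    Parameters
    ----------
    x : array_like
        Point at which the function is evaluated.

    Returns
    -------
    y : float
        Sum of squares of all entries of `x`.
    """
    x = np.asarray(x, dtype=float)  # accept lists and tuples as well
    return float(np.sum(x**2))
```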

Furthermore, we now use @, the dedicated infix operator for matrix multiplication introduced by PEP 465. We are updating all existing Python code to simplify it accordingly.
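The simplification that PEP 465 allows can be illustrated with a small sketch. The arrays below are arbitrary values chosen for illustration; they do not come from the library.

```python
import numpy as np

a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[0.0, 1.0], [1.0, 0.0]])
v = np.array([1.0, -1.0])

old_style = np.dot(np.dot(a, b), v)  # nested function calls
new_style = a @ b @ v  # the same product written with the infix operator

assert np.allclose(old_style, new_style)
```

The infix form reads left to right like the mathematical expression, which is the main reason for the rewrite.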

Library Dependencies

This open-source library depends heavily on three core (open-source) scientific computing libraries, i.e., NumPy, SciPy, and Scikit-Learn. More specifically, for all optimizers the NumPy array (numpy.ndarray) is chosen as the basic data structure to store and operate on the population (e.g., sampling, updating, indexing, and sorting), since vectorized array operations are significantly faster than equivalent pure-Python loops. Where possible, Numba is sometimes used to further reduce the wall-clock time for large-scale black-box optimization. An obvious advantage of using NumPy as the core computing engine is that PyPop7 can be seamlessly integrated into the NumPy ecosystem, given that SciPy so far covers only a limited number of population-based black-box optimizers (BBOs).
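The population operations mentioned above can be sketched with plain NumPy as follows. This is only an illustrative sketch, not code from the library: the population size, the search range, the seed, and the sphere fitness function are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(2022)  # seed chosen arbitrarily
n_individuals, ndim_problem = 6, 3

# sampling: one row per individual
x = rng.uniform(-5.0, 5.0, size=(n_individuals, ndim_problem))
# evaluation: sphere function applied row-wise without a Python loop
y = np.sum(x**2, axis=1)
# sorting and indexing: keep the better half (minimization)
order = np.argsort(y)
parents = x[order[:n_individuals // 2]]
# updating: recompute the mean from the selected parents
mean = np.mean(parents, axis=0)
```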

A Unified API

For PyPop7, we use the popular Object-Oriented Programming (OOP) paradigm to structure all optimizers, which provides consistency, flexibility, and simplicity. We did not adopt the other popular paradigm, procedural programming. However, in future versions we may provide such an interface, but only at the end-user level (rather than the developer level).

To provide a unified API, all optimizers need to inherit the abstract class called Optimizer.

  • All members shared by all optimizers (e.g., fitness_function, ndim_problem, etc.) should be defined in the __init__ method of this class.

  • All methods public to end-users should be defined in this class except special cases.

  • All settings related to fair benchmarking comparisons (e.g., max_function_evaluations, max_runtime, and fitness_threshold) should be defined in the __init__ method of this class.

Initialization of Optimizer Options

For initialization of optimizer options, the following __init__ method of Optimizer should be inherited:

def __init__(self, problem, options):
    # here all members will be inherited by any subclass of `Optimizer`

All members exclusive to a subclass should be defined after calling the above __init__ method of Optimizer.
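A minimal sketch of this pattern is given below. To keep it self-contained, the Optimizer class here is a simplified, hypothetical stand-in rather than the real PyPop7 base class, and both the subclass MyOptimizer and its option sigma are invented for illustration.

```python
import numpy as np


class Optimizer:  # hypothetical stand-in, NOT the real PyPop7 base class
    def __init__(self, problem, options):
        # members shared by all optimizers
        self.fitness_function = problem.get('fitness_function')
        self.ndim_problem = problem.get('ndim_problem')
        self.max_function_evaluations = options.get('max_function_evaluations', np.inf)


class MyOptimizer(Optimizer):  # hypothetical subclass
    def __init__(self, problem, options):
        Optimizer.__init__(self, problem, options)  # shared members first
        self.sigma = options.get('sigma', 1.0)  # exclusive member, defined afterwards


problem = {'fitness_function': lambda x: float(np.sum(np.square(x))),
           'ndim_problem': 2}
opt = MyOptimizer(problem, {'max_function_evaluations': 100, 'sigma': 0.5})
```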

Initialization of Population

We separate the initialization of optimizer options from that of the population (a set of individuals), in order to obtain better flexibility. To achieve this, the following method initialize should be overridden:

def initialize(self):  # for population initialization
    raise NotImplementedError  # need to be implemented in any subclass of `Optimizer`

Another goal of this separation is to minimize the number of class members, which makes optimizers easier for end-users to set up, at the slight cost of more variables for developers to control.
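One way to read this design is sketched below, under stated assumptions: the Optimizer class is again a simplified, hypothetical stand-in, and PopulationSearch with its option n_individuals is invented for illustration. Options are stored once as class members in __init__, whereas the population is created in initialize and returned as local variables rather than stored on the object.

```python
import numpy as np


class Optimizer:  # hypothetical stand-in, NOT the real PyPop7 base class
    def __init__(self, problem, options):
        self.ndim_problem = problem['ndim_problem']
        self.lower_boundary = np.asarray(problem['lower_boundary'], dtype=float)
        self.upper_boundary = np.asarray(problem['upper_boundary'], dtype=float)
        self.rng = np.random.default_rng(options['seed_rng'])

    def initialize(self):
        raise NotImplementedError


class PopulationSearch(Optimizer):  # hypothetical subclass
    def __init__(self, problem, options):
        Optimizer.__init__(self, problem, options)
        self.n_individuals = options.get('n_individuals', 4)  # an option, set once

    def initialize(self):
        # the population is created here and returned, not kept as a class member
        x = self.rng.uniform(self.lower_boundary, self.upper_boundary,
                             size=(self.n_individuals, self.ndim_problem))
        y = np.full(self.n_individuals, np.inf)  # fitness not yet evaluated
        return x, y


problem = {'ndim_problem': 2,
           'lower_boundary': [-5.0, -5.0],
           'upper_boundary': [5.0, 5.0]}
search = PopulationSearch(problem, {'seed_rng': 0})
x, y = search.initialize()
```

The developer has to pass x and y between methods explicitly, which is the extra variable control mentioned above.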

Computation of Each Generation

Update each generation (iteration) by overriding the following method iterate:

def iterate(self):  # for one generation (iteration)
    raise NotImplementedError  # need to be implemented in any subclass of `Optimizer`

Control of Entire Optimization Process

Control the entire search process by overriding the following method optimize:

def optimize(self, fitness_function=None):  # entire optimization process
    return None  # `None` should be replaced in any subclass of `Optimizer`

Typically, common auxiliary tasks (e.g., printing verbose information, restarting) are conducted inside this function.
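How the three methods initialize, iterate, and optimize fit together can be sketched with a simple stochastic hill climber. This is an illustrative sketch only: the Optimizer class is a simplified, hypothetical stand-in for the real base class, and HillClimber, its options sigma and verbose, and the fixed sampling range are assumptions.

```python
import numpy as np


class Optimizer:  # hypothetical stand-in, NOT the real PyPop7 base class
    def __init__(self, problem, options):
        self.fitness_function = problem['fitness_function']
        self.ndim_problem = problem['ndim_problem']
        self.max_function_evaluations = options.get('max_function_evaluations', 100)
        self.rng = np.random.default_rng(options.get('seed_rng', 0))
        self.n_function_evaluations = 0
        self.best_so_far_x, self.best_so_far_y = None, np.inf

    def _evaluate_fitness(self, x):
        y = self.fitness_function(x)
        self.n_function_evaluations += 1
        if y < self.best_so_far_y:  # track the best solution found so far
            self.best_so_far_x, self.best_so_far_y = np.copy(x), y
        return y

    def _check_terminations(self):
        return self.n_function_evaluations >= self.max_function_evaluations


class HillClimber(Optimizer):  # hypothetical subclass
    def __init__(self, problem, options):
        Optimizer.__init__(self, problem, options)
        self.sigma = options.get('sigma', 0.1)  # step size (assumed option name)
        self.verbose = options.get('verbose', 50)

    def initialize(self):
        return self.rng.uniform(-5.0, 5.0, size=self.ndim_problem)

    def iterate(self):
        # perturb the best-so-far point with Gaussian noise
        noise = self.rng.standard_normal(self.ndim_problem)
        return self.best_so_far_x + self.sigma * noise

    def optimize(self):
        n_generations = 0
        self._evaluate_fitness(self.initialize())
        while not self._check_terminations():
            if self.verbose and n_generations % self.verbose == 0:  # auxiliary task
                print('  * Generation {:d}: best_so_far_y {:7.5e}'.format(
                    n_generations, self.best_so_far_y))
            self._evaluate_fitness(self.iterate())
            n_generations += 1
        return {'best_so_far_x': self.best_so_far_x,
                'best_so_far_y': self.best_so_far_y,
                'n_function_evaluations': self.n_function_evaluations}


problem = {'fitness_function': lambda x: float(np.sum(np.square(x))),
           'ndim_problem': 2}
options = {'max_function_evaluations': 200, 'seed_rng': 1}
results = HillClimber(problem, options).optimize()
```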

Using Pure Random Search as an Illustrative Example

In the following Python code, we use Pure Random Search (PRS), perhaps the simplest black-box optimizer, as an illustrative example.

import numpy as np

from pypop7.optimizers.core.optimizer import Optimizer  # base class of all black-box optimizers


class PRS(Optimizer):
    """Pure Random Search (PRS).

    .. note:: `PRS` is one of the *simplest* and *earliest* black-box optimizers, dating back to at least
       `1950s <https://pubsonline.informs.org/doi/abs/10.1287/opre.6.2.244>`_.
       Here we include it mainly for *benchmarking* purpose. As pointed out in `Probabilistic Machine Learning
       <https://probml.github.io/pml-book/book2.html>`_, *this should always be tried as a baseline*.

    Parameters
    ----------
    problem : dict
              problem arguments with the following common settings (`keys`):
                * 'fitness_function' - objective function to be **minimized** (`func`),
                * 'ndim_problem'     - number of dimensions (`int`),
                * 'upper_boundary'   - upper boundary of search range (`array_like`),
                * 'lower_boundary'   - lower boundary of search range (`array_like`).
    options : dict
              optimizer options with the following common settings (`keys`):
                * 'max_function_evaluations' - maximum of function evaluations (`int`, default: `np.inf`),
                * 'max_runtime'              - maximal runtime to be allowed (`float`, default: `np.inf`),
                * 'seed_rng'                 - seed for random number generation needed to be *explicitly* set (`int`);
              and with the following particular setting (`key`):
                * 'x' - initial (starting) point (`array_like`).

    Attributes
    ----------
    x     : `array_like`
            initial (starting) point.

    Examples
    --------
    Use the `PRS` optimizer to minimize the well-known test function
    `Rosenbrock <http://en.wikipedia.org/wiki/Rosenbrock_function>`_:

    .. code-block:: python
       :linenos:

       >>> import numpy
       >>> from pypop7.benchmarks.base_functions import rosenbrock  # function to be minimized
       >>> from pypop7.optimizers.rs.prs import PRS
       >>> problem = {'fitness_function': rosenbrock,  # define problem arguments
       ...            'ndim_problem': 2,
       ...            'lower_boundary': -5.0*numpy.ones((2,)),
       ...            'upper_boundary': 5.0*numpy.ones((2,))}
       >>> options = {'max_function_evaluations': 5000,  # set optimizer options
       ...            'seed_rng': 2022}
       >>> prs = PRS(problem, options)  # initialize the optimizer class
       >>> results = prs.optimize()  # run the optimization process
       >>> print(results)

    To check the correctness of its implementation, refer to `this code-based repeatability report
    <https://tinyurl.com/mrx2kffy>`_ for more details.

    References
    ----------
    Bergstra, J. and Bengio, Y., 2012.
    Random search for hyper-parameter optimization.
    Journal of Machine Learning Research, 13(2).
    https://www.jmlr.org/papers/v13/bergstra12a.html

    Schmidhuber, J., Hochreiter, S. and Bengio, Y., 2001.
    Evaluating benchmark problems by random guessing.
    A Field Guide to Dynamical Recurrent Networks, pp.231-235.
    https://ml.jku.at/publications/older/ch9.pdf

    Brooks, S.H., 1958.
    A discussion of random methods for seeking maxima.
    Operations Research, 6(2), pp.244-251.
    https://pubsonline.informs.org/doi/abs/10.1287/opre.6.2.244
    """
    def __init__(self, problem, options):
        """Initialize the class with two inputs (problem arguments and optimizer options)."""
        Optimizer.__init__(self, problem, options)
        self.x = options.get('x')  # initial (starting) point
        self.verbose = options.get('verbose', 1000)
        self._n_generations = 0  # number of generations

    def _sample(self, rng):
        x = rng.uniform(self.initial_lower_boundary, self.initial_upper_boundary)
        return x

    def initialize(self):
        """Only for the initialization stage."""
        if self.x is None:
            x = self._sample(self.rng_initialization)
        else:
            x = np.copy(self.x)
        assert len(x) == self.ndim_problem
        return x

    def iterate(self):
        """Only for the iteration stage."""
        return self._sample(self.rng_optimization)

    def _print_verbose_info(self, fitness, y):
        """Save fitness and control console verbose information."""
        if self.saving_fitness:
            if not np.isscalar(y):
                fitness.extend(y)
            else:
                fitness.append(y)
        if self.verbose and ((not self._n_generations % self.verbose) or (self.termination_signal > 0)):
            info = '  * Generation {:d}: best_so_far_y {:7.5e}, min(y) {:7.5e} & Evaluations {:d}'
            print(info.format(self._n_generations, self.best_so_far_y, np.min(y), self.n_function_evaluations))

    def _collect(self, fitness, y=None):
        """Collect necessary output information."""
        if y is not None:
            self._print_verbose_info(fitness, y)
        results = Optimizer._collect(self, fitness)
        results['_n_generations'] = self._n_generations
        return results

    def optimize(self, fitness_function=None, args=None):  # for all iterations (generations)
        """For the entire optimization/evolution stage: initialization + iteration."""
        fitness = Optimizer.optimize(self, fitness_function)
        x = self.initialize()  # population initialization
        y = self._evaluate_fitness(x, args)  # to evaluate fitness of starting point
        while not self._check_terminations():
            self._print_verbose_info(fitness, y)  # to save fitness and control console verbose information
            x = self.iterate()
            y = self._evaluate_fitness(x, args)  # to evaluate each new point
            self._n_generations += 1
        results = self._collect(fitness, y)  # to collect all necessary output information
        return results

Note that since Oct. 22, 2023, we have adopted an active development/maintenance mode: once new optimizers are added or serious bugs are fixed, we release a new version immediately.

Repeatability Code/Reports

==========  =====================  ========================
Optimizer   Repeatability Code     Generated Figure(s)/Data
==========  =====================  ========================
MMES        _repeat_mmes.py        figures
FCMAES      _repear_fcmaes.py      figures
LMMAES      _repeat_lmmaes.py      figures
LMCMA       _repeat_lmcma.py       figures
LMCMAES     _repeat_lmcmaes.py     data
RMES        _repeat_rmes.py        figures
R1ES        _repeat_r1es.py        figures
VKDCMA      _repeat_vkdcma.py      data
VDCMA       _repeat_vdcma.py       data
CCMAES2016  _repeat_ccmaes2016.py  figures
OPOA2015    _repeat_opoa2015.py    figures
OPOA2010    _repeat_opoa2010.py    figures
CCMAES2009  _repeat_ccmaes2009.py  figures
OPOC2009    _repeat_opoc2009.py    figures
OPOC2006    _repeat_opoc2006.py    figures
SEPCMAES    _repeat_sepcmaes.py    data
DDCMA       _repeat_ddcma.py       data
MAES        _repeat_maes.py        figures
FMAES       _repeat_fmaes.py       figures
CMAES       _repeat_cmaes.py       data
SAMAES      _repeat_samaes.py      figures
SAES        _repeat_saes.py        data
CSAES       _repeat_csaes.py       figures
DSAES       _repeat_dsaes.py       figures
SSAES       _repeat_ssaes.py       figures
RES         _repeat_res.py         figures
R1NES       _repeat_r1nes.py       data
SNES        _repeat_snes.py        data
XNES        _repeat_xnes.py        data
ENES        _repeat_enes.py        data
ONES        _repeat_ones.py        data
SGES        _repeat_sges.py        data
RPEDA       _repeat_rpeda.py       data
UMDA        _repeat_umda.py        data
AEMNA       _repeat_aemna.py       data
EMNA        _repeat_emna.py        data
DCEM        _repeat_dcem.py        data
DSCEM       _repeat_dscem.py       data
MRAS        _repeat_mras.py        data
SCEM        _repeat_scem.py        data
SHADE       _repeat_shade.py       data
JADE        _repeat_jade.py        data
CODE        _repeat_code.py        data
TDE         _repeat_tde.py         figures
CDE         _repeat_cde.py         data
CCPSO2      _repeat_ccpso2.py      data
IPSO        _repeat_ipso.py        data
CLPSO       _repeat_clpso.py       data
CPSO        _repeat_cpso.py        data
SPSOL       _repeat_spsol.py       data
SPSO        _repeat_spso.py        data
HCC         N/A                    N/A
COCMA       N/A                    N/A
COEA        _repeat_coea.py        figures
COSYNE      _repeat_cosyne.py      data
ESA         _repeat_esa.py         data
CSA         _repeat_csa.py         data
NSA         N/A                    N/A
ASGA        _repeat_asga.py        data
GL25        _repeat_gl25.py        data
G3PCX       _repeat_g3pcx.py       figures
GENITOR     N/A                    N/A
LEP         _repeat_lep.py         data
FEP         _repeat_fep.py         data
CEP         _repeat_cep.py         data
POWELL      _repeat_powell.py      data
GPS         N/A                    N/A
NM          _repeat_nm.py          data
HJ          _repeat_hj.py          data
CS          N/A                    N/A
BES         _repeat_bes.py         figures
GS          _repeat_gs.py          figures
SRS         N/A                    N/A
ARHC        _repeat_arhc.py        data
RHC         _repeat_rhc.py         data
PRS         _repeat_prs.py         figures
==========  =====================  ========================

Python IDE for Development

Although other Python IDEs (e.g., Spyder, Visual Studio) can also be used for development, we currently mainly use PyCharm Community Edition and Anaconda to develop our open-source library. We are very grateful to JetBrains and Anaconda for providing these two free development tools. Note that we do NOT exclude any other choices for development.