Development Guide

Note

This Development Guide page is still being actively updated. We wish to make adding new black-box optimizers as easy as possible. Given the relatively long runtime of black-box optimizers on high-dimensional problems, at least two core developers of this library will review the source code and manually run the test code whenever a new black-box optimizer is added, in order to verify the correctness of its implementation.

Before reading this page, please first read the User Guide for some basic information about this open-source Python library, PyPop7. Since this topic is mainly for advanced developers, end-users can freely skip this page.

Docstring Conventions

For docstring conventions, this library first follows PEP 257. Since this library is built on the NumPy ecosystem, we further use the docstring conventions from numpydoc.

Furthermore, we now use the dedicated infix operator for matrix multiplication (@) introduced by PEP 465. We are modifying all existing Python code to simplify it accordingly.
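As a minimal sketch of what this simplification looks like, the snippet below compares nested np.dot calls with the @ operator. The arrays and variable names are illustrative only and are not taken from the PyPop7 code base.

```python
import numpy as np

# Illustrative values only; these names are hypothetical, not from PyPop7.
a = np.array([[1.0, 2.0], [3.0, 4.0]])
b = np.array([[0.0, 1.0], [1.0, 0.0]])
v = np.array([1.0, -1.0])

old_style = np.dot(np.dot(a, b), v)  # nested function calls (pre-PEP 465 style)
new_style = a @ b @ v  # infix operator introduced by PEP 465

# Both forms compute the same matrix-matrix-vector product.
assert np.allclose(old_style, new_style)
```

The infix form reads left to right like the underlying formula, which is the main readability gain.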

Library Dependencies

This open-source Python library depends heavily on three core scientific-computing Python libraries: NumPy, SciPy, and Scikit-Learn. More specifically, all optimizers use the NumPy array (numpy.ndarray) as the basic data structure to store and operate on the population (e.g., sampling, updating, indexing, and sorting), which leads to a significant speedup. Where possible, Numba is sometimes used to further reduce the wall-clock time for large-scale black-box optimization. An obvious advantage of using NumPy as the core computing engine is that PyPop7 can be seamlessly integrated into the NumPy ecosystem, given that SciPy has so far covered only a limited number of population-based black-box optimizers (BBOs).
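The population operations mentioned above (sampling, sorting, indexing, and updating) can be sketched with plain NumPy as follows. This is only an illustration under assumed names: the population size, the sphere fitness function, and the elite-based update rule are hypothetical and do not correspond to any particular PyPop7 optimizer.

```python
import numpy as np

rng = np.random.default_rng(2022)  # seeded generator for repeatability
n_individuals, ndim_problem = 6, 3  # hypothetical sizes
lower = -5.0 * np.ones(ndim_problem)
upper = 5.0 * np.ones(ndim_problem)


def sphere(x):  # simple stand-in fitness function (to be minimized)
    return float(np.sum(np.square(x)))


# sampling: one row per individual, drawn uniformly within the search range
population = rng.uniform(lower, upper, size=(n_individuals, ndim_problem))
# evaluation: one fitness value per individual
fitness = np.array([sphere(x) for x in population])
# sorting: indices from best (smallest) to worst (largest) fitness
order = np.argsort(fitness)
# indexing: select the better half of the population
elites = population[order[:n_individuals // 2]]
# updating: replace the worst individual with the mean of the elites
population[order[-1]] = np.mean(elites, axis=0)
```

Keeping the whole population in one two-dimensional array lets each of these steps run as a single vectorized call instead of a Python loop over individuals.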

For the PyPI installation of this Python library, setup.cfg is used.
For the development of this Python library, requirements.txt is used.

A Unified API

For PyPop7, we use the popular Object-Oriented Programming (OOP) paradigm to structure all optimizers, which provides consistency, flexibility, and simplicity. We did not adopt the other popular paradigm, procedure-oriented programming. However, in future versions we may provide such an interface, but only at the end-user level (rather than the developer level).

For all optimizers, the abstract class Optimizer must be inherited in order to provide a unified API.

All members shared by all optimizers (e.g., fitness_function and ndim_problem) should be defined in the __init__ method of this class.
All methods public to end-users should be defined in this class, except in special cases.
All settings related to fair benchmarking comparisons (e.g., max_function_evaluations, max_runtime, and fitness_threshold) should be defined in the __init__ method of this class.
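To illustrate why the benchmarking settings belong in one shared place, the sketch below shows how a single termination check could consume them. The helper should_terminate and its default values are assumptions made for this example and are not the library's actual implementation.

```python
import time


def should_terminate(n_function_evaluations, start_time, best_so_far_y, options):
    """Return True once any budget in `options` is exhausted (hypothetical helper)."""
    # The defaults below are assumptions for this sketch.
    max_evaluations = options.get('max_function_evaluations', float('inf'))
    max_runtime = options.get('max_runtime', float('inf'))
    fitness_threshold = options.get('fitness_threshold', -float('inf'))
    if n_function_evaluations >= max_evaluations:
        return True  # evaluation budget used up
    if time.time() - start_time >= max_runtime:
        return True  # wall-clock budget used up
    return best_so_far_y <= fitness_threshold  # target fitness reached


options = {'max_function_evaluations': 100, 'fitness_threshold': 1e-10}
start = time.time()
print(should_terminate(50, start, 1.0, options))  # budgets not yet reached
print(should_terminate(100, start, 1.0, options))  # evaluation budget reached
```

Because every optimizer would read the same three settings in the same way, comparisons between optimizers stop under identical conditions.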

Initialization of Optimizer Options

For initialization of optimizer options, the following function __init__ of Optimizer should be inherited:

def __init__(self, problem, options):
    # here all members will be inherited by any subclass of `Optimizer`

All members exclusive to a subclass are defined after calling the above __init__ function of Optimizer.
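A minimal sketch of this pattern follows. The Optimizer class here is a simplified stand-in written only for this example (it is not the real PyPop7 base class), and MyOptimizer with its step_size option is hypothetical.

```python
class Optimizer:  # simplified stand-in, NOT the real PyPop7 base class
    def __init__(self, problem, options):
        # members shared by all optimizers
        self.fitness_function = problem.get('fitness_function')
        self.ndim_problem = problem.get('ndim_problem')
        self.max_function_evaluations = options.get('max_function_evaluations', float('inf'))


class MyOptimizer(Optimizer):  # hypothetical subclass
    def __init__(self, problem, options):
        Optimizer.__init__(self, problem, options)  # inherit all shared members first
        self.step_size = options.get('step_size', 1.0)  # hypothetical exclusive member


opt = MyOptimizer({'fitness_function': sum, 'ndim_problem': 2}, {'step_size': 0.5})
print(opt.ndim_problem, opt.step_size)
```

Calling the base-class __init__ first means the subclass can rely on every shared member already being set when it defines its own.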

Initialization of Population

We separate the initialization of optimizer options from that of the population (a set of individuals) in order to obtain better flexibility. To achieve this, the following function initialize should be overridden:

def initialize(self):  # for population initialization
    raise NotImplementedError  # need to be implemented in any subclass of `Optimizer`

Another goal of this separation is to minimize the number of class members, which makes the optimizer easier for end-users to set up, at the slight cost of more variables for developers to control.

Computation of Each Generation

Each generation (iteration) is updated by overriding the following function iterate:

def iterate(self):  # for one generation (iteration)
    raise NotImplementedError  # need to be implemented in any subclass of `Optimizer`

Control of Entire Optimization Process

The entire search process is controlled by overriding the following function optimize:

def optimize(self, fitness_function=None):  # entire optimization process
    return None  # `None` should be replaced in any subclass of `Optimizer`

Typically, common auxiliary tasks (e.g., printing verbose information, restarting) are conducted inside this function.
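The three functions above (initialize, iterate, and optimize) fit together roughly as in the following self-contained sketch. The Optimizer class is again a simplified stand-in rather than the real PyPop7 base class, and RandomSearch, _evaluate, and the keys of the returned dictionary are assumptions made for this illustration.

```python
import numpy as np


class Optimizer:  # simplified stand-in, NOT the real PyPop7 base class
    def __init__(self, problem, options):
        self.fitness_function = problem['fitness_function']
        self.ndim_problem = problem['ndim_problem']
        self.lower_boundary = problem['lower_boundary']
        self.upper_boundary = problem['upper_boundary']
        self.max_function_evaluations = options.get('max_function_evaluations', 1000)
        self.rng = np.random.default_rng(options.get('seed_rng'))
        self.n_function_evaluations = 0
        self.best_so_far_x, self.best_so_far_y = None, np.inf

    def initialize(self):  # for population initialization
        raise NotImplementedError

    def iterate(self):  # for one generation (iteration)
        raise NotImplementedError

    def optimize(self, fitness_function=None):  # entire optimization process
        return None


class RandomSearch(Optimizer):  # hypothetical subclass
    def initialize(self):
        return self.rng.uniform(self.lower_boundary, self.upper_boundary)

    def iterate(self):
        return self.rng.uniform(self.lower_boundary, self.upper_boundary)

    def _evaluate(self, x):  # hypothetical helper: count evaluations, track the best point
        y = self.fitness_function(x)
        self.n_function_evaluations += 1
        if y < self.best_so_far_y:
            self.best_so_far_x, self.best_so_far_y = np.copy(x), y
        return y

    def optimize(self, fitness_function=None):
        if fitness_function is not None:
            self.fitness_function = fitness_function
        x = self.initialize()  # initialization stage
        self._evaluate(x)
        while self.n_function_evaluations < self.max_function_evaluations:
            x = self.iterate()  # iteration stage
            self._evaluate(x)
        return {'best_so_far_x': self.best_so_far_x,
                'best_so_far_y': self.best_so_far_y,
                'n_function_evaluations': self.n_function_evaluations}


problem = {'fitness_function': lambda x: float(np.sum(x ** 2)),
           'ndim_problem': 2,
           'lower_boundary': -5.0 * np.ones(2),
           'upper_boundary': 5.0 * np.ones(2)}
results = RandomSearch(problem, {'max_function_evaluations': 200, 'seed_rng': 0}).optimize()
print(results)
```

Only optimize knows about budgets and bookkeeping, so initialize and iterate stay short and contain nothing but the sampling logic specific to the optimizer.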

Using Pure Random Search as an Illustrative Example

In the following Python code, we use Pure Random Search (PRS), perhaps the simplest black-box optimizer, as an illustrative example.

import numpy as np

from pypop7.optimizers.core.optimizer import Optimizer  # base class of all black-box optimizers


class PRS(Optimizer):
    """Pure Random Search (PRS).

    .. note:: `PRS` is one of the *simplest* and *earliest* black-box optimizers, dating back to at least
       `1950s <https://pubsonline.informs.org/doi/abs/10.1287/opre.6.2.244>`_.
       Here we include it mainly for *benchmarking* purpose. As pointed out in `Probabilistic Machine Learning
       <https://probml.github.io/pml-book/book2.html>`_, *this should always be tried as a baseline*.

    Parameters
    ----------
    problem : dict
              problem arguments with the following common settings (`keys`):
                * 'fitness_function' - objective function to be **minimized** (`func`),
                * 'ndim_problem'     - number of dimensions (`int`),
                * 'upper_boundary'   - upper boundary of search range (`array_like`),
                * 'lower_boundary'   - lower boundary of search range (`array_like`).
    options : dict
              optimizer options with the following common settings (`keys`):
                * 'max_function_evaluations' - maximum number of function evaluations (`int`, default: `np.inf`),
                * 'max_runtime'              - maximal runtime to be allowed (`float`, default: `np.inf`),
                * 'seed_rng'                 - seed for random number generation, which needs to be *explicitly* set (`int`);
              and with the following particular setting (`key`):
                * 'x' - initial (starting) point (`array_like`).

    Attributes
    ----------
    x     : `array_like`
            initial (starting) point.

    Examples
    --------
    Use the `PRS` optimizer to minimize the well-known test function
    `Rosenbrock <http://en.wikipedia.org/wiki/Rosenbrock_function>`_:

    .. code-block:: python
       :linenos:

       >>> import numpy
       >>> from pypop7.benchmarks.base_functions import rosenbrock  # function to be minimized
       >>> from pypop7.optimizers.rs.prs import PRS
       >>> problem = {'fitness_function': rosenbrock,  # define problem arguments
       ...            'ndim_problem': 2,
       ...            'lower_boundary': -5.0*numpy.ones((2,)),
       ...            'upper_boundary': 5.0*numpy.ones((2,))}
       >>> options = {'max_function_evaluations': 5000,  # set optimizer options
       ...            'seed_rng': 2022}
       >>> prs = PRS(problem, options)  # initialize the optimizer class
       >>> results = prs.optimize()  # run the optimization process
       >>> print(results)

    To check the correctness of its implementation, refer to `this code-based repeatability report
    <https://tinyurl.com/mrx2kffy>`_ for more details.

    References
    ----------
    Bergstra, J. and Bengio, Y., 2012.
    Random search for hyper-parameter optimization.
    Journal of Machine Learning Research, 13(2).
    https://www.jmlr.org/papers/v13/bergstra12a.html

    Schmidhuber, J., Hochreiter, S. and Bengio, Y., 2001.
    Evaluating benchmark problems by random guessing.
    A Field Guide to Dynamical Recurrent Networks, pp.231-235.
    https://ml.jku.at/publications/older/ch9.pdf

    Brooks, S.H., 1958.
    A discussion of random methods for seeking maxima.
    Operations Research, 6(2), pp.244-251.
    https://pubsonline.informs.org/doi/abs/10.1287/opre.6.2.244
    """
    def __init__(self, problem, options):
        """Initialize the class with two inputs (problem arguments and optimizer options)."""
        Optimizer.__init__(self, problem, options)
        self.x = options.get('x')  # initial (starting) point
        self.verbose = options.get('verbose', 1000)
        self._n_generations = 0  # number of generations

    def _sample(self, rng):
        x = rng.uniform(self.initial_lower_boundary, self.initial_upper_boundary)
        return x

    def initialize(self):
        """Only for the initialization stage."""
        if self.x is None:
            x = self._sample(self.rng_initialization)
        else:
            x = np.copy(self.x)
        assert len(x) == self.ndim_problem
        return x

    def iterate(self):
        """Only for the iteration stage."""
        return self._sample(self.rng_optimization)

    def _print_verbose_info(self, fitness, y):
        """Save fitness and control console verbose information."""
        if self.saving_fitness:
            if not np.isscalar(y):
                fitness.extend(y)
            else:
                fitness.append(y)
        if self.verbose and ((not self._n_generations % self.verbose) or (self.termination_signal > 0)):
            info = '  * Generation {:d}: best_so_far_y {:7.5e}, min(y) {:7.5e} & Evaluations {:d}'
            print(info.format(self._n_generations, self.best_so_far_y, np.min(y), self.n_function_evaluations))

    def _collect(self, fitness, y=None):
        """Collect necessary output information."""
        if y is not None:
            self._print_verbose_info(fitness, y)
        results = Optimizer._collect(self, fitness)
        results['_n_generations'] = self._n_generations
        return results

    def optimize(self, fitness_function=None, args=None):  # for all iterations (generations)
        """For the entire optimization/evolution stage: initialization + iteration."""
        fitness = Optimizer.optimize(self, fitness_function)
        x = self.initialize()  # population initialization
        y = self._evaluate_fitness(x, args)  # to evaluate fitness of starting point
        while not self._check_terminations():
            self._print_verbose_info(fitness, y)  # to save fitness and control console verbose information
            x = self.iterate()
            y = self._evaluate_fitness(x, args)  # to evaluate each new point
            self._n_generations += 1
        results = self._collect(fitness, y)  # to collect all necessary output information
        return results

We have decided to adopt an active development/maintenance mode: once new black-box optimizers are added or serious bugs are fixed, we will release a new PyPI version soon afterwards.

Repeatability Code/Reports

Optimizer	Repeatability Code	Generated Figure(s)/Data
MMES	_repeat_mmes.py	figures
FCMAES	_repeat_fcmaes.py	figures
LMMAES	_repeat_lmmaes.py	figures
LMCMA	_repeat_lmcma.py	figures
LMCMAES	_repeat_lmcmaes.py	data
RMES	_repeat_rmes.py	figures
R1ES	_repeat_r1es.py	figures
VKDCMA	_repeat_vkdcma.py	data
VDCMA	_repeat_vdcma.py	data
CCMAES2016	_repeat_ccmaes2016.py	figures
OPOA2015	_repeat_opoa2015.py	figures
OPOA2010	_repeat_opoa2010.py	figures
CCMAES2009	_repeat_ccmaes2009.py	figures
OPOC2009	_repeat_opoc2009.py	figures
OPOC2006	_repeat_opoc2006.py	figures
SEPCMAES	_repeat_sepcmaes.py	data
DDCMA	_repeat_ddcma.py	data
MAES	_repeat_maes.py	figures
FMAES	_repeat_fmaes.py	figures
CMAES	_repeat_cmaes.py	data
SAMAES	_repeat_samaes.py	figures
SAES	_repeat_saes.py	data
CSAES	_repeat_csaes.py	figures
DSAES	_repeat_dsaes.py	figures
SSAES	_repeat_ssaes.py	figures
RES	_repeat_res.py	figures
R1NES	_repeat_r1nes.py	data
SNES	_repeat_snes.py	data
XNES	_repeat_xnes.py	data
ENES	_repeat_enes.py	data
ONES	_repeat_ones.py	data
SGES	_repeat_sges.py	data
RPEDA	_repeat_rpeda.py	data
UMDA	_repeat_umda.py	data
AEMNA	_repeat_aemna.py	data
EMNA	_repeat_emna.py	data
DCEM	_repeat_dcem.py	data
DSCEM	_repeat_dscem.py	data
MRAS	_repeat_mras.py	data
SCEM	_repeat_scem.py	data
SHADE	_repeat_shade.py	data
JADE	_repeat_jade.py	data
CODE	_repeat_code.py	data
TDE	_repeat_tde.py	figures
CDE	_repeat_cde.py	data
CCPSO2	_repeat_ccpso2.py	data
IPSO	_repeat_ipso.py	data
CLPSO	_repeat_clpso.py	data
CPSO	_repeat_cpso.py	data
SPSOL	_repeat_spsol.py	data
SPSO	_repeat_spso.py	data
HCC	N/A	N/A
COCMA	N/A	N/A
COEA	_repeat_coea.py	figures
COSYNE	_repeat_cosyne.py	data
ESA	_repeat_esa.py	data
CSA	_repeat_csa.py	data
NSA	N/A	N/A
ASGA	_repeat_asga.py	data
GL25	_repeat_gl25.py	data
G3PCX	_repeat_g3pcx.py	figures
GENITOR	N/A	N/A
LEP	_repeat_lep.py	data
FEP	_repeat_fep.py	data
CEP	_repeat_cep.py	data
POWELL	_repeat_powell.py	data
GPS	N/A	N/A
NM	_repeat_nm.py	data
HJ	_repeat_hj.py	data
CS	N/A	N/A
BES	_repeat_bes.py	figures
GS	_repeat_gs.py	figures
SRS	N/A	N/A
ARHC	_repeat_arhc.py	data
RHC	_repeat_rhc.py	data
PRS	_repeat_prs.py	figures

Python IDE for Development

Although other Python IDEs (e.g., Spyder, Visual Studio) can be used for development, we currently mainly use PyCharm Community Edition and Anaconda to develop our open-source library. We thank JetBrains and Anaconda very much for providing these two free development tools. Note that we do NOT exclude any other choices for development.