Black-Box Optimization (BBO)

The black-box nature of many real-world optimization problems arises from one or more of the following factors, as discussed in e.g. the classical book Introduction to Derivative-Free Optimization and the seminal paper Random Gradient-Free Minimization of Convex Functions, just to name a few:

  • increasing complexity in mathematical modeling,

  • higher sophistication of scientific computing,

  • an abundance of legacy or proprietary codes (modification is either too costly or impossible),

  • noisy function evaluations,

  • memory limitations, as fast (automatic) differentiation needs to store all intermediate computations,

  • expensive development time (very often the human time needed to implement partial-derivative computations is much more expensive than the computational time itself),

  • a non-trivial extension of the gradient notion onto nonsmooth cases,

  • a simple preparatory stage (the program for computing function values is always much simpler than the program for computing the gradient vector).

Some common problem characteristics of BBO are presented below:

For black-box problems, the only information accessible to the algorithm is function evaluations at query points that the algorithm can freely select, leading to Zeroth-Order Optimization (ZOO), Derivative-Free Optimization (DFO), or Gradient-Free Optimization (GFO).
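The black-box access model can be made concrete with the simplest possible zeroth-order optimizer, pure random search. The sketch below is illustrative only (the function names and the sphere test function are not part of any library's API); its sole point is that the algorithm freely chooses query points and observes nothing but function values:

```python
import numpy as np

def random_search(f, lower, upper, max_evals=1000, seed=None):
    """Minimize a black-box function f by uniform random sampling
    in the box [lower, upper] (an illustrative sketch, not a library API)."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    best_x, best_y = None, np.inf
    for _ in range(max_evals):
        x = rng.uniform(lower, upper)  # query point, freely chosen by the algorithm
        y = f(x)  # the ONLY information available: a single function evaluation
        if y < best_y:
            best_x, best_y = x, y
    return best_x, best_y

# a standard synthetic test function (the sphere function), used here for illustration
sphere = lambda x: float(np.sum(x**2))
x_best, y_best = random_search(sphere, [-5.0, -5.0], [5.0, 5.0], max_evals=2000, seed=0)
```

Note that nothing in the loop touches gradients or problem structure, which is exactly what makes the same scheme applicable to simulations, legacy codes, and noisy evaluations alike.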

No Free Lunch Theorems (NFL)

As mathematically proved in [Wolpert&Macready, 1997, TEVC], “for any algorithm, any elevated performance over one class of problems is offset by performance over another class.”

This may in part explain why a large number of optimization algorithms have emerged from different research communities. Unfortunately, however, not all optimizers are well-designed and widely accepted. Refer to the Design Philosophy section for further discussion.

Curse of Dimensionality for Large-Scale BBO (LBO)

Arguably all black-box optimizers run the risk of suffering from the notorious “Curse of Dimensionality” (also called Combinatorial Explosion in the combinatorial optimization scenario), since the essence (driving force) of all black-box optimizers is limited sampling in practice. Please refer to e.g. [Nesterov&Spokoiny, 2017, FoCM] for theoretical analyses.
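The effect of dimensionality on limited sampling can be observed directly in a small experiment. The sketch below (illustrative, with an arbitrarily chosen budget and the sphere function as the test problem) measures the best value found by a fixed budget of uniform samples as the dimension grows:

```python
import numpy as np

def best_of_n(dim, n=1000, seed=0):
    """Best sphere-function value among n uniform samples in [-1, 1]^dim
    (an illustrative experiment, not a library API)."""
    rng = np.random.default_rng(seed)
    samples = rng.uniform(-1.0, 1.0, size=(n, dim))
    return float(np.min(np.sum(samples**2, axis=1)))

# With a fixed evaluation budget, the best value found degrades rapidly as the
# dimension grows -- a direct symptom of the curse of dimensionality for LBO.
results = {d: best_of_n(d) for d in (2, 10, 50)}
```

The same budget that nearly locates the optimum in 2 dimensions is hopelessly sparse in 50, which motivates the structure-exploiting strategies discussed next.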

Luckily, some real-world applications exhibit exploitable structure. If such structure can be efficiently exploited in an automatic fashion (via well-designed optimization strategies), the convergence rate may be significantly improved. Therefore, any general-purpose black-box optimizer may still need to keep a subtle balance between exploiting concrete problem structures and exploring the entire design space of the optimizer.

General-Purpose Optimization Algorithms

Note

“Given the abundance of black-box optimization algorithms and of optimization problems, how can we best match algorithms to problems?” [Wolpert&Macready, 1997, TEVC]

“Clearly, evaluating and comparing algorithms on a single problem is not sufficient to determine their quality, as much of their benefit lies in their performance generalizing to large classes of problems. One of the goals of research in optimization is, arguably, to provide practitioners with reliable, powerful and general-purpose algorithms.” As a library for BBO, a natural choice is to prefer and cover general-purpose optimization algorithms first (as compared with highly customized versions), since for most real-world black-box optimization problems the (possibly useful) problem structure is typically unknown in advance.

General-purpose optimization algorithms are expected to satisfy the following common criteria/principles:

  • effectiveness and efficiency,

  • elegance (beauty),

  • flexibility (versatility),

  • robustness (reliability),

  • scalability,

  • simplicity.

Arguably, the beauty of general-purpose black-box optimizers should come from theoretical depth and/or practical breadth, though aesthetic judgment is somewhat subjective. We believe that well-designed optimizers can pass the test of time in the history of black-box optimization. For recent critical discussions, refer to e.g. “metaphor-based metaheuristics, a call for action: the elephant in the room” and “a critical problem in benchmarking and analysis of evolutionary computation methods”.

For benchmarking of continuous optimizers, refer to e.g. [Hillstrom, 1977, ACM-TOMS], [More et al., 1981, ACM-TOMS], [Hansen et al., 2021, OMS], [Meunier et al., 2022, TEVC]. As stated in [More et al., 1981, ACM-TOMS], “not testing the algorithm on a large number of functions can easily lead the cynical observer to conclude that the algorithm was tuned to particular functions”.

POPulation-based OPtimization (POP)

Note

“The essence of an evolutionary approach to solve a problem is to equate possible solutions to individuals in a population, and to introduce a notion of fitness on the basis of solution quality.”[Eiben&Smith, 2015, Nature]

Population-based (particularly evolutionary) optimizers (POP) usually have the following advantages on black-box problems, particularly when compared to their individual-based counterparts:

  • few a priori assumptions (e.g. with a limited knowledge bias),

  • flexible framework (easy integration with problem-specific knowledge via e.g. memetic algorithms),

  • robust performance (e.g. w.r.t. noisy perturbation or hyper-parameters),

  • diverse solutions (e.g. for multi-modal/multi-objective/dynamic optimization),

  • novelty (e.g. beyond intuitions for design problems).
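The evolutionary essence quoted above (solutions as individuals, quality as fitness) can be sketched in a few lines. The following minimal (mu, lambda) evolution strategy is an illustrative toy, not the implementation of any particular published algorithm; all function names and parameter defaults are chosen here for the example:

```python
import numpy as np

def simple_es(f, x0, sigma=0.5, lam=20, mu=5, max_gens=200, seed=0):
    """A minimal (mu, lambda) evolution strategy (illustrative sketch):
    each generation samples lam offspring (individuals) around the current
    mean, then recombines the mu fittest into the next parent."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(x0, dtype=float)
    for _ in range(max_gens):
        # variation: a population of candidate solutions (individuals)
        offspring = mean + sigma * rng.standard_normal((lam, mean.size))
        # fitness: solution quality, here plain function values (minimization)
        fitness = np.array([f(v) for v in offspring])
        # selection: keep the mu best individuals
        elite = offspring[np.argsort(fitness)[:mu]]
        # recombination: intermediate (mean) recombination of the elite
        mean = elite.mean(axis=0)
        sigma *= 0.98  # simple fixed step-size decay (non-adaptive, for brevity)
    return mean, float(f(mean))

x, y = simple_es(lambda v: float(np.sum(v**2)), x0=[3.0, -2.0])
```

The four commented steps (variation, fitness, selection, recombination) are the population-based skeleton that all the evolutionary optimizers surveyed below refine in their own ways.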

For details (models, algorithms, theories, and applications) about POP, please refer to e.g. the following well-written reviews or books (just to name a few):

  • Miikkulainen, R. and Forrest, S., 2021. A biological perspective on evolutionary computation. Nature Machine Intelligence, 3(1), pp.9-15.

  • Schoenauer, M., 2015. Chapter 28: Evolutionary algorithms. Handbook of Evolutionary Thinking in the Sciences. Springer.

  • Eiben, A.E. and Smith, J., 2015. From evolutionary computation to the evolution of things. Nature, 521(7553), pp.476-482.

  • De Jong, K.A., Fogel, D.B. and Schwefel, H.P., 1997. A history of evolutionary computation. Handbook of Evolutionary Computation. Oxford University Press.

  • Forrest, S., 1993. Genetic algorithms: Principles of natural selection applied to computation. Science, 261(5123), pp.872-878.

For principled design of continuous stochastic search, refer to e.g. [Hansen&Auger, 2014]; [Wierstra et al., 2014].
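One classical example of such principled design is the 1/5th success rule for step-size adaptation in a (1+1) evolution strategy. The sketch below is a textbook-style illustration (function names and the multiplicative update constants are this example's choices, balanced so that a 1/5 success rate leaves the step size unchanged on average), not the code of any cited work:

```python
import numpy as np

def one_plus_one_es(f, x0, sigma=1.0, max_evals=2000, seed=0):
    """A (1+1)-ES with a 1/5th-success-rule-style step-size adaptation
    (illustrative sketch). A single parent is mutated; the offspring
    replaces the parent only if it is at least as good."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = float(f(x))
    for _ in range(max_evals):
        y = x + sigma * rng.standard_normal(x.size)
        fy = float(f(y))
        if fy <= fx:
            x, fx = y, fy
            sigma *= np.exp(1.0 / 3.0)    # success: enlarge the step size
        else:
            sigma *= np.exp(-1.0 / 12.0)  # failure: shrink the step size
    return x, fx

# illustrative run on the sphere function
x_opt, f_opt = one_plus_one_es(lambda v: float(np.sum(v**2)), x0=[3.0, -2.0])
```

The update constants satisfy (1/5)(1/3) = (4/5)(1/12), so the step size drifts only when the empirical success rate deviates from 1/5, which is the kind of invariance argument these principled-design references develop rigorously.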

For each algorithm family, we also provide some widely recognized references in its own API documentation. You can also see this GitHub website for a (still growing) paper list of Evolutionary Computation (EC) published in many top-tier and also EC-focused journals and conferences.

Limitations of BBO

Note

“If you can obtain clean derivatives (even if it requires considerable effort) and the functions defining your problem are smooth and free of noise you should not use derivative-free methods.” [Conn et al., 2009, Introduction to Derivative-Free Optimization]

Very importantly, not all optimization problems fit well with black-box optimizers. In fact, their indiscriminate abuse in science and engineering has drawn wide criticism. Although not always, black-box optimizers are often seen as “the last choice of search methods”.

Of course, “first-order methods that require knowledge of the gradient are not always possible in practice.” ([Mhanna&Assaad, 2023, ICML])