Bayesian Optimization

2020/04/11

According to Wikipedia, Bayesian optimization (BO) is a sequential design strategy for global optimization of black-box functions that does not require derivatives.

Study Objective

In this post, we will:

  1. Review optimization algorithms and compare Bayesian optimization with gradient descent.

  2. Understand the process of Bayesian optimization.

  3. Write a simple Bayesian optimization algorithm from scratch.

  4. Use BO with an off-the-shelf library (Hyperopt).

Optimization

Optimization means finding the global maximum or minimum of a function. Besides BO, common approaches include grid search, random search, and gradient descent. The first two can be applied to any function if computing power is not a problem. Gradient descent can only be used when the function is convex and differentiable. Bayesian optimization has no such limitations; however, the function is supposed to be smooth and continuous, otherwise I would recommend grid search or random search.
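To make the comparison concrete, here is a minimal sketch of grid search and random search over a one-dimensional search space [0, 1]; the function f below is a hypothetical stand-in for whatever black-box objective we want to maximize, not the function used later in this post:

import numpy as np

# hypothetical black-box objective to maximize (illustration only)
def f(x):
    return -(x - 0.3)**2

# grid search: evaluate f on an evenly spaced grid and keep the best point
grid = np.linspace(0, 1, 100)
best_grid_x = grid[np.argmax([f(x) for x in grid])]

# random search: evaluate f at uniformly random points and keep the best one
rng = np.random.default_rng(0)
candidates = rng.uniform(0, 1, 100)
best_random_x = candidates[np.argmax([f(x) for x in candidates])]

print(best_grid_x, best_random_x)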

Process

I think the whole process can be illustrated by the following trinity:

Bayesian Optimization Trinity

Here the objective function is the black-box function that we optimize. It produces samples (x, y). Usually it is very costly to run the objective function, so we introduce a surrogate function to approximate it. The surrogate predicts the mean and standard deviation of the objective function, which the acquisition function uses to search for the next x to explore or exploit. The next x is then fed back into the objective function. The loop continues until the maximum or minimum is found.

The surrogate function is usually a Gaussian process regressor, while the acquisition function has many options:

  1. Probability of Improvement (PI).
  2. Expected Improvement (EI).
  3. Upper/Lower Confidence Bound (UCB/LCB).

Here we focus only on the one we use, UCB, the most straightforward one. If we ignore the weight, UCB can be written as:

$$UCB(x) = \mu(x) + \sigma(x)$$

$$x_{\text{next}} = \arg\max_x UCB(x)$$

We do all of this inside the search space.
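More generally, a weight beta controls the trade-off between exploration and exploitation (the code below simply uses beta = 1):

$$UCB(x) = \mu(x) + \beta\,\sigma(x)$$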

The whole process:

  1. Initialize x with the min and max of the search space.
  2. Calculate y using x and the objective function.
  3. Fit the surrogate function.
  4. Find a new x using the acquisition function.
  5. Go to step 2.

Python Code

Let’s first define and plot the objective function as follows:

import math

def objective(x):
    return (x - 0.47)**2 * math.sin(3 * x)

Objective

Here the maximum (x=0.87, y=0.0811051) is also plotted as a blue point.
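As a quick sanity check (not part of the original notebook), this maximum can be confirmed by brute-force evaluation on a fine grid:

import numpy as np

# evaluate the objective densely over [0, 1] and read off the best point
xs = np.linspace(0, 1, 10001)
ys = (xs - 0.47)**2 * np.sin(3 * xs)
print(xs[ys.argmax()], ys.max())  # approximately x=0.87, y=0.0811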

Now, this function is non-convex, so gradient descent cannot be used. We use BO instead. The surrogate function is a Gaussian process regressor, and UCB is used as the acquisition function.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# search space: 100 evenly spaced candidate points in [0, 1]
X = np.linspace(0, 1, 100).reshape(-1, 1)

# upper confidence bound (UCB), with beta = 1
def acquisition(mean, std):
    mean = mean.flatten()
    std = std.flatten()
    # UCB
    upper = mean + std
    # argmax
    max_at = np.argmax(upper)
    return X[max_at]

# surrogate
surrogate = GaussianProcessRegressor()

Code for the whole process:

# step 0: Initialize x with the min and max of the search space.
xsamples = np.array([[0], [1]])

# step 1: Calculate y using x and the objective function.
ysamples = np.array([objective(x) for x in xsamples])

for i in range(4):
    # step 2: Fit the surrogate function.
    surrogate.fit(xsamples, ysamples)
    mean, std = surrogate.predict(X, return_std=True)

    # step 3: Find a new x using the acquisition function.
    new_x = acquisition(mean, std)

    # step 4: Go to step 1 (calculate y for the new x).
    new_y = objective(new_x)

    # plot (the plot helper and the true curve y are defined in the linked notebook)
    plot(X, y, xsamples, ysamples, mean, std, new_x, new_y, i)
    xsamples = np.vstack((xsamples, new_x))
    ysamples = np.vstack((ysamples, new_y))

Step 0:

In step 0, we simply produce two samples, x=0 and x=1.

Iteration 0:

In iteration 0, the two samples are used by the surrogate function to generate the mean and standard deviation (green). The acquisition function finds the maximum UCB at x=0.53 (red). objective(0.53) is 0.00359934, so a new point (0.53, 0.00359934) is found. We pass the three points to iteration 1.

Bayesian Optimization Iteration 0

Iteration 1:

Bayesian Optimization Iteration 1

Iteration 2:

Bayesian Optimization Iteration 2

Iteration 3:

Bayesian Optimization Iteration 3

The new point (red) found here lands right at the maximum (x=0.87, y=0.0811051). We can see that BO finds the optimum in just 4 iterations, calling the objective function only 6 times. With grid search, it would take 100 evaluations.

The complete version of the code can be found here:

https://github.com/EricWebsmith/machine_learning_from_scrach/blob/master/bayesian_optimization.ipynb

Bayesian Optimization not from scratch

There are many tools for BO; one of them is Hyperopt. The following is a simple demonstration:

from hyperopt import fmin, tpe, hp

# fmin minimizes, so we negate the objective to maximize it
best = fmin(
    fn=lambda x: -objective(x),
    space=hp.uniform('x', 0, 1),
    algo=tpe.suggest,
    max_evals=100)
print(best)
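Note that tpe.suggest uses the Tree-structured Parzen Estimator as the surrogate rather than a Gaussian process. The returned best is a dictionary with the best value found for each search-space parameter; with 100 evaluations it should land close to {'x': 0.87}.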

Reference

https://en.wikipedia.org/wiki/Bayesian_optimization
