Asynchronous Successive Halving (ASHA)

Successive halving is a hyperparameter optimization algorithm based on the multi-armed bandit methodology. ASHA combines random search with principled early stopping and does so asynchronously, so promotions to larger budgets do not have to wait for a whole batch of configurations to finish. We highly recommend this blog post by the authors of the method: https://blog.ml.cmu.edu/2018/12/12/massively-parallel-hyperparameter-optimization/
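
The promotion rule at the heart of ASHA can be sketched in a few lines of plain Python. This is a simplified illustration based on the description in the blog post, not Sherpa's implementation: the names next_job, rungs, and promoted are hypothetical, the objective values are made up, and higher objectives are assumed to be better.

```python
def next_job(rungs, promoted, eta=3):
    """Pick the next job in the style of asynchronous successive halving.

    rungs[k] maps a configuration id to its objective at rung k (higher is better).
    promoted[k] is the set of ids already promoted out of rung k.
    Returns (config_id, target_rung), or None when a new random
    configuration should be started at rung 0 instead.
    """
    # Look at the highest promotable rung first so strong configurations advance quickly.
    for k in reversed(range(len(rungs) - 1)):
        n_top = len(rungs[k]) // eta  # size of the top 1/eta fraction of this rung
        ranked = sorted(rungs[k], key=rungs[k].get, reverse=True)
        for config_id in ranked[:n_top]:
            if config_id not in promoted[k]:
                promoted[k].add(config_id)
                return config_id, k + 1
    return None


# Three rungs, as with r=1, R=9, eta=3 (illustrative objective values).
rungs = [{"a": 0.94, "b": 0.92}, {}, {}]
promoted = [set(), set(), set()]

print(next_job(rungs, promoted))  # two results at rung 0: 2 // 3 == 0, nothing to promote
rungs[0]["c"] = 0.91
print(next_job(rungs, promoted))  # "a" is now the top third of rung 0
print(next_job(rungs, promoted))  # "a" was already promoted, so start a new configuration
```

The caller is expected to train the returned configuration at the target rung and then record its objective in rungs[target_rung]; no step waits for a rung to fill up completely, which is what makes the scheme asynchronous.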

[ ]:
import sherpa
import keras
from keras.models import Sequential, load_model
from keras.layers import Dense, Flatten
from keras.datasets import mnist
from keras.optimizers import Adam
import tempfile
import os
import shutil

Dataset Preparation

[2]:
(x_train, y_train), (x_test, y_test) = mnist.load_data()
x_train, x_test = x_train/255.0, x_test/255.0

Sherpa Setup

In this example we use \(R=9\) and \(\eta=3\). That means that to obtain one finished configuration we train 9 configurations for 1 epoch each, pick the best 3 of those and train them for 3 more epochs, then pick the best one of those and train it for another 9 epochs. Because promotions happen asynchronously, the number of configurations actually started can differ slightly from this schedule. You can increase the max_finished_configs argument to do a larger search.
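
The arithmetic behind this schedule can be written out as a small sketch. The helper rung_schedule is a hypothetical name introduced here for illustration and is not part of Sherpa; it assumes a single bracket (s=0) and the idealized, synchronous picture in which every rung keeps exactly a 1/eta fraction of the previous one.

```python
def rung_schedule(r=1, R=9, eta=3):
    """Return (rung, epochs_in_rung, configs_in_rung) for one finished configuration."""
    resources = []
    resource = r
    while resource <= R:
        resources.append(resource)
        resource *= eta  # each rung gets eta times the budget of the previous one
    n_rungs = len(resources)
    # The top rung holds 1 configuration; each rung below holds eta times more.
    return [(k, res, eta ** (n_rungs - 1 - k)) for k, res in enumerate(resources)]


for rung, epochs, n_configs in rung_schedule():
    print(f"rung {rung}: {n_configs} configuration(s), {epochs} epoch(s) each")

# Total training cost of the idealized schedule, in epochs.
print(sum(epochs * n_configs for _, epochs, n_configs in rung_schedule()))
```

With the values used here, each rung costs the same total number of epochs, which is the usual effect of multiplying the budget by eta while dividing the number of configurations by eta.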

[20]:
parameters = [sherpa.Continuous('learning_rate', [1e-4, 1e-2], 'log'),
              sherpa.Discrete('num_units', [32, 128]),
              sherpa.Choice('activation', ['relu', 'tanh', 'sigmoid'])]
algorithm = sherpa.algorithms.SuccessiveHalving(r=1, R=9, eta=3, s=0, max_finished_configs=1)
study = sherpa.Study(parameters=parameters,
                     algorithm=algorithm,
                     lower_is_better=False,
                     dashboard_port=8995)
INFO:sherpa.core:
-------------------------------------------------------
SHERPA Dashboard running. Access via
http://128.195.75.106:8995 if on a cluster or
http://localhost:8995 if running locally.
-------------------------------------------------------

Create a temporary directory to store model files in. Successive Halving trains hyperparameter configurations on increasingly large budgets (training epochs), so intermediate models have to be saved and reloaded when a configuration moves on to a larger budget.

[21]:
model_dir = tempfile.mkdtemp()

Hyperparameter Optimization

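Note: we manually infer how many epochs the model has already been trained for, so that we can pass this to Keras as initial_epoch. With the settings above, a trial with a resource of 1, 3, or 9 epochs starts after 0, 1, or 4 completed epochs, respectively.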
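
For other settings of r and eta, the same starting epochs could be computed rather than hard-coded. The sketch below uses a hypothetical helper, initial_epoch_for, which is not part of Sherpa; it assumes that every resource value has the form r * eta**k and that a promoted trial always continues from the checkpoint of the rung directly below it.

```python
def initial_epoch_for(resource, r=1, eta=3):
    """Epochs already completed before a trial with this resource starts training."""
    done, budget = 0, r
    # Add up the budgets of all lower rungs: r + r*eta + r*eta**2 + ...
    while budget < resource:
        done += budget
        budget *= eta
    return done


# Starting epoch for each resource value used with r=1, R=9, eta=3.
print({resource: initial_epoch_for(resource) for resource in (1, 3, 9)})
```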

[22]:
for trial in study:
    # Getting number of training epochs
    initial_epoch = {1: 0, 3: 1, 9: 4}[trial.parameters['resource']]
    epochs = trial.parameters['resource'] + initial_epoch

    print("-"*100)
    print(f"Trial:\t{trial.id}\nEpochs:\t{initial_epoch} to {epochs}\nParameters:{trial.parameters}\n")

    if trial.parameters['load_from'] == "":
        print(f"Creating new model for trial {trial.id}...\n")

        # Get hyperparameters
        lr = trial.parameters['learning_rate']
        num_units = trial.parameters['num_units']
        act = trial.parameters['activation']

        # Create model
        model = Sequential([Flatten(input_shape=(28, 28)),
                            Dense(num_units, activation=act),
                            Dense(10, activation='softmax')])
        optimizer = Adam(lr=lr)
        model.compile(loss='sparse_categorical_crossentropy',
                      optimizer=optimizer,
                      metrics=['accuracy'])
    else:
        print("Loading model from: ", os.path.join(model_dir, trial.parameters['load_from']), "...\n")

        # Loading model
        model = load_model(os.path.join(model_dir, trial.parameters['load_from']))


    # Train model
    for i in range(initial_epoch, epochs):
        model.fit(x_train, y_train, initial_epoch=i, epochs=i+1)
        loss, accuracy = model.evaluate(x_test, y_test)

        print("Validation accuracy: ", accuracy)
        study.add_observation(trial=trial, iteration=i,
                              objective=accuracy,
                              context={'loss': loss})

    study.finalize(trial=trial)
    print("Saving model at: ", os.path.join(model_dir, trial.parameters['save_to']))
    model.save(os.path.join(model_dir, trial.parameters['save_to']))

    study.save(model_dir)
----------------------------------------------------------------------------------------------------
Trial:  1
Epochs: 0 to 1
Parameters:{'learning_rate': 0.0006779922111149317, 'num_units': 67, 'activation': 'tanh', 'resource': 1, 'rung': 0, 'load_from': '', 'save_to': '1'}
Creating new model for trial 1...

Epoch 1/1
60000/60000 [==============================] - 4s 72us/step - loss: 0.3451 - acc: 0.9059
10000/10000 [==============================] - 1s 53us/step
Validation accuracy:  0.9426
Saving model at:  /var/folders/5v/l788ch2j7tg0q0y1rt04c08w0000gn/T/tmpa7vbw5xz/1
----------------------------------------------------------------------------------------------------
Trial:  2
Epochs: 0 to 1
Parameters:{'learning_rate': 0.0007322493943507595, 'num_units': 53, 'activation': 'sigmoid', 'resource': 1, 'rung': 0, 'load_from': '', 'save_to': '2'}
Creating new model for trial 2...

Epoch 1/1
60000/60000 [==============================] - 4s 71us/step - loss: 0.5720 - acc: 0.8661
10000/10000 [==============================] - 0s 47us/step
Validation accuracy:  0.9213
Saving model at:  /var/folders/5v/l788ch2j7tg0q0y1rt04c08w0000gn/T/tmpa7vbw5xz/2
----------------------------------------------------------------------------------------------------
Trial:  3
Epochs: 0 to 1
Parameters:{'learning_rate': 0.00013292608500661002, 'num_units': 115, 'activation': 'tanh', 'resource': 1, 'rung': 0, 'load_from': '', 'save_to': '3'}
Creating new model for trial 3...

Epoch 1/1
60000/60000 [==============================] - 5s 80us/step - loss: 0.5496 - acc: 0.8571
10000/10000 [==============================] - 0s 48us/step
Validation accuracy:  0.9112
Saving model at:  /var/folders/5v/l788ch2j7tg0q0y1rt04c08w0000gn/T/tmpa7vbw5xz/3
----------------------------------------------------------------------------------------------------
Trial:  4
Epochs: 1 to 4
Parameters:{'learning_rate': 0.0006779922111149317, 'num_units': 67, 'activation': 'tanh', 'save_to': '4', 'resource': 3, 'rung': 1, 'load_from': '1'}
Loading model from:  /var/folders/5v/l788ch2j7tg0q0y1rt04c08w0000gn/T/tmpa7vbw5xz/1 ...

Epoch 2/2
60000/60000 [==============================] - 3s 50us/step - loss: 0.1818 - acc: 0.9473
10000/10000 [==============================] - 0s 46us/step
Validation accuracy:  0.9559
Epoch 3/3
60000/60000 [==============================] - 3s 51us/step - loss: 0.1353 - acc: 0.9617
10000/10000 [==============================] - 0s 39us/step
Validation accuracy:  0.9629
Epoch 4/4
60000/60000 [==============================] - 3s 52us/step - loss: 0.1074 - acc: 0.9687
10000/10000 [==============================] - 0s 22us/step
Validation accuracy:  0.9659
Saving model at:  /var/folders/5v/l788ch2j7tg0q0y1rt04c08w0000gn/T/tmpa7vbw5xz/4
----------------------------------------------------------------------------------------------------
Trial:  5
Epochs: 0 to 1
Parameters:{'learning_rate': 0.0003139094199248622, 'num_units': 88, 'activation': 'sigmoid', 'resource': 1, 'rung': 0, 'load_from': '', 'save_to': '5'}
Creating new model for trial 5...

Epoch 1/1
60000/60000 [==============================] - 4s 68us/step - loss: 0.7136 - acc: 0.8431
10000/10000 [==============================] - 0s 49us/step
Validation accuracy:  0.9098
Saving model at:  /var/folders/5v/l788ch2j7tg0q0y1rt04c08w0000gn/T/tmpa7vbw5xz/5
----------------------------------------------------------------------------------------------------
Trial:  6
Epochs: 0 to 1
Parameters:{'learning_rate': 0.0008001577665974275, 'num_units': 36, 'activation': 'sigmoid', 'resource': 1, 'rung': 0, 'load_from': '', 'save_to': '6'}
Creating new model for trial 6...

Epoch 1/1
60000/60000 [==============================] - 4s 59us/step - loss: 0.6274 - acc: 0.8588
10000/10000 [==============================] - 0s 48us/step
Validation accuracy:  0.9169
Saving model at:  /var/folders/5v/l788ch2j7tg0q0y1rt04c08w0000gn/T/tmpa7vbw5xz/6
----------------------------------------------------------------------------------------------------
Trial:  7
Epochs: 0 to 1
Parameters:{'learning_rate': 0.003299640159323735, 'num_units': 63, 'activation': 'tanh', 'resource': 1, 'rung': 0, 'load_from': '', 'save_to': '7'}
Creating new model for trial 7...

Epoch 1/1
60000/60000 [==============================] - 4s 68us/step - loss: 0.2387 - acc: 0.9294
10000/10000 [==============================] - 1s 52us/step
Validation accuracy:  0.9521
Saving model at:  /var/folders/5v/l788ch2j7tg0q0y1rt04c08w0000gn/T/tmpa7vbw5xz/7
----------------------------------------------------------------------------------------------------
Trial:  8
Epochs: 1 to 4
Parameters:{'learning_rate': 0.003299640159323735, 'num_units': 63, 'activation': 'tanh', 'save_to': '8', 'resource': 3, 'rung': 1, 'load_from': '7'}
Loading model from:  /var/folders/5v/l788ch2j7tg0q0y1rt04c08w0000gn/T/tmpa7vbw5xz/7 ...

Epoch 2/2
60000/60000 [==============================] - 3s 52us/step - loss: 0.1209 - acc: 0.9641
10000/10000 [==============================] - 1s 52us/step
Validation accuracy:  0.961
Epoch 3/3
60000/60000 [==============================] - 3s 53us/step - loss: 0.0953 - acc: 0.9704
10000/10000 [==============================] - 0s 24us/step
Validation accuracy:  0.9667
Epoch 4/4
60000/60000 [==============================] - 3s 52us/step - loss: 0.0800 - acc: 0.9756
10000/10000 [==============================] - 0s 23us/step
Validation accuracy:  0.9679
Saving model at:  /var/folders/5v/l788ch2j7tg0q0y1rt04c08w0000gn/T/tmpa7vbw5xz/8
----------------------------------------------------------------------------------------------------
Trial:  9
Epochs: 0 to 1
Parameters:{'learning_rate': 0.0025750610635902832, 'num_units': 48, 'activation': 'sigmoid', 'resource': 1, 'rung': 0, 'load_from': '', 'save_to': '9'}
Creating new model for trial 9...

Epoch 1/1
60000/60000 [==============================] - 4s 62us/step - loss: 0.3477 - acc: 0.9065
10000/10000 [==============================] - 1s 54us/step
Validation accuracy:  0.9421
Saving model at:  /var/folders/5v/l788ch2j7tg0q0y1rt04c08w0000gn/T/tmpa7vbw5xz/9
----------------------------------------------------------------------------------------------------
Trial:  10
Epochs: 0 to 1
Parameters:{'learning_rate': 0.0025240507488864423, 'num_units': 124, 'activation': 'tanh', 'resource': 1, 'rung': 0, 'load_from': '', 'save_to': '10'}
Creating new model for trial 10...

Epoch 1/1
60000/60000 [==============================] - 5s 85us/step - loss: 0.2297 - acc: 0.9303
10000/10000 [==============================] - 1s 58us/step
Validation accuracy:  0.9644
Saving model at:  /var/folders/5v/l788ch2j7tg0q0y1rt04c08w0000gn/T/tmpa7vbw5xz/10
----------------------------------------------------------------------------------------------------
Trial:  11
Epochs: 1 to 4
Parameters:{'learning_rate': 0.0025240507488864423, 'num_units': 124, 'activation': 'tanh', 'save_to': '11', 'resource': 3, 'rung': 1, 'load_from': '10'}
Loading model from:  /var/folders/5v/l788ch2j7tg0q0y1rt04c08w0000gn/T/tmpa7vbw5xz/10 ...

Epoch 2/2
60000/60000 [==============================] - 5s 78us/step - loss: 0.1079 - acc: 0.9670
10000/10000 [==============================] - 1s 63us/step
Validation accuracy:  0.971
Epoch 3/3
60000/60000 [==============================] - 5s 77us/step - loss: 0.0761 - acc: 0.9764
10000/10000 [==============================] - 0s 28us/step
Validation accuracy:  0.9731
Epoch 4/4
60000/60000 [==============================] - 4s 73us/step - loss: 0.0599 - acc: 0.9811
10000/10000 [==============================] - 0s 30us/step
Validation accuracy:  0.9692
Saving model at:  /var/folders/5v/l788ch2j7tg0q0y1rt04c08w0000gn/T/tmpa7vbw5xz/11
----------------------------------------------------------------------------------------------------
Trial:  12
Epochs: 4 to 13
Parameters:{'learning_rate': 0.0025240507488864423, 'num_units': 124, 'activation': 'tanh', 'save_to': '12', 'resource': 9, 'rung': 2, 'load_from': '11'}
Loading model from:  /var/folders/5v/l788ch2j7tg0q0y1rt04c08w0000gn/T/tmpa7vbw5xz/11 ...

Epoch 5/5
60000/60000 [==============================] - 5s 77us/step - loss: 0.0466 - acc: 0.9850
10000/10000 [==============================] - 1s 60us/step
Validation accuracy:  0.973
Epoch 6/6
60000/60000 [==============================] - 4s 74us/step - loss: 0.0416 - acc: 0.9866
10000/10000 [==============================] - 0s 27us/step
Validation accuracy:  0.9726
Epoch 7/7
60000/60000 [==============================] - 5s 75us/step - loss: 0.0354 - acc: 0.9884
10000/10000 [==============================] - 0s 27us/step
Validation accuracy:  0.9744
Epoch 8/8
60000/60000 [==============================] - 4s 72us/step - loss: 0.0292 - acc: 0.9908
10000/10000 [==============================] - 0s 28us/step
Validation accuracy:  0.9739
Epoch 9/9
60000/60000 [==============================] - 4s 73us/step - loss: 0.0286 - acc: 0.9905
10000/10000 [==============================] - 0s 27us/step
Validation accuracy:  0.974
Epoch 10/10
60000/60000 [==============================] - 4s 72us/step - loss: 0.0245 - acc: 0.9919
10000/10000 [==============================] - 0s 27us/step
Validation accuracy:  0.9725
Epoch 11/11
60000/60000 [==============================] - 4s 72us/step - loss: 0.0233 - acc: 0.9916
10000/10000 [==============================] - 0s 31us/step
Validation accuracy:  0.9714
Epoch 12/12
60000/60000 [==============================] - 4s 72us/step - loss: 0.0203 - acc: 0.9935
10000/10000 [==============================] - 0s 28us/step
Validation accuracy:  0.972
Epoch 13/13
60000/60000 [==============================] - 4s 74us/step - loss: 0.0195 - acc: 0.9934
10000/10000 [==============================] - 0s 27us/step
Validation accuracy:  0.9727
Saving model at:  /var/folders/5v/l788ch2j7tg0q0y1rt04c08w0000gn/T/tmpa7vbw5xz/12

The best hyperparameter configuration found is:

[23]:
study.get_best_result()
[23]:
{'Iteration': 6,
 'Objective': 0.9744,
 'Trial-ID': 12,
 'activation': 'tanh',
 'learning_rate': 0.0025240507488864423,
 'load_from': '11',
 'loss': 0.08811961327217287,
 'num_units': 124,
 'resource': 9,
 'rung': 2,
 'save_to': '12'}

This model is stored at:

[24]:
print(os.path.join(model_dir, study.get_best_result()['save_to']))
/var/folders/5v/l788ch2j7tg0q0y1rt04c08w0000gn/T/tmpa7vbw5xz/12

To remove the model directory:

[25]:
# Remove model_dir
shutil.rmtree(model_dir)