A Guide to SHERPA¶
Parameters¶
Hyperparameters are defined via sherpa.Parameter
objects. Available are
sherpa.Continuous
: Represents continuous parameters such as weight-decay multiplier. Can also be thought of as float.sherpa.Discrete
: Represents discrete parameters such as number of hidden units in a neural network. Can also be thought of as int.sherpa.Ordinal
: Represents categorical ordered parameters. For example minibatch size could be an ordinal parameter taking values 8, 16, 32, etc. Can also be thought of as list.sherpa.Choice
: Represents unordered categorical parameters such as activation function in a neural network. Can also be thought of as a set.
Every parameter takes a name
and range
argument. The name
argument
is simply the name of the hyperparameter. The range
is either the lower and
upper bound of the range, or the possible values in the case of
sherpa.Ordinal
and sherpa.Choice
.
The sherpa.Continuous
and sherpa.Discrete
parameters also take a scale
argument which can take values linear
or log
. This describes whether
values are sampled uniformly on a linear or a log scale.
Hyperparameters are defined as a list to be passed to the sherpa.Study
down
the line. For example:
parameters = [sherpa.Continuous(name='lr', range=[0.005, 0.1], scale='log'),
sherpa.Continuous(name='dropout', range=[0., 0.4]),
sherpa.Ordinal(name='batch_size', range=[16, 32, 64]),
sherpa.Discrete(name='num_hidden_units', range=[100, 300]),
sherpa.Choice(name='activation', range=['relu', 'elu', 'prelu'])]
Note that it is generally recommended not to represent continuous or discrete parameters as categorical. This is due to the fact that exploring a range of values rather than discrete options yields much more information to understand the relationship between the hyperparameter and the outcome.
The Algorithm¶
The algorithm refers to the procedure that determines how hyperparameter
configurations are chosen and in some cases the resource they are assigned.
All available algorithms can be found in sherpa.algorithms
. The description
in Available Algorithms gives an in-depth view
of what algorithms are available, their arguments, and when one might chose one algorithm over
another. The sherpa.algorithms
module is also home to stopping rules.
Those are procedures that define if a trial should be stopped before its
completion.
The initialization of the algorithm is simple. For example:
algorithm = sherpa.algorithms.RandomSearch(max_num_trials=150)
where max_num_trials
stands for the number of trials after which the
algorithm will finish.
The Study¶
In Sherpa a Study represents the hyperparameter optimization itself. It holds references to the parameter ranges, the algorithm, the results that have been gathered, and provides an interface to obtain a new trial, or add results from previously suggested trial. It also starts the dashboard in the background. When initializing the study it expects references to the parameter ranges, the algorithm, and at minimum a boolean variable on whether lower objective values are better. For a full list of the arguments see the Study-API reference.
study = sherpa.Study(parameters=parameters,
algorithm=algorithm,
lower_is_better=False)
In order to obtain a first trial one can either call Study.get_suggestion()
or directly iterate over the Study
object.
# To get a single trial
trial = study.get_suggestion()
# Or directly iterate over the study
for trial in study:
...
The Trial
object has an id
attribute and a parameters
attribute.
The latter contains a dictionary with a hyperparameter configuration from the
previously specified ranges provides by the defined algorithm. The parameter
configuration can be used to initialize, train, and evaluate a model.
model = init_model(train.parameters)
During training Study.add_observation
can be used to add intermediate metric
values from the model training.
for iteration in range(num_iterations):
training_error = model.fit(epochs=1)
validation_error = model.evaluate()
study.add_observation(trial=trial,
iteration=iteration,
objective=validation_error,
context={'training_error': training_error})
Once the model has completed training Sherpa expects a call to the
Study.finalize
function.
study.finalize(trial)
This can be put together in a double for-loop of the form:
for trial in study:
model = init_model(trial.parameters)
for iteration in range(num_iterations):
training_error = model.fit(epochs=1)
validation_error = model.evaluate()
study.add_observation(trial=trial,
iteration=iteration,
objective=validation_error,
context={'training_error': training_error})
study.finalize(trial)
Visualization¶
Once the Study
object is initialized it will output the following:
SHERPA Dashboard running on http://...
Following that link brings up the dashboard. The figure at the top of the dashboard is a parallel coordinates plot. It allows the user to brush over axes and thereby restrict ranges of the trials she wants to see. This is useful to find what objective values correspond to hyperparameters of a certain range. Similarly, one can brush over the objective value axis to find the best performing configurations. The table in the bottom left of the dashboard is linked to the plot. Therefore, it is easy to see what exact hyperparameters the filtered trials correspond to. One can also sort the table by any of its columns. Lastly, on the bottom right is a line plot that shows the progression of objective values for each trial. This is useful in analyzing how and if the training converges. Below is a screenshot of the dashboard towards the end of a study.