Optimization Algorithms

class sherpa.algorithms.Algorithm[source]

Abstract algorithm that generates a new set of parameters.

get_suggestion(parameters, results, lower_is_better)[source]

Returns a suggestion for parameter values.

  • parameters (list[sherpa.Parameter]) – the parameters.
  • results (pandas.DataFrame) – all results so far.
  • lower_is_better (bool) – whether lower objective values are better.

Returns: parameter values.
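
The interface above can be illustrated with a small stand-alone sketch. It does not import sherpa: StandInParameter, JitterBest, the list-of-dicts results format and the "Objective" key are all assumptions made for illustration, not part of the documented API. It only shows how a get_suggestion method might use results and lower_is_better to return a mapping from parameter names to values.

```python
import random
from dataclasses import dataclass


@dataclass
class StandInParameter:
    """Hypothetical stand-in for sherpa.Parameter: a name and a (low, high) range."""
    name: str
    range: tuple


class JitterBest:
    """Suggests the range midpoint first, then small perturbations of the best result."""

    def __init__(self, scale=0.1, seed=0):
        self.scale = scale
        self.rng = random.Random(seed)

    def get_suggestion(self, parameters, results, lower_is_better):
        if not results:
            # No history yet: start from the middle of every range.
            return {p.name: (p.range[0] + p.range[1]) / 2 for p in parameters}
        pick = min if lower_is_better else max
        best = pick(results, key=lambda row: row["Objective"])
        suggestion = {}
        for p in parameters:
            low, high = p.range
            step = self.rng.uniform(-self.scale, self.scale) * (high - low)
            # Clip so the suggestion stays inside the declared range.
            suggestion[p.name] = min(high, max(low, best[p.name] + step))
        return suggestion


params = [StandInParameter("lr", (0.0, 1.0))]
alg = JitterBest()
first = alg.get_suggestion(params, [], lower_is_better=True)
# Assumed results format: one dict per finished trial.
history = [{"lr": 0.5, "Objective": 0.9}, {"lr": 0.2, "Objective": 0.4}]
second = alg.get_suggestion(params, history, lower_is_better=True)
```

Here first is the midpoint of the range, and second is a value close to the lr of the row with the lowest objective.
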



Reinstantiates the algorithm when loaded.

Parameters: num_trials (int) – number of trials in the study so far.

class sherpa.algorithms.RandomSearch(max_num_trials=None)[source]

Random Search with a repeat option.

Trial parameter configurations are sampled uniformly from their parameter ranges. The repeat option re-runs each trial the given number of times; by default this is 1.

  • max_num_trials (int) – number of trials, otherwise runs indefinitely.
  • repeat (int) – number of times to repeat a parameter configuration.
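
A minimal sketch of the sampling scheme described above, written without sherpa. The function name random_search and the ranges dictionary are hypothetical, and the sketch assumes that max_num_trials counts distinct configurations rather than individual repeats, which the text does not specify.

```python
import random


def random_search(ranges, max_num_trials=None, repeat=1, seed=0):
    """Yield configurations sampled uniformly; each one is yielded `repeat` times."""
    rng = random.Random(seed)
    sampled = 0
    # With max_num_trials=None the generator never terminates on its own.
    while max_num_trials is None or sampled < max_num_trials:
        config = {name: rng.uniform(low, high) for name, (low, high) in ranges.items()}
        for _ in range(repeat):
            yield dict(config)
        sampled += 1


trials = list(random_search({"lr": (1e-4, 1e-1)}, max_num_trials=3, repeat=2))
```

With these arguments, trials holds six entries: three sampled configurations, each appearing twice in a row.
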
class sherpa.algorithms.GridSearch(num_grid_points=2)[source]

Explores a grid across the hyperparameter space such that every pairing is evaluated.

For continuous and discrete parameters, grid points are picked inside the range, evenly spaced and excluding the endpoints. For example, a continuous parameter with range [1, 2] and two grid points would have points 1 1/3 and 1 2/3. With three grid points, the points are 1 1/4, 1 1/2, and 1 3/4.


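Example:

import sherpa

# Each key is a hyperparameter name; each list holds the values to combine.
hp_space = {'act': ['tanh', 'relu'],
            'lrinit': [0.1, 0.01],
            }
parameters = sherpa.Parameter.grid(hp_space)
alg = sherpa.algorithms.GridSearch()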
Parameters: num_grid_points (int) – number of grid points for continuous / discrete parameters.
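
The grid-point arithmetic in the [1, 2] example can be reproduced with a short sketch. The helper names interior_grid_points and full_grid are hypothetical, the spacing formula is inferred from the two worked examples rather than taken from the library source, and only the continuous case is covered.

```python
from fractions import Fraction
from itertools import product


def interior_grid_points(low, high, num_grid_points=2):
    """Evenly spaced points strictly inside [low, high] (endpoints excluded)."""
    low, high = Fraction(low), Fraction(high)
    step = (high - low) / (num_grid_points + 1)
    return [low + step * i for i in range(1, num_grid_points + 1)]


def full_grid(axes):
    """Cartesian product of per-parameter value lists, one dict per combination."""
    names = list(axes)
    return [dict(zip(names, values)) for values in product(*(axes[n] for n in names))]


two = interior_grid_points(1, 2, 2)      # 4/3 and 5/3
three = interior_grid_points(1, 2, 3)    # 5/4, 3/2 and 7/4
grid = full_grid({"act": ["tanh", "relu"], "lrinit": [0.1, 0.01]})
```

Exact fractions are used so that the results match the values quoted in the text; grid contains the four combinations of the two activation functions and two learning rates.
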
class sherpa.algorithms.GPyOpt(model_type='GP', num_initial_data_points='infer', initial_data_points=[], acquisition_type='EI', max_concurrent=4, verbosity=False, max_num_trials=None)[source]

Sherpa wrapper around the GPyOpt package.

  • model_type (str) – The model used:
      - ‘GP’, standard Gaussian process.
      - ‘GP_MCMC’, Gaussian process with prior in the hyper-parameters.
      - ‘sparseGP’, sparse Gaussian process.
      - ‘warperdGP’, warped Gaussian process.
      - ‘InputWarpedGP’, input warped Gaussian process.
      - ‘RF’, random forest (scikit-learn).
  • num_initial_data_points (int) – Number of data points to collect before fitting the model. Must be greater than or equal to the number of hyperparameters being optimized. The default ‘infer’ corresponds to the number of hyperparameters + 1, or 0 if results are not empty.
  • initial_data_points (list[dict] or pandas.DataFrame) – Specifies initial data points. If len(initial_data_points) < num_initial_data_points, the rest are sampled randomly. Use this option to provide hyperparameter configurations that are known to be good.
  • acquisition_type (str) – Type of acquisition function to use:
      - ‘EI’, expected improvement.
      - ‘EI_MCMC’, integrated expected improvement (requires GP_MCMC model).
      - ‘MPI’, maximum probability of improvement.
      - ‘MPI_MCMC’, maximum probability of improvement (requires GP_MCMC model).
      - ‘LCB’, GP-Lower confidence bound.
      - ‘LCB_MCMC’, integrated GP-Lower confidence bound (requires GP_MCMC model).
  • max_concurrent (int) – The number of concurrent trials. GPyOpt generates a batch of max_concurrent trials to evaluate. When a new observation becomes available, the model is re-evaluated and a new batch is created, regardless of whether the previous batch was used up. The batch method used is local penalization.
  • verbosity (bool) – Print models and other options during the optimization.
  • max_num_trials (int) – maximum number of trials to run for.
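
The rules for num_initial_data_points and initial_data_points can be sketched as follows. This does not call GPyOpt or sherpa; infer_num_initial_points, build_initial_points and the ranges dictionary are hypothetical names, and uniform sampling for the padding step is an assumption (the text only says "randomly sampled").

```python
import random


def infer_num_initial_points(num_hyperparameters, num_existing_results):
    """'infer' rule as described: hyperparameters + 1, or 0 if results already exist."""
    return 0 if num_existing_results > 0 else num_hyperparameters + 1


def build_initial_points(initial_data_points, num_initial_data_points, ranges, seed=0):
    """Keep the user-supplied points and pad with random samples up to the target."""
    rng = random.Random(seed)
    points = [dict(p) for p in initial_data_points]
    while len(points) < num_initial_data_points:
        points.append({n: rng.uniform(lo, hi) for n, (lo, hi) in ranges.items()})
    return points


ranges = {"lr": (1e-4, 1e-1), "dropout": (0.0, 0.5)}
known_good = [{"lr": 0.01, "dropout": 0.2}]
n_init = infer_num_initial_points(len(ranges), num_existing_results=0)
points = build_initial_points(known_good, n_init, ranges)
```

With two hyperparameters and no prior results, n_init is 3, so points holds the known-good configuration followed by two random ones.
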
class sherpa.algorithms.SuccessiveHalving(r=1, R=9, eta=3, s=0, max_finished_configs=50)[source]

Asynchronous Successive Halving as described in:

@article{li2018massively,
  title={Massively parallel hyperparameter tuning},
  author={Li, Liam and Jamieson, Kevin and Rostamizadeh, Afshin and Gonina, Ekaterina and Hardt, Moritz and Recht, Benjamin and Talwalkar, Ameet},
  journal={arXiv preprint arXiv:1810.05934},
  year={2018}
}

Asynchronous successive halving operates based on the multi-armed bandit algorithm Successive Halving (SHA) and performs principled early stopping for random search.

  • r (int) – minimum resource that each configuration will be trained for.
  • R (int) – maximum resource.
  • eta (int) – elimination rate.
  • s (int) – minimum early-stopping rate.
  • max_finished_configs (int) – stop once max_finished_configs models have been trained to completion.
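
To make the roles of r, R, eta and s concrete, here is a rough sketch of the rung schedule and promotion rule, following the general description in the cited paper rather than sherpa's actual code. The function names and the dictionary of rung results are hypothetical.

```python
def rung_resources(r=1, R=9, eta=3, s=0):
    """Resource per rung: r * eta**(s + k), for as long as it does not exceed R."""
    resources = []
    k = 0
    while r * eta ** (s + k) <= R:
        resources.append(r * eta ** (s + k))
        k += 1
    return resources


def promotable(rung_results, already_promoted, eta=3, lower_is_better=True):
    """Configs in the top 1/eta of a rung that have not been promoted yet."""
    ranked = sorted(rung_results, key=rung_results.get, reverse=not lower_is_better)
    top = ranked[: len(ranked) // eta]
    return [c for c in top if c not in already_promoted]


schedule = rung_resources()   # defaults r=1, R=9, eta=3, s=0
rung0 = {"a": 0.9, "b": 0.4, "c": 0.7, "d": 0.5, "e": 0.8, "f": 0.6}
first_wave = promotable(rung0, already_promoted=set())
second_wave = promotable(rung0, already_promoted={"b"})
```

With the default arguments the schedule is 1, 3, 9 units of resource. Of the six configurations in rung0, the best third ("b" and "d") qualifies for promotion; once "b" has been promoted, only "d" remains.
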
class sherpa.algorithms.LocalSearch(seed_configuration, perturbation_factors=(0.8, 1.2), repeat_trials=1)[source]

Local Search Algorithm.

This algorithm expects to start with a very good hyperparameter configuration. It changes one hyperparameter at a time to see if better results can be obtained.

  • seed_configuration (dict) – hyperparameter configuration to start with.
  • perturbation_factors (Union[tuple,list]) – continuous parameters will be multiplied by these.
  • repeat_trials (int) – number of times that identical configurations are repeated to test for random fluctuations.
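
The one-at-a-time perturbation can be sketched as below. The function neighbours is hypothetical, and treating only float values as continuous is a simplifying assumption; how the library handles discrete or categorical parameters is not stated in the text.

```python
def neighbours(seed_configuration, perturbation_factors=(0.8, 1.2)):
    """All configurations that differ from the seed in exactly one continuous value."""
    candidates = []
    for name, value in seed_configuration.items():
        if not isinstance(value, float):
            continue  # only continuous (float) values are scaled in this sketch
        for factor in perturbation_factors:
            candidate = dict(seed_configuration)
            candidate[name] = value * factor
            candidates.append(candidate)
    return candidates


seed = {"lr": 0.01, "dropout": 0.5, "act": "relu"}
candidates = neighbours(seed)
```

For this seed there are four candidates: lr scaled by 0.8 and 1.2, then dropout scaled by 0.8 and 1.2, with all other values left as in the seed.
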
class sherpa.algorithms.PopulationBasedTraining(num_generations, population_size=20, parameter_range={}, perturbation_factors=(0.8, 1.2))[source]

Population based training (PBT) as introduced by Jaderberg et al. 2017.

PBT trains a generation of population_size seed trials (randomly initialized) for a user-specified number of iterations, e.g. one epoch. The top 80% then move on unchanged into the second generation. The bottom 20% are re-sampled from the top 20% and perturbed. The second generation trains for the same number of iterations, and the same procedure is repeated to move into the third generation, and so on.

  • num_generations (int) – the number of generations to run for.
  • population_size (int) – the number of randomly initialized trials at the beginning and number of concurrent trials after that.
  • parameter_range (dict[Union[list,tuple]]) – upper and lower bounds beyond which parameters cannot be perturbed.
  • perturbation_factors (tuple[float]) – the factors by which continuous parameters are multiplied upon perturbation; one is sampled randomly at a time.
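
One generation step of the 80/20 scheme described above might look like the following sketch. It is not sherpa's implementation: next_generation and the (config, objective) pair format are hypothetical, only float values are perturbed, and model weights (which real PBT also copies) are ignored.

```python
import random


def next_generation(population, perturbation_factors=(0.8, 1.2),
                    parameter_range=None, lower_is_better=True, seed=0):
    """population: list of (config, objective) pairs; returns the next configs."""
    rng = random.Random(seed)
    parameter_range = parameter_range or {}
    ranked = sorted(population, key=lambda pair: pair[1], reverse=not lower_is_better)
    cut = max(1, len(ranked) // 5)                     # size of a 20% slice
    top = [config for config, _ in ranked[:cut]]
    # The top 80% move on unchanged.
    survivors = [dict(config) for config, _ in ranked[:len(ranked) - cut]]
    children = []
    for _ in range(cut):
        child = dict(rng.choice(top))                  # re-sample from the top 20%
        for name, value in child.items():
            if isinstance(value, float):
                value *= rng.choice(perturbation_factors)
                low, high = parameter_range.get(name, (float("-inf"), float("inf")))
                child[name] = min(high, max(low, value))   # respect the bounds
        children.append(child)
    return survivors + children


population = [({"lr": 0.01 * (i + 1)}, float(i)) for i in range(10)]
new_configs = next_generation(population, parameter_range={"lr": (0.001, 0.1)})
```

For a population of ten, the eight best configurations are carried over as they are, and the two worst are replaced by perturbed copies of the two best.
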
class sherpa.algorithms.Repeat(algorithm, num_times=5, wait_for_completion=False, agg=False)[source]

Takes another algorithm and repeats every hyperparameter configuration a given number of times. The wrapped algorithm will be passed the mean objective values of the repeated experiments.

  • algorithm (sherpa.algorithms.Algorithm) – the algorithm to produce hyperparameter configurations.
  • num_times (int) – the number of times to repeat each configuration.
  • wait_for_completion (bool) – only relevant when running in parallel with max_concurrent > 1. If set, the algorithm won’t issue the next suggestion until all repetitions are completed. This can be useful when the repeats have an impact on sequential decision making in the wrapped algorithm.
  • agg (bool) – whether to aggregate repetitions before passing them to the parameter generating algorithm.
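
The averaging step described above (the wrapped algorithm sees the mean objective of repeated experiments) can be sketched like this. The function aggregate_repeats and the (config, objective) pair format are assumptions for illustration; sherpa itself works on a results DataFrame.

```python
from collections import defaultdict
from statistics import mean


def aggregate_repeats(results):
    """Average the objective over rows that share the same configuration."""
    groups = defaultdict(list)
    for config, objective in results:
        key = tuple(sorted(config.items()))   # hashable identity of a configuration
        groups[key].append(objective)
    return [(dict(key), mean(values)) for key, values in groups.items()]


raw = [({"lr": 0.1}, 0.50), ({"lr": 0.1}, 0.70), ({"lr": 0.01}, 0.30)]
averaged = aggregate_repeats(raw)
```

Here the two runs with lr 0.1 collapse into a single row whose objective is their mean, while the single run with lr 0.01 is passed through unchanged.
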
class sherpa.algorithms.Iterate(hp_iter)[source]

Iterate over a set of fully-specified hyperparameter combinations.

Parameters: hp_iter (list) – list of fully-specified hyperparameter dicts.

Stopping Rules

class sherpa.algorithms.MedianStoppingRule(min_iterations=0, min_trials=1)[source]

Median Stopping-Rule similar to Golovin et al. “Google Vizier: A Service for Black-Box Optimization”.

  • For a Trial t, the best objective value is found.
  • Then the best objective value for every other trial is found.
  • Finally, the best-objective for the trial is compared to the median of the best-objectives of all other trials.

If trial t’s best objective is worse than that median, it is stopped.

If t has not reached the minimum number of iterations, or there are not yet the requested number of comparison trials, t is not stopped. If all of t’s objective values are NaN, it is stopped by default.

  • min_iterations (int) – the minimum number of iterations a trial runs for before it is considered for stopping.
  • min_trials (int) – the minimum number of comparison trials needed for a trial to be stopped.
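
The decision procedure above can be sketched in plain Python. This is an illustration only: best, should_stop and the list-based inputs are hypothetical, iterations are assumed to equal the number of reported objective values, and the NaN check is assumed to take precedence over min_iterations, which the text does not specify.

```python
import math
from statistics import median


def best(values, lower_is_better):
    """Best non-NaN objective of one trial, or None if there is none."""
    clean = [v for v in values if not math.isnan(v)]
    if not clean:
        return None
    return min(clean) if lower_is_better else max(clean)


def should_stop(trial_values, other_trials, lower_is_better=True,
                min_iterations=0, min_trials=1):
    """trial_values: objectives of trial t; other_trials: one list per other trial."""
    own_best = best(trial_values, lower_is_better)
    if trial_values and own_best is None:
        return True                     # all NaN: stopped by default
    if len(trial_values) < min_iterations:
        return False                    # t has not run long enough yet
    others = [b for b in (best(v, lower_is_better) for v in other_trials) if b is not None]
    if len(others) < min_trials or own_best is None:
        return False                    # not enough comparison trials
    mid = median(others)
    # Stop only if t's best is strictly worse than the median of the others' bests.
    return own_best > mid if lower_is_better else own_best < mid


others = [[0.9, 0.6], [0.8, 0.5], [0.7, 0.4]]   # best values 0.6, 0.5, 0.4; median 0.5
stop_bad = should_stop([0.9, 0.8], others)
stop_good = should_stop([0.9, 0.3], others)
stop_nan = should_stop([float("nan"), float("nan")], others)
stop_early = should_stop([0.9], others, min_iterations=5)
```

With lower objectives being better, a trial whose best value is 0.8 is stopped, one whose best value is 0.3 continues, an all-NaN trial is stopped, and a trial below min_iterations is left running.
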
should_trial_stop(trial, results, lower_is_better)[source]
  • trial (sherpa.Trial) – trial to be stopped.
  • results (pandas.DataFrame) – all results so far.
  • lower_is_better (bool) – whether lower objective values are better.

