Development

How to contribute

The easiest way to contribute to SHERPA is to implement new algorithms or new schedulers.

Style Guide

SHERPA uses Google-style Python docstrings.
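As an illustration of the Google docstring convention, here is a minimal sketch. The function scale_objective and its arguments are hypothetical and are not part of SHERPA; only the docstring layout (Args, Returns, Raises sections) is the point.

```python
def scale_objective(objective, factor=1.0):
    """Scale an objective value by a constant factor.

    Hypothetical helper used only to show the docstring layout.

    Args:
        objective (float): the objective value reported by a trial.
        factor (float): multiplier applied to the objective.

    Returns:
        float: the scaled objective value.

    Raises:
        ValueError: if ``factor`` is negative.
    """
    if factor < 0:
        raise ValueError("factor must be non-negative")
    return objective * factor
```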

Unit Testing

Unit tests are organized in scripts under /tests/ from the SHERPA root: test_sherpa.py tests core features of SHERPA, test_algorithms.py tests the implemented algorithms, and test_schedulers.py tests the schedulers. The file long_tests.py does high-level testing of SHERPA and takes longer to run. All tests use pytest, in particular pytest fixtures. The mock module is also used.
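The combination of a pytest fixture and a mock object might look like the sketch below. It is not taken from the SHERPA test suite: stop_job, make_mock_scheduler, and the test itself are hypothetical. It also assumes the standard-library unittest.mock rather than the separate mock package, and that kill_job takes a job ID.

```python
from unittest import mock

import pytest


def stop_job(scheduler, job_id):
    """Hypothetical code under test: ask the scheduler to kill a job."""
    scheduler.kill_job(job_id)
    return job_id


def make_mock_scheduler():
    """Build a stand-in scheduler that records calls instead of running jobs."""
    return mock.Mock()


@pytest.fixture
def mock_scheduler():
    # pytest passes this return value to any test that names the fixture.
    return make_mock_scheduler()


def test_stop_job_calls_kill_job(mock_scheduler):
    assert stop_job(mock_scheduler, job_id=7) == 7
    # The mock lets the test verify the interaction without a real scheduler.
    mock_scheduler.kill_job.assert_called_once_with(7)
```

Running pytest on a file containing this code collects test_stop_job_calls_kill_job and injects the fixture automatically.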

SHERPA Code Structure

Study and Trials

In SHERPA, a parameter configuration corresponds to a Trial object and a parameter optimization corresponds to a Study object. A trial has an ID attribute and a dict of parameter name-value pairs.

class sherpa.core.Trial(id, parameters)[source]

Represents one parameter configuration, referred to here as one trial.

Parameters:
  • id (int) – the Trial ID.
  • parameters (dict) – parameter-name, parameter-value pairs.

A study comprises the results of a number of trials. It also provides methods for adding a new observation for a trial to the study (add_observation), finalizing a trial (finalize), getting a new trial (get_suggestion), and deciding whether a trial is performing worse than other trials and should be stopped (should_trial_stop).

class sherpa.core.Study(parameters, algorithm, lower_is_better, stopping_rule=None, dashboard_port=None, disable_dashboard=False, output_dir=None)[source]

The core of an optimization.

Includes functionality to get new suggested trials and add observations for those. Used internally but can also be used directly by the user.

Parameters:
  • parameters (list[sherpa.core.Parameter]) – a list of parameter ranges.
  • algorithm (sherpa.algorithms.Algorithm) – the optimization algorithm.
  • lower_is_better (bool) – whether to minimize or maximize the objective.
  • stopping_rule (sherpa.algorithms.StoppingRule) – algorithm to stop badly performing trials.
  • dashboard_port (int) – the port for the dashboard web server; if None, the first free port in the range 8880 to 9999 is found and used.
  • disable_dashboard (bool) – option to not run the dashboard.
  • output_dir (str) – directory path for CSV results.
  • random_seed (int) – seed to use for NumPy random number generators throughout.

To propose new trials or decide whether a trial should stop, the study holds a sherpa.algorithms.Algorithm instance that yields new trials and a sherpa.algorithms.StoppingRule that yields decisions about performance. When using SHERPA in API mode, the user interacts with the study directly.
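The division of labor between study, algorithm, and stopping rule can be sketched with self-contained stand-ins. None of the Toy classes below are SHERPA code, and their constructor arguments are simplified assumptions. Only the method names get_suggestion, add_observation, finalize, and should_trial_stop mirror those described above.

```python
import random
import statistics
from dataclasses import dataclass


@dataclass
class ToyTrial:
    """Stand-in for a trial: an ID plus parameter name-value pairs."""
    id: int
    parameters: dict


class ToyRandomSearch:
    """Stand-in algorithm: samples each parameter uniformly from its range."""

    def __init__(self, max_num_trials, seed=0):
        self.max_num_trials = max_num_trials
        self.rng = random.Random(seed)

    def get_suggestion(self, ranges, num_trials_so_far):
        if num_trials_so_far >= self.max_num_trials:
            return None  # signals that the search is finished
        return {name: self.rng.uniform(lo, hi) for name, (lo, hi) in ranges.items()}


class ToyMedianRule:
    """Stand-in stopping rule: stop if worse than the median finished trial."""

    def should_trial_stop(self, latest, finished, lower_is_better):
        if not finished:
            return False
        median = statistics.median(finished)
        return latest > median if lower_is_better else latest < median


class ToyStudy:
    """Stand-in study that delegates to an algorithm and a stopping rule."""

    def __init__(self, ranges, algorithm, lower_is_better, stopping_rule):
        self.ranges = ranges
        self.algorithm = algorithm
        self.lower_is_better = lower_is_better
        self.stopping_rule = stopping_rule
        self.observations = {}  # trial ID -> list of objective values
        self.finalized = set()

    def get_suggestion(self):
        params = self.algorithm.get_suggestion(self.ranges, len(self.observations))
        if params is None:
            return None
        trial = ToyTrial(id=len(self.observations) + 1, parameters=params)
        self.observations[trial.id] = []
        return trial

    def add_observation(self, trial, objective):
        self.observations[trial.id].append(objective)

    def finalize(self, trial):
        self.finalized.add(trial.id)

    def should_trial_stop(self, trial):
        finished = [self.observations[tid][-1] for tid in self.finalized]
        latest = self.observations[trial.id][-1]
        return self.stopping_rule.should_trial_stop(latest, finished, self.lower_is_better)


study = ToyStudy({"lr": (1e-4, 1e-1)}, ToyRandomSearch(max_num_trials=5),
                 lower_is_better=True, stopping_rule=ToyMedianRule())
trial = study.get_suggestion()
while trial is not None:
    for iteration in range(1, 4):
        # Pretend objective that shrinks with each iteration.
        loss = (trial.parameters["lr"] - 0.01) ** 2 / iteration
        study.add_observation(trial, objective=loss)
        if study.should_trial_stop(trial):
            break
    study.finalize(trial)
    trial = study.get_suggestion()
```

The loop at the bottom plays the role of the user in API mode: it asks for a trial, reports observations, checks the stopping decision, and finalizes.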

Runner

The _Runner class automates the process of interacting with the study. It consists of a loop that updates results, updates currently running jobs, stops trials if necessary, and submits new trials if necessary. To do this, it interacts with a sherpa.database._Database object and a sherpa.schedulers.Scheduler object.

class sherpa.core._Runner(study, scheduler, database, max_concurrent, command, resubmit_failed_trials=False)[source]

Encapsulates all functionality needed to run a Study in parallel.

Responsibilities:

  • Get rows from database and check if any new observations need to be added to Study.
  • Update active trials, finalize any completed/stopped/failed trials.
  • Check what trials should be stopped and call scheduler kill_job method.
  • Check if new trials need to be submitted, get parameters and submit as a job.

Parameters:
  • study (sherpa.core.Study) – the study that is run.
  • scheduler (sherpa.schedulers.Scheduler) – a scheduler object.
  • database (sherpa.database._Database) – the database.
  • max_concurrent (int) – how many trials to run in parallel.
  • command (list[str]) – components of the command that runs a trial script, e.g. ["python", "train_nn.py"].
  • resubmit_failed_trials (bool) – whether a failed trial should be resubmitted.
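The four responsibilities listed above can be sketched as one loop. This is a simplified illustration, not the _Runner implementation: ToyScheduler, run, the in-memory deque standing in for the database, and the rule "stop any trial with an objective above 1.0" are all assumptions made for the example.

```python
from collections import deque


class ToyScheduler:
    """Stand-in scheduler: every job finishes after two status polls."""

    def __init__(self):
        self.polls_left = {}
        self.next_job_id = 0

    def submit_job(self, command):
        self.next_job_id += 1
        self.polls_left[self.next_job_id] = 2
        return self.next_job_id

    def is_finished(self, job_id):
        self.polls_left[job_id] -= 1
        return self.polls_left[job_id] <= 0

    def kill_job(self, job_id):
        # A killed job reports as finished on the next poll.
        self.polls_left[job_id] = 0


def run(pending_trials, scheduler, result_rows, max_concurrent, command):
    """Loop until every trial has been submitted and has finished."""
    pending = deque(pending_trials)
    active = {}        # trial ID -> job ID
    observations = {}  # trial ID -> list of objective values
    finalized = []
    while pending or active:
        # 1. Move new rows from the stand-in database into the records.
        while result_rows:
            trial_id, objective = result_rows.popleft()
            observations.setdefault(trial_id, []).append(objective)
        # 2. Finalize trials whose jobs have finished.
        for trial_id, job_id in list(active.items()):
            if scheduler.is_finished(job_id):
                finalized.append(trial_id)
                del active[trial_id]
        # 3. Stop poorly performing trials (toy rule: objective above 1.0).
        for trial_id, job_id in list(active.items()):
            if any(obj > 1.0 for obj in observations.get(trial_id, [])):
                scheduler.kill_job(job_id)
        # 4. Submit new trials while there is spare capacity.
        while pending and len(active) < max_concurrent:
            trial_id = pending.popleft()
            active[trial_id] = scheduler.submit_job(command + [str(trial_id)])
    return finalized, observations


rows = deque([(1, 0.5), (2, 3.0)])
finalized, observations = run([1, 2, 3], ToyScheduler(), rows,
                              max_concurrent=2, command=["python", "train_nn.py"])
```

With max_concurrent=2, the third trial is only submitted once one of the first two jobs has finished, which is the throttling behavior the max_concurrent parameter describes.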

Putting it all together

The user does not interact with the _Runner class directly. Instead, it is wrapped by the function sherpa.optimize, which sets up the database and takes the algorithm and scheduler as arguments from the user.