Development
How to contribute
The easiest way to contribute to SHERPA is to implement new algorithms or new schedulers.
Unit Testing
Unit tests are organized in scripts under /tests/ from the SHERPA
root: test_sherpa.py tests core features of SHERPA, test_algorithms.py
tests implemented algorithms, and test_schedulers.py tests schedulers. The
file long_tests.py does high level testing of SHERPA and takes longer to run.
All tests use pytest, in particular pytest fixtures (pytest.fixture). The mock
module is also used.
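As a rough illustration of how a fixture and a mock combine in such a test, consider the sketch below. It is not taken from the SHERPA test suite: stop_trials is a hypothetical helper invented for this example, the scheduler is replaced by a Mock, and the standard-library unittest.mock is assumed in place of the standalone mock package.

```python
from unittest import mock

import pytest


def stop_trials(scheduler, job_ids):
    """Hypothetical helper under test: ask the scheduler to kill each job."""
    for job_id in job_ids:
        scheduler.kill_job(job_id)
    return len(job_ids)


@pytest.fixture
def fake_scheduler():
    # A Mock records every call made on it, so no real jobs are needed.
    return mock.Mock(name="scheduler")


def test_stop_trials_calls_kill_job(fake_scheduler):
    # pytest injects the fixture by matching the argument name.
    assert stop_trials(fake_scheduler, ["job-1", "job-2"]) == 2
    fake_scheduler.kill_job.assert_has_calls(
        [mock.call("job-1"), mock.call("job-2")]
    )
```

Saved in a file under /tests/, a test of this shape would be collected and run by invoking pytest on that file.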
SHERPA Code Structure
Study and Trials
In Sherpa a parameter configuration corresponds to a Trial object and a
parameter optimization corresponds to a Study object. A trial has an ID
attribute and a dict of parameter name-value pairs.
class sherpa.core.Trial(id, parameters)
Represents one parameter configuration, here referred to as one trial.
Parameters:
- id (int) – the Trial ID.
- parameters (dict) – parameter-name, parameter-value pairs.
A study comprises the results of a number of trials. It also provides methods
for adding a new observation for a trial to the study (add_observation),
finalizing a trial (finalize), getting a new trial (get_suggestion),
and deciding whether a trial is performing worse than other trials and
should be stopped (should_trial_stop).
class sherpa.core.Study(parameters, algorithm, lower_is_better, stopping_rule=None, dashboard_port=None, disable_dashboard=False, output_dir=None)
The core of an optimization.
Includes functionality to get new suggested trials and add observations for those. Used internally but can also be used directly by the user.
Parameters:
- parameters (list[sherpa.core.Parameter]) – a list of parameter ranges.
- algorithm (sherpa.algorithms.Algorithm) – the optimization algorithm.
- lower_is_better (bool) – whether to minimize or maximize the objective.
- stopping_rule (sherpa.algorithms.StoppingRule) – algorithm to stop badly performing trials.
- dashboard_port (int) – the port for the dashboard web-server; if None, the first free port in the range 8880 to 9999 is found and used.
- disable_dashboard (bool) – option to not run the dashboard.
- output_dir (str) – directory path for CSV results.
- random_seed (int) – seed to use for NumPy random number generators throughout.
To propose new trials or decide whether a trial should stop, the
study holds a sherpa.algorithms.Algorithm instance that yields new trials
and a sherpa.algorithms.StoppingRule that yields decisions about
performance. When using Sherpa in API-mode the user directly interacts with the study.
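The interaction pattern described above can be sketched with a small self-contained stand-in. ToyStudy and ToyTrial below are not Sherpa classes: they only borrow the method names get_suggestion, add_observation, should_trial_stop and finalize from the text, and their signatures, the random sampling, and the median-based stopping check are assumptions made for illustration.

```python
import random
import statistics


class ToyTrial:
    """Toy counterpart of a trial: an ID plus a dict of parameter values."""

    def __init__(self, id, parameters):
        self.id = id
        self.parameters = parameters


class ToyStudy:
    """Illustrative stand-in for a study; NOT sherpa.core.Study."""

    def __init__(self, ranges, lower_is_better, max_num_trials, seed=0):
        self.ranges = ranges                  # {name: (low, high)}
        self.lower_is_better = lower_is_better
        self.max_num_trials = max_num_trials
        self.rng = random.Random(seed)
        self.observations = {}                # trial id -> objective values
        self.finalized = []                   # ids of finished trials

    def get_suggestion(self):
        """Return a new randomly sampled trial, or None when exhausted."""
        if len(self.observations) >= self.max_num_trials:
            return None
        params = {name: self.rng.uniform(low, high)
                  for name, (low, high) in self.ranges.items()}
        trial = ToyTrial(len(self.observations) + 1, params)
        self.observations[trial.id] = []
        return trial

    def add_observation(self, trial, objective):
        self.observations[trial.id].append(objective)

    def should_trial_stop(self, trial):
        """Stop when the latest value is worse than the median final value
        of the finalized trials (an assumed rule, chosen for simplicity)."""
        finished = [self.observations[i][-1] for i in self.finalized
                    if self.observations[i]]
        if not finished or not self.observations[trial.id]:
            return False
        latest = self.observations[trial.id][-1]
        median = statistics.median(finished)
        return latest > median if self.lower_is_better else latest < median

    def finalize(self, trial):
        self.finalized.append(trial.id)


def run_api_mode(study, objective_fn, num_iterations):
    """Drive the study by hand, as a user would in API-mode."""
    while True:
        trial = study.get_suggestion()
        if trial is None:
            break
        for iteration in range(1, num_iterations + 1):
            study.add_observation(trial, objective_fn(trial.parameters, iteration))
            if study.should_trial_stop(trial):
                break
        study.finalize(trial)
```

A call such as run_api_mode(ToyStudy({"x": (-1.0, 1.0)}, True, 5), lambda p, it: p["x"] ** 2 / it, 3) runs five toy trials, each with between one and three observations depending on the stopping check.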
Runner
The _Runner class automates the process of interacting with the study. It
consists of a loop that updates results, updates currently running jobs,
stops trials if necessary and submits new trials if necessary. In order to
achieve this it interacts with a sherpa.database._Database object and a
sherpa.schedulers.Scheduler object.
class sherpa.core._Runner(study, scheduler, database, max_concurrent, command, resubmit_failed_trials=False)
Encapsulates all functionality needed to run a Study in parallel.
Responsibilities:
- Get rows from database and check if any new observations need to be added to Study.
- Update active trials, finalize any completed/stopped/failed trials.
- Check what trials should be stopped and call scheduler kill_job method.
- Check if new trials need to be submitted, get parameters and submit as a job.
Parameters:
- study (sherpa.core.Study) – the study that is run.
- scheduler (sherpa.schedulers.Scheduler) – a scheduler object.
- database (sherpa.database._Database) – the database.
- max_concurrent (int) – how many trials to run in parallel.
- command (list[str]) – components of the command that runs a trial script e.g. ["python", "train_nn.py"].
- resubmit_failed_trials (bool) – whether a failed trial should be resubmitted.
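The four responsibilities listed above map naturally onto one loop iteration. The following sketch shows that shape with in-memory stand-ins. ToyRunner, FakeStudy and FakeScheduler are hypothetical and are not Sherpa's implementation: apart from kill_job, every method name and signature on the scheduler is an assumption, the study methods are simplified to take trial IDs, and the database is reduced to a plain list of (trial_id, objective) rows.

```python
class FakeStudy:
    """Hands out trial IDs 1..n and never requests early stopping."""

    def __init__(self, num_trials):
        self.remaining = list(range(1, num_trials + 1))
        self.observations = []
        self.finalized = []

    def get_suggestion(self):
        return self.remaining.pop(0) if self.remaining else None

    def add_observation(self, trial_id, objective):
        self.observations.append((trial_id, objective))

    def should_trial_stop(self, trial_id):
        return False

    def finalize(self, trial_id):
        self.finalized.append(trial_id)


class FakeScheduler:
    """Reports each job as finished on its second status check."""

    def __init__(self):
        self.checks = {}   # job id -> number of status checks so far
        self.killed = []

    def submit_job(self, command):
        job_id = len(self.checks) + 1
        self.checks[job_id] = 0
        return job_id

    def is_finished(self, job_id):
        self.checks[job_id] += 1
        return self.checks[job_id] >= 2

    def kill_job(self, job_id):
        self.killed.append(job_id)


class ToyRunner:
    def __init__(self, study, scheduler, database, max_concurrent, command):
        self.study, self.scheduler, self.database = study, scheduler, database
        self.max_concurrent, self.command = max_concurrent, command
        self.active = {}     # trial id -> job id
        self.seen_rows = 0   # rows already forwarded to the study

    def update_results(self):
        # 1. Forward database rows not yet seen to the study.
        for trial_id, objective in self.database[self.seen_rows:]:
            self.study.add_observation(trial_id, objective)
        self.seen_rows = len(self.database)

    def update_active_trials(self):
        # 2. Finalize trials whose jobs have ended.
        for trial_id, job_id in list(self.active.items()):
            if self.scheduler.is_finished(job_id):
                self.study.finalize(trial_id)
                del self.active[trial_id]

    def stop_bad_performers(self):
        # 3. Kill the jobs of trials the study wants stopped.
        for trial_id, job_id in list(self.active.items()):
            if self.study.should_trial_stop(trial_id):
                self.scheduler.kill_job(job_id)
                self.study.finalize(trial_id)
                del self.active[trial_id]

    def submit_new_trials(self):
        # 4. Fill free slots; return False once the study is exhausted.
        while len(self.active) < self.max_concurrent:
            trial_id = self.study.get_suggestion()
            if trial_id is None:
                return False
            self.active[trial_id] = self.scheduler.submit_job(self.command)
        return True

    def run_loop(self):
        more_trials = True
        while more_trials or self.active:
            self.update_results()
            self.update_active_trials()
            self.stop_bad_performers()
            more_trials = self.submit_new_trials()
```

With five fake trials and max_concurrent=2, ToyRunner(FakeStudy(5), FakeScheduler(), [], 2, ["python", "train_nn.py"]).run_loop() finalizes all five trials while never holding more than two active jobs. A real runner would also pause between iterations and handle failed jobs, which this sketch leaves out.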
Putting it all together
The user does not directly interact with the _Runner class. Instead, it is
wrapped by the function sherpa.optimize, which sets up the database and takes
the algorithm and scheduler as arguments from the user.