Development¶
How to contribute¶
The easiest way to contribute to SHERPA is to implement new algorithms or new schedulers.
Unit Testing¶
Unit tests are organized in scripts under /tests/ from the SHERPA root: test_sherpa.py tests core features of SHERPA, test_algorithms.py tests implemented algorithms, and test_schedulers.py tests schedulers. The file long_tests.py does high-level testing of SHERPA and takes longer to run.
All testing makes use of pytest, especially pytest fixtures. The mock module is also used.
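As a rough illustration of this testing style, the sketch below replaces a scheduler with a mock object and checks how it was called. The function stop_trials, the helper make_scheduler, and the job IDs are hypothetical and exist only for this example; kill_job is the scheduler method named in the Runner section. In a pytest suite a helper like make_scheduler would typically be written as a fixture.

```python
from unittest import mock


def stop_trials(scheduler, job_ids):
    """Hypothetical helper: ask the scheduler to kill each listed job."""
    for job_id in job_ids:
        scheduler.kill_job(job_id)
    return len(job_ids)


def make_scheduler():
    """Build a stand-in scheduler that only exposes kill_job."""
    return mock.Mock(spec=["kill_job"])


def test_stop_trials_kills_each_job():
    scheduler = make_scheduler()
    assert stop_trials(scheduler, ["job-1", "job-2"]) == 2
    # The mock records every call, so the interaction can be verified.
    scheduler.kill_job.assert_has_calls(
        [mock.call("job-1"), mock.call("job-2")]
    )
    assert scheduler.kill_job.call_count == 2
```

Because the mock is built with a spec, accessing any attribute other than kill_job raises an AttributeError, which catches typos in the code under test.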
SHERPA Code Structure¶
Study and Trials¶
In Sherpa a parameter configuration corresponds to a Trial object and a parameter optimization corresponds to a Study object. A trial has an ID attribute and a dict of parameter name-value pairs.
class sherpa.core.Trial(id, parameters)
Represents one parameter configuration, here referred to as one trial.
Parameters:
- id (int) – the Trial ID.
- parameters (dict) – parameter-name, parameter-value pairs.
A study comprises the results of a number of trials. It also provides methods for adding a new observation for a trial to the study (add_observation), finalizing a trial (finalize), getting a new trial (get_suggestion), and deciding whether a trial is performing worse than other trials and should be stopped (should_trial_stop).
class sherpa.core.Study(parameters, algorithm, lower_is_better, stopping_rule=None, dashboard_port=None, disable_dashboard=False, output_dir=None)
The core of an optimization.
Includes functionality to get new suggested trials and add observations for those. Used internally but can also be used directly by the user.
Parameters:
- parameters (list[sherpa.core.Parameter]) – a list of parameter ranges.
- algorithm (sherpa.algorithms.Algorithm) – the optimization algorithm.
- lower_is_better (bool) – whether to minimize or maximize the objective.
- stopping_rule (sherpa.algorithms.StoppingRule) – algorithm to stop badly performing trials.
- dashboard_port (int) – the port for the dashboard web-server; if None, the first free port in the range 8880 to 9999 is found and used.
- disable_dashboard (bool) – option to not run the dashboard.
- output_dir (str) – directory path for CSV results.
- random_seed (int) – seed to use for NumPy random number generators throughout.
In order to propose new trials or decide whether a trial should stop, the study holds a sherpa.algorithms.Algorithm instance that yields new trials and a sherpa.algorithms.StoppingRule that yields decisions about performance. When using Sherpa in API-mode the user directly interacts with the study.
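The interaction pattern described above can be sketched with simplified stand-ins. ToyTrial, ToyStudy, and run_api_mode below are hypothetical classes and functions written for this example, not SHERPA's implementation: the method names mirror those listed above, but the signatures, the uniform random sampling, and the stopping rule are assumptions made to keep the sketch self-contained.

```python
import random


class ToyTrial:
    """Stand-in for a trial: an ID plus a dict of parameter values."""

    def __init__(self, id, parameters):
        self.id = id
        self.parameters = parameters


class ToyStudy:
    """Simplified stand-in for a study (assumed behavior, not SHERPA's)."""

    def __init__(self, ranges, max_num_trials, lower_is_better=True, seed=0):
        self.ranges = ranges  # {parameter name: (low, high)}
        self.max_num_trials = max_num_trials
        self.lower_is_better = lower_is_better
        self.rng = random.Random(seed)
        self.observations = {}  # trial id -> list of objective values
        self.finalized = set()
        self.next_id = 1

    def get_suggestion(self):
        """Sample a new trial uniformly, or return None when the budget is spent."""
        if self.next_id > self.max_num_trials:
            return None
        params = {
            name: self.rng.uniform(low, high)
            for name, (low, high) in self.ranges.items()
        }
        trial = ToyTrial(self.next_id, params)
        self.next_id += 1
        self.observations[trial.id] = []
        return trial

    def add_observation(self, trial, objective):
        self.observations[trial.id].append(objective)

    def finalize(self, trial):
        self.finalized.add(trial.id)

    def should_trial_stop(self, trial):
        """Stop if the latest value is worse than every finalized trial's best."""
        own = self.observations[trial.id]
        others = [
            self.observations[i] for i in self.finalized if self.observations[i]
        ]
        if not own or not others:
            return False
        if self.lower_is_better:
            return own[-1] > max(min(values) for values in others)
        return own[-1] < min(max(values) for values in others)


def run_api_mode(study, objective_fn, num_iterations=3):
    """Drive the study by hand, as a user would in API-mode."""
    results = {}
    while True:
        trial = study.get_suggestion()
        if trial is None:
            break
        for iteration in range(1, num_iterations + 1):
            study.add_observation(trial, objective_fn(trial.parameters, iteration))
            if study.should_trial_stop(trial):
                break
        study.finalize(trial)
        results[trial.id] = study.observations[trial.id][-1]
    return results
```

A call such as run_api_mode(ToyStudy({"x": (-1.0, 1.0)}, max_num_trials=4), lambda p, it: p["x"] ** 2 / it) runs four trials and returns the last observed objective for each.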
Runner¶
The _Runner class automates the process of interacting with the study. It consists of a loop that updates results, updates currently running jobs, stops trials if necessary and submits new trials if necessary. In order to achieve this it interacts with a sherpa.database._Database object and a sherpa.schedulers.Scheduler object.
class sherpa.core._Runner(study, scheduler, database, max_concurrent, command, resubmit_failed_trials=False)
Encapsulates all functionality needed to run a Study in parallel.
Responsibilities:
- Get rows from database and check if any new observations need to be added to Study.
- Update active trials, finalize any completed/stopped/failed trials.
- Check what trials should be stopped and call scheduler kill_job method.
- Check if new trials need to be submitted, get parameters and submit as a job.
Parameters:
- study (sherpa.core.Study) – the study that is run.
- scheduler (sherpa.schedulers.Scheduler) – a scheduler object.
- database (sherpa.database._Database) – the database.
- max_concurrent (int) – how many trials to run in parallel.
- command (list[str]) – components of the command that runs a trial script e.g. ["python", "train_nn.py"].
- resubmit_failed_trials (bool) – whether a failed trial should be resubmitted.
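The four responsibilities listed above can be sketched as one loop. Everything below is a simplified stand-in written for this example: FakeBackend plays both the scheduler and the database, FakeStudy hands out trial IDs instead of Trial objects, and the methods submit_job, pop_new_rows, and is_finished are hypothetical names. Only kill_job and the study method names come from the text; failed trials and resubmission are left out.

```python
class FakeBackend:
    """Plays both scheduler and database for this sketch (hypothetical API)."""

    def __init__(self, values_per_trial):
        self.values_per_trial = values_per_trial  # trial id -> objectives
        self.pending = {}  # job id -> (trial id, objectives not yet reported)
        self.killed = []

    def submit_job(self, command, trial_id):
        job_id = "job-%d" % trial_id
        self.pending[job_id] = (trial_id, list(self.values_per_trial[trial_id]))
        return job_id

    def pop_new_rows(self):
        """Each running job reports its next objective value, if any."""
        rows = []
        for trial_id, values in self.pending.values():
            if values:
                rows.append((trial_id, values.pop(0)))
        return rows

    def is_finished(self, job_id):
        return not self.pending[job_id][1]

    def kill_job(self, job_id):
        self.killed.append(job_id)
        del self.pending[job_id]


class FakeStudy:
    """Hands out trial IDs 1..n and stops any trial reporting above a limit."""

    def __init__(self, num_trials, limit):
        self.ids = list(range(1, num_trials + 1))
        self.limit = limit
        self.observations = {}
        self.finalized = []

    def get_suggestion(self):
        return self.ids.pop(0) if self.ids else None

    def add_observation(self, trial_id, objective):
        self.observations.setdefault(trial_id, []).append(objective)

    def should_trial_stop(self, trial_id):
        seen = self.observations.get(trial_id, [])
        return bool(seen) and seen[-1] > self.limit

    def finalize(self, trial_id):
        self.finalized.append(trial_id)


def runner_loop(study, scheduler, database, max_concurrent, command):
    active = {}  # trial id -> job id
    done = False  # becomes True once the study has no more suggestions
    while not done or active:
        # 1. Move new database rows into the study as observations.
        for trial_id, objective in database.pop_new_rows():
            study.add_observation(trial_id, objective)
        # 2. Finalize trials whose jobs have completed.
        for trial_id, job_id in list(active.items()):
            if scheduler.is_finished(job_id):
                study.finalize(trial_id)
                del active[trial_id]
        # 3. Kill and finalize trials the study wants stopped.
        for trial_id, job_id in list(active.items()):
            if study.should_trial_stop(trial_id):
                scheduler.kill_job(job_id)
                study.finalize(trial_id)
                del active[trial_id]
        # 4. Submit new trials while there is spare capacity.
        while not done and len(active) < max_concurrent:
            trial_id = study.get_suggestion()
            if trial_id is None:
                done = True
            else:
                active[trial_id] = scheduler.submit_job(command, trial_id)
    return study
```

The loop keeps running after suggestions are exhausted until every active job has been finalized, so no running trial is abandoned.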
Putting it all together¶
The user does not directly interact with the _Runner class. Instead, it is wrapped by the function sherpa.optimize, which sets up the database and takes the algorithm and scheduler as arguments from the user.