Usage#
Specifying parameters#
Traits#
The configuration is specified through Section classes. Each
section contains parameters in the form of class attributes of type
traitlets.TraitType (for instance Float,
Unicode, or List), or other (nested)
sections.
Note
Traits can be confusing at first. They are a kind of descriptor. A container class has trait instances bound as class attributes. For instance:
class Container(Section):
    name = Float(default_value=1.)
From this we can access the trait instance with Container.name, but it
only contains the information used for its definition; it does not hold any
actual value.
But if we create an instance of the container, when we access name
we will obtain a value, not a trait:
>>> c = Container()
>>> type(c.name)
float
>>> c.name = 2 # we can change it
>>> c.name
2.0
It behaves nearly like a typical float attribute. When we change the value
for instance, the trait (again which is a class attribute) will be used
to validate the new value, or do some more advanced things. But the value
is tied to the container instance c.
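To make the descriptor mechanism concrete, here is a minimal standard-library sketch of the same idea. It does not use traitlets: FloatField and Box are hypothetical names that only mimic how a trait, bound on the class, validates a value that is stored on the instance.

```python
class FloatField:
    """Hypothetical, simplified stand-in for a trait such as Float."""

    def __init__(self, default_value=0.0):
        self.default_value = default_value

    def __set_name__(self, owner, name):
        # Remember under which attribute name the descriptor was bound.
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class: return the descriptor itself.
            return self
        # Accessed on an instance: return the value stored on that instance.
        return obj.__dict__.get(self.name, self.default_value)

    def __set__(self, obj, value):
        # Validate (and convert) before storing the value on the instance.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{self.name!r} expects a float, got {value!r}")
        obj.__dict__[self.name] = float(value)


class Box:
    size = FloatField(default_value=1.0)


b = Box()
b.size = 2  # converted to a float by the descriptor
print(type(Box.size).__name__, b.size)
```

The real traits do much more (metadata, callbacks, dynamic defaults), but the split between "definition on the class" and "value on the instance" is the same.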
Here are some of the basic trait types:
Unicode, Bytes: for strings. Traitlets differentiates unicode and bytes strings.
Containers can check the element type.
Tuple: to check types, Tuple must specify every element.
Dict: Dict can specify both keys and values.
Enum: the value must be one of the specified values.
Union: multiple types are permitted. Conversion is attempted in the order in which the types are specified, so the order matters.
This is currently unsupported.
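As an illustration of the trait types described above, here is a short sketch that uses traitlets directly, with a plain HasTraits class rather than a Section. It assumes traitlets 5 or later (for the key_trait argument of Dict); the class Example and its attribute names are made up for this example.

```python
from traitlets import (
    Bytes, Dict, Enum, Float, HasTraits, Int, List, Tuple, Unicode, Union,
)


class Example(HasTraits):
    name = Unicode("SST")            # text string
    raw = Bytes(b"\x00")             # bytes string
    levels = List(Float())           # every element must be a float
    point = Tuple(Int(), Unicode())  # one trait per element
    counts = Dict(key_trait=Unicode(), value_trait=Int())
    dataset = Enum(["a", "b"], default_value="a")
    # Int is tried first, so integers are not converted to floats.
    number = Union([Int(), Float()])


ex = Example()
ex.levels = [1.0, 2.5]
ex.point = (3, "x")
ex.counts = {"a": 1}
ex.number = 1
```

Assigning a value that does not match the trait (for instance ex.dataset = "c") raises a traitlets.TraitError.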
The package provides two new types of traits. RangeTrait is a list of
integers or floats that can be parsed from a slice specification in the form
start:stop[:step]. ‘stop’ is inclusive. It can still take in lists of
values normally (--year 2002 2005 2006).
With year = RangeTrait(Int()), --year=2002:2004 will be parsed as [2002, 2003, 2004].
With coef = RangeTrait(Float()), --coef=0:1:0.5 will be parsed as [0.0, 0.5, 1.0].
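The slice syntax can be sketched with the standard library alone. The function parse_range below is hypothetical and is not the implementation used by RangeTrait; it only reproduces the documented behavior (start:stop[:step], with stop inclusive) for the two examples above.

```python
import math


def parse_range(spec, kind=int):
    """Expand 'start:stop[:step]' into a list, with 'stop' included."""
    parts = [kind(p) for p in spec.split(":")]
    if len(parts) == 2:
        start, stop = parts
        step = kind(1)
    elif len(parts) == 3:
        start, stop, step = parts
    else:
        raise ValueError(f"expected start:stop[:step], got {spec!r}")
    if step == 0:
        raise ValueError("step cannot be zero")
    # Number of steps that fit between start and stop; the small tolerance
    # guards against floating point rounding when stop is an exact multiple.
    n_steps = math.floor((stop - start) / step + 1e-9)
    return [start + i * step for i in range(n_steps + 1)]


years = parse_range("2002:2004", int)
coefs = parse_range("0:1:0.5", float)
```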
FixableTrait is meant to work with filefinder, for parameters defined in filename
patterns. It can take
a single value
a string that will be interpreted as a range of values if the trait type allows it (Int or Float)
a string that will be interpreted as a regular expression (this is disabled by default as it can be dangerous: any value from command line that cannot be parsed would still be allowed).
a list of values
Subsections#
A section can contain other subsections, allowing a tree-like, nested configuration. This can be done in two ways:
Subsections can be defined directly inside another section class definition. The name of such a nested class will be used for the corresponding subsection attribute. The class definition will be renamed and moved under the attribute
_{name}SectionDef. For example:
class MyConfig(Section):

    class log(Section):
        level = Unicode("INFO")

    class sst(Section):
        dataset = Enum(["a", "b"])

        class a(Section):
            location = Unicode("/somewhere")
            time_resolution = Int(8, help="in days")

        class b(Section):
            location = Unicode("/somewhere/else")

MyConfig().sst.a.location = "/fantastic"
A mypy plugin is provided to support these dynamic definitions. Add it to the list of plugins in your mypy configuration file, for instance in ‘pyproject.toml’:
[tool.mypy]
plugins = ['data_assistant.config.mypy_plugin']
A more standard way is to use the Subsection class and set it as an attribute in the parent section:
from data_assistant.config import Section, Subsection
from traitlets import Int

class ChildSection(Section):
    param_b = Int(1)

class ParentSection(Section):
    param_a = Int(1)
    child = Subsection(ChildSection)
In the example above, two parameters are available, at param_a and child.param_b.
Important
Like traits, Subsections are also descriptors: accessing
ParentSection().child returns a ChildSection instance.
Aliases#
It is possible to define aliases with the Section.aliases attribute.
It is a mapping of shortcut names to a deeper subsection:
{"short": "some.deeply.nested.subsection"}
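As a rough illustration of what such a mapping implies for dot-separated keys, here is a standard-library sketch. The function expand_alias is hypothetical, and it assumes the alias appears as the first element of the key; the package may resolve aliases differently (for instance at every level of the tree).

```python
def expand_alias(key, aliases):
    """Replace a leading alias in a dot-separated key by its full path."""
    head, sep, rest = key.partition(".")
    if head in aliases:
        return aliases[head] + sep + rest
    return key


aliases = {"short": "some.deeply.nested.subsection"}
full_key = expand_alias("short.parameter", aliases)
```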
Application#
The principal section, at the root of the configuration tree, is the
Application. As a subclass of
Section, it can hold directly all your parameters and nested
subsections. It will also be responsible for gathering the parameters from
configuration files and the command line, and more.
Here is a simple example:
from data_assistant.config import ApplicationBase, Section
from traitlets import Bool, Float, Int, List, Unicode

class App(ApplicationBase):

    class computation(Section):
        parallel = Bool(False, help="Conduct computation in parallel if true.")
        n_cores = Int(1, help="Number of cores to use for computation.")

    class physical(Section):
        threshold = Float(2.5, help="Threshold for some computation.")
        data_name = Unicode("SST")
        years = List(
            Int(),
            default_value=[2000, 2001, 2008],
            min_length=1,
            help="Years to do the computation on."
        )

>>> app = App()
>>> app.physical.years = [2023, 2024]
Starting the application#
By default, when the application is instantiated it executes its starting
sequence with the start() method. It will:
Parse command line arguments
Read parameters from configuration files
Instantiate all subsections with the obtained parameters
This can be controlled with __init__ arguments start, ignore_cli,
and instantiate.
Note
Even though some features are still available if the subsections are not instantiated (since the subsection classes contain information about the parameters), instantiating them is necessary to fully validate the parameters.
Logging#
The base application contains some parameters to easily log information. A
logger instance is available at ApplicationBase.log that will log to
the console (stderr), and can be configured via the (trait) parameters
log_level, log_format, and
log_datefmt.
The configuration of the logging setup is kept minimal. Users needing to
configure it further may look into ApplicationBase._get_logging_config().
Note
The logger will have the application class fullname (module + class name), so logging inheritance rules will apply.
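The naming convention can be illustrated with the standard logging module. This is only a sketch of the general idea, not the code behind ApplicationBase.log: make_logger, the stand-in App class and the format strings are assumptions made for the example.

```python
import logging
import sys


class App:
    """Hypothetical stand-in for an application class."""


def make_logger(cls, level="INFO"):
    # Name the logger after the module and class, so that the usual
    # parent/child rules of the logging module apply.
    logger = logging.getLogger(f"{cls.__module__}.{cls.__qualname__}")
    logger.setLevel(level)
    if not logger.handlers:
        handler = logging.StreamHandler(sys.stderr)
        handler.setFormatter(
            logging.Formatter(
                fmt="%(asctime)s %(levelname)s %(name)s: %(message)s",
                datefmt="%Y-%m-%d %H:%M:%S",
            )
        )
        logger.addHandler(handler)
    return logger


log = make_logger(App, level="WARNING")
log.warning("something looks off")
```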
Accessing parameters#
As explained above, the value of parameters can be accessed (or changed) just like attributes of the section that contains them. This allows for deeply nested access:
app.some.deeply.nested.trait = 2
Note
This benefits from the features of traitlets: type checking, value validation, “on-change” callbacks, dynamic default value generation. This can ensure that a configuration stays valid. Refer to the traitlets documentation for more details on how to use these features.
Tip
It is possible to only show subsections and configurable traits in
autocompletion. Set the class attribute
Section._attr_completion_only_traits to True.
Sections also implement the interface of a
MutableMapping and most of the interface of a
dict. Parameters can be accessed with a single key of dot-separated
attributes. This still benefits from all features of traitlets.
app["some.deeply.nested.trait"] = 2
# or
app["some"]["deeply.nested.trait"] = 2
By default Section.keys(), Section.values() and
Section.items() do not list subsection objects or aliases, but this
can be altered. They also return flat output; to obtain a nested dictionary,
pass nest=True.
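The difference between the flat and the nested form can be sketched with plain dictionaries. The helpers nest and flatten below are hypothetical and only illustrate the two layouts; they are not part of the package.

```python
def nest(flat):
    """Turn {'a.b.c': 1} into {'a': {'b': {'c': 1}}}."""
    nested = {}
    for key, value in flat.items():
        *parents, last = key.split(".")
        node = nested
        for name in parents:
            node = node.setdefault(name, {})
        node[last] = value
    return nested


def flatten(nested, prefix=""):
    """Inverse operation: join nested keys with dots."""
    flat = {}
    for name, value in nested.items():
        key = f"{prefix}.{name}" if prefix else name
        if isinstance(value, dict):
            flat.update(flatten(value, key))
        else:
            flat[key] = value
    return flat


flat = {"computation.n_cores": 1, "physical.threshold": 2.5}
nested = nest(flat)
```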
Important
The omission of subsections and aliases is done to allow a straightforward
conversion with dict(section). Similarly, len and iter do not
account for subsections and aliases.
However, other methods such as “get”, “set” and “contains” do accept subsection keys and aliases:
>>> "subsection" in section
True
>>> section["subsection"] # No KeyError
Sections have an update() method that modifies them from a
mapping of several parameters (or from another section instance):
app.update({"computation.n_cores": 10, "physical.threshold": 5.})
Similarly to Section.setdefault(), it can add new traits to the section
with some specific input, see the docstring for details.
Warning
Adding traits to a Section instance (via add_trait(),
update(), or setdefault()) internally creates a
new class and modifies in-place the section instance; something along
the lines of:
section.__class__ = type("NewClass", (section.__class__,), ...)
References to section classes necessary to operate the nested structure are updated accordingly, but this is a possibly dangerous operation and it would be preferred to set traits statically.
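The class swap itself can be reproduced in a few lines of plain Python. This sketch leaves out everything specific to traits and to the nested structure; Section here is a hypothetical stand-in class and add_class_attribute is a made-up helper.

```python
class Section:
    """Hypothetical stand-in for a section class."""

    threshold = 2.5


def add_class_attribute(obj, name, value):
    # Build a subclass carrying the new attribute. Note the trailing comma:
    # the bases argument of type() must be a tuple.
    new_cls = type(obj.__class__.__name__, (obj.__class__,), {name: value})
    # Swap the class of this instance only.
    obj.__class__ = new_cls


first = Section()
second = Section()
add_class_attribute(first, "n_cores", 1)
```

Only first sees the new attribute; second still uses the original class. This is also why the operation is delicate: two instances that started from the same class no longer share a type.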
Obtaining subsets of all parameters#
Using Section.select() we can select only some of the parameters by name:
>>> app.select("physical.threshold", "computation.n_cores")
{
"physical.threshold": 2.5,
"computation.n_cores": 1
}
Some parameters may be destined for a specific function. It is possible to select those by name as shown above, or one could tag the target traits during definition like so:
some_parameter = Bool(True).tag(for_this_function=True)
These traits can then automatically be retrieved using the metadata argument
of many methods such as keys() or select().
Section.trait_values_from_func_signature() will find the parameters that
share the same name as arguments from a function signature.
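Matching parameters to a function signature can be sketched with the inspect module. The helper values_for below is hypothetical and works on a plain dictionary; it only illustrates the principle, not the actual method.

```python
import inspect


def values_for(func, params):
    """Keep only the parameters whose name is an argument of `func`."""
    names = inspect.signature(func).parameters
    return {name: value for name, value in params.items() if name in names}


def compute(threshold, n_cores=1):
    return threshold * n_cores


params = {"threshold": 2.5, "n_cores": 4, "data_name": "SST"}
result = compute(**values_for(compute, params))
```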
Input parameters#
The ApplicationBase class can retrieve the values of parameters
from configuration files or from command line arguments (CLI) when
ApplicationBase.start() is called. It first parses command line
arguments (unless deactivated) and then reads values from the specified
configuration files. Each time parameters are loaded from any kind of source,
the parameters for the application object are immediately applied to it, since
they can alter the rest of the process.
The configuration values are retrieved by ConfigLoader objects adapted
for each source. The output of a loader is a flat dictionary mapping keys to a
ConfigValue. Aliases are expanded so that each key is unique.
Note
The ConfigValue class stores more information about the
value: its origin, the original string and parsed value if applicable, and a
priority value used when merging configs. To obtain the value, use
ConfigValue.get_value().
Parameters obtained from configuration files and from CLI are merged.
Parameters are stored in file_conf,
cli_conf and conf.
Finally, the application will recursively instantiate all sections while passing the configuration values. Unspecified values will take the trait default value. All values will undergo validation from traitlets.
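The merging step can be pictured with a small standard-library sketch. The Value dataclass is a hypothetical, reduced counterpart of ConfigValue, and the rule "higher priority wins, later config wins on ties" is an assumption made for the example; the source only states that command line values take priority over values from files.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Value:
    """Hypothetical, reduced counterpart of ConfigValue."""

    value: Any
    origin: str
    priority: int = 0


def merge(*configs):
    """Merge flat configs; on conflict keep the higher-priority value."""
    merged = {}
    for config in configs:
        for key, new in config.items():
            old = merged.get(key)
            # On equal priority the later config wins.
            if old is None or new.priority >= old.priority:
                merged[key] = new
    return merged


file_conf = {
    "computation.n_cores": Value(4, "file"),
    "physical.threshold": Value(3.0, "file"),
}
cli_conf = {"computation.n_cores": Value(10, "CLI", priority=1)}
conf = merge(cli_conf, file_conf)
```

Even though the file configuration is merged last here, the command line value for computation.n_cores is kept because of its higher priority.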
From configuration files#
The application can retrieve parameters from configuration files by invoking
ApplicationBase.load_config_files(). It will load the file (or files)
specified in ApplicationBase.config_files. If multiple files are
specified, the parameter from one file will replace those from the previous
files in the list. The resulting configuration will be stored in the
file_conf attribute.
Note
The config_files attribute is a trait, which allows
to select configuration files from the command line. To specify it from your
script use:
class App(ApplicationBase):
    pass

App.config_files.default_value = ...
or if you do not need to change the value using command line arguments:
class App(ApplicationBase):
    config_files = ...
Different file formats require specific subclasses of FileLoader. A
loader is selected by looking at the config file extension. As some loaders have
external dependencies, loaders are only imported when needed, according to the
import string in ApplicationBase.file_loaders.
| File extensions | Class | Library |
|---|---|---|
| toml | TomlkitLoader | tomlkit |
| py, ipy | PyLoader | |
| yaml, yml | YamlLoader | ruamel.yaml |
| json | JsonLoader | json |
File loaders can implement FileLoader.write() to generate a valid
configuration file of the corresponding format, following the values present in
its config attribute. This makes it possible to generate lengthy
configuration files, with different amounts of additional information in
comments. The end user can simply use ApplicationBase.write_config(),
which automatically deals with an existing configuration file that may need to
be updated, while keeping its current values (or not).
This package supports and recommends TOML configuration
files. The format is both easily readable and unambiguous. Despite allowing nested
configuration, it can be written without indentation, which leaves room for long
comments on each parameter. The tomllib builtin module
does not support writing, so we use (for both reading and writing) one of the
recommended replacements: tomlkit.
The package also supports Python scripts as configuration files, similarly to
traitlets. To load a configuration file, the file loader
PyLoader creates a PyConfigContainer object. That object
will be bound to the c variable in the script/configuration file. It allows
arbitrarily nested attribute setting so that the following syntax is valid:
c.section.subsection.parameter = 5
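The "arbitrarily nested attribute setting" can be sketched with a small class that creates child containers on first access. Recorder is a hypothetical stand-in for PyConfigContainer, written only to show the mechanism; the real class may work differently.

```python
class Recorder:
    """Hypothetical stand-in for PyConfigContainer."""

    def __getattr__(self, name):
        # Only called for attributes that do not exist yet: create a child
        # container on the fly so that c.a.b.c = 1 works.
        if name.startswith("_"):
            raise AttributeError(name)
        child = Recorder()
        object.__setattr__(self, name, child)
        return child

    def as_flat_dict(self, prefix=""):
        """Collect all assigned values under dot-separated keys."""
        flat = {}
        for name, value in vars(self).items():
            key = f"{prefix}.{name}" if prefix else name
            if isinstance(value, Recorder):
                flat.update(value.as_flat_dict(key))
            else:
                flat[key] = value
        return flat


script = "c.section.subsection.parameter = 5\nc.section.other = 'x'\n"
c = Recorder()
exec(script, {"c": c})  # runs arbitrary code: only do this with trusted files
flat = c.as_flat_dict()
```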
Important
Remember that this script will be executed, so arbitrary code can be run inside, maybe changing some value depending on the OS, the hostname, or more advanced logic.
Of course, running arbitrary code dynamically is a security liability: do not load parameters from a Python script unless you trust it.
The loader does not support the traitlets feature of configuration file
inheritance via (in the config file) load_subconfig("some_other_script.py").
This would be doable, but for the moment we recommend instead that you specify
multiple configuration files in ApplicationBase.config_files,
remembering that each configuration file replaces the values of the previous one
in the list.
YAML is supported via YamlLoader and the
third-party module ruamel.yaml.
Despite not being easily readable, the JSON format is also supported via
JsonLoader and the builtin module json. The
decoder and encoder class can be customized.
From the command line#
Parameters can be set by parsing command line arguments, although this can be
skipped by either setting the ApplicationBase.ignore_cli attribute or
the ignore_cli argument to ApplicationBase.start(). The configuration
obtained will be stored in the cli_conf attribute and
will take priority over parameters from configuration files.
Keys are indicated by a prefix of one or two hyphens. Any subsequent hyphen is
replaced by an underscore, so -computation.n_cores and
--computation.n-cores are equivalent. Parameter keys are dot-separated
paths leading to a trait. Aliases can be used for brevity.
Note
This can be changed with attributes of the corresponding loader class:
CLILoader.allow_kebab and CLILoader.prefix.
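The default key convention described above (one or two leading hyphens, remaining hyphens read as underscores) can be written as a small function. normalize_key is a hypothetical helper, not the code used by CLILoader.

```python
def normalize_key(arg):
    """Strip the one- or two-hyphen prefix and map other hyphens to '_'."""
    if arg.startswith("--"):
        key = arg[2:]
    elif arg.startswith("-"):
        key = arg[1:]
    else:
        raise ValueError(f"{arg!r} is not a parameter key")
    return key.replace("-", "_")


short_form = normalize_key("-computation.n_cores")
kebab_form = normalize_key("--computation.n-cores")
```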
Command line arguments need to be parsed. The corresponding trait object
will deal with the parsing, using its from_string or from_string_list
(for containers) methods.
Note
Nested container parameters (e.g. a list of lists) are not currently supported.
Note
The list of command line arguments is obtained by
ApplicationBase.get_argv(). It tries to detect if python was launched
from IPython or Jupyter, in which case it ignores the arguments before
the first --.
List arguments#
For every parameter, the argument action is
“append”, with type str (since the parsing is left to traitlets), and
nargs="*", meaning that any parameter can receive any number of values. To
indicate multiple values, for a List trait for instance, use the following
syntax:
--physical.years 2015 2016 2017
and not as is the case with vanilla traitlets:
--physical.years 2015 --physical.years 2016 ...
The latter form raises an error, since duplicates are forbidden to avoid possible mistakes in user input.
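The combination of action "append" and nargs="*" can be tried out with argparse directly. The sketch below declares a single argument by hand and rejects repeated keys; the function parse and the ValueError it raises are assumptions for illustration, not the behavior of the actual loader.

```python
import argparse


def parse(argv):
    parser = argparse.ArgumentParser()
    # Every parameter is declared the same way; converting the strings
    # into typed values happens later.
    parser.add_argument(
        "--physical.years", action="append", nargs="*", type=str
    )
    namespace = parser.parse_args(argv)
    values = {}
    for key, occurrences in vars(namespace).items():
        if occurrences is None:
            continue  # parameter not given on the command line
        if len(occurrences) > 1:
            raise ValueError(f"parameter {key!r} was given more than once")
        values[key] = occurrences[0]
    return values


years = parse(["--physical.years", "2015", "2016", "2017"])
```

With "append", each occurrence of the key produces its own list of strings, which is what makes duplicates easy to detect.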
Extra parameters#
Extra parameters to the argument parser can be added with the class method
ApplicationBase.add_extra_parameters(). This will add traits to a section
named “extra”, created if needed. This is useful when needing parameters for a
single script for instance. If in our script we write:
App.add_extra_parameters(threshold=Float(5.0))
we can then pass a parameter by command line at --extra.threshold and
retrieve it with app.extra.threshold.
Autocompletion#
Autocompletion for parameters is available via argcomplete. Install argcomplete and either
register the scripts you need or activate global completion. In both cases you
will need to add # PYTHON_ARGCOMPLETE_OK to the beginning of your scripts.
Note
Completion is not available when using IPython, as it shadows our application. I do not know if this is fixable.
From a dictionary#
The loader DictLoader can transform any nested mapping into a proper
configuration.
Note
The loaders TomlkitLoader, YamlLoader and
JsonLoader are based on it, as they return a nested mapping.