Usage#

Specifying parameters#

Traits#

The configuration is specified through Section classes. Each section contains parameters in the form of class attributes of type traitlets.TraitType (for instance Float, Unicode, or List), or other (nested) sections.

Note

Traits can be confusing at first. They are a kind of descriptor. A container class has trait instances bound as class attributes. For instance:

class Container(Section):
    name = Float(default_value=1.)

From this we can access the trait instance with Container.name, but it only contains the information used for its definition; it does not hold any actual value.

But if we create an instance of the container, when we access name we will obtain a value, not a trait:

>>> c = Container()
>>> type(c.name)
float
>>> c.name = 2  # we can change it
>>> c.name
2.0

It behaves nearly like a typical float attribute. When we change the value, for instance, the trait (which, again, is a class attribute) is used to validate the new value, or to do some more advanced things. But the value itself is tied to the container instance c.
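To make the descriptor mechanism concrete, here is a minimal sketch written in plain Python, without traitlets. The FloatTrait and Container classes below are hypothetical stand-ins; they only mimic the behaviour described above and are not how traitlets is implemented.

```python
class FloatTrait:
    """Simplified stand-in for a trait: a descriptor that validates floats."""

    def __init__(self, default_value=0.0):
        self.default_value = default_value

    def __set_name__(self, owner, name):
        # Remember the attribute name the trait is bound to.
        self.name = name

    def __get__(self, obj, owner=None):
        if obj is None:
            # Accessed on the class: return the trait itself.
            return self
        # Accessed on an instance: return the value stored on that instance.
        return obj.__dict__.get(self.name, self.default_value)

    def __set__(self, obj, value):
        # Validate (and cast) before storing the value on the instance.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{self.name!r} expects a float, got {value!r}")
        obj.__dict__[self.name] = float(value)


class Container:
    name = FloatTrait(default_value=1.0)


c = Container()
c.name = 2  # validated and stored on the instance, not on the trait
```

Container.name gives back the FloatTrait object, while c.name gives back a float that belongs to c only.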

Here are some of the basic trait types:

Int, Float, Bool

Unicode

For strings. Traitlets differentiates unicode and bytes strings.

List, Set

Containers can check the element type: List(Float())

Tuple

To check type, Tuple must specify every element: Tuple(Int(), Unicode())

Dict

Dict can specify both keys and values: Dict(key_trait=Unicode(), value_trait=Int())

Enum

Must be one of the specified values: Enum(["a", "b"], default_value="a")

Union

Multiple types are permitted. Conversion is attempted in the order in which they are specified. For instance, always use this order: Union([Float(), Int()]), otherwise floats will be truncated.

Type

Type(klass=MyClass) will allow subclasses of MyClass. In your configuration files you can use an import string (“my_module.MyClass”).

Instance

This is currently unsupported.

The package provides two new types of traits. RangeTrait is a list of integers or floats that can be parsed from a slice specification of the form start:stop[:step], where ‘stop’ is inclusive. It can still take in lists of values normally (--year 2002 2005 2006).

  • With year = RangeTrait(Int()), --year=2002:2004 will be parsed as [2002, 2003, 2004]

  • With coef = RangeTrait(Float()), --coef=0:1:0.5 will be parsed as [0.0, 0.5, 1.0].
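The expansion of a start:stop[:step] specification can be sketched as follows. The parse_range function is hypothetical and written only for illustration; the parsing done by RangeTrait itself may differ in its details (error handling in particular).

```python
import math


def parse_range(spec, cast=int):
    """Expand 'start:stop[:step]' into a list of values, 'stop' included."""
    parts = spec.split(":")
    if len(parts) not in (2, 3):
        raise ValueError(f"expected start:stop[:step], got {spec!r}")
    start, stop = cast(parts[0]), cast(parts[1])
    step = cast(parts[2]) if len(parts) == 3 else cast(1)
    if step == 0:
        raise ValueError("step cannot be zero")
    # Count the steps instead of accumulating, to limit float rounding drift.
    n_steps = math.floor((stop - start) / step + 1e-9)
    return [cast(start + i * step) for i in range(n_steps + 1)]


years = parse_range("2002:2004", int)
coefs = parse_range("0:1:0.5", float)
```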

FixableTrait is meant to work with filefinder, for parameters defined in filename patterns. It can take:

  • a single value

  • a string that will be interpreted as a range of values if the trait type allows it (Int or Float)

  • a string that will be interpreted as a regular expression (this is disabled by default as it can be dangerous: any value from command line that cannot be parsed would still be allowed).

  • a list of values

Subsections#

A section can contain other subsections, allowing a tree-like, nested configuration. This can be done in two ways:

  • Subsections can be defined directly inside another section class definition. The name of such a nested class will be used for the corresponding subsection attribute. The class definition will be renamed and moved under the attribute _{name}SectionDef. For example:

    class MyConfig(Section):
    
        class log(Section):
            level = Unicode("INFO")
    
        class sst(Section):
            dataset = Enum(["a", "b"])
    
            class a(Section):
                location = Unicode("/somewhere")
                time_resolution = Int(8, help="in days")
    
            class b(Section):
                location = Unicode("/somewhere/else")
    
    MyConfig().sst.a.location = "/fantastic"
    

    A mypy plugin is provided to support these dynamic definitions. Add it to the list of plugins in your mypy configuration file, for instance in ‘pyproject.toml’:

    [tool.mypy]
    plugins = ['data_assistant.config.mypy_plugin']
    
  • A more standard way is by using the Subsection class and setting it as an attribute in the parent section:

    from data_assistant.config import Subsection
    
    class ChildSection(Section):
        param_b = Int(1)
    
    class ParentSection(Section):
        param_a = Int(1)
    
        child = Subsection(ChildSection)
    

    In the example above we have two parameters available at param_a and child.param_b.

Important

Like traits, Subsections are also descriptors: accessing ParentSection().child returns a ChildSection instance.

Aliases#

It is possible to define aliases with the Section.aliases attribute. It is a mapping of shortcut names to a deeper subsection:

{"short": "some.deeply.nested.subsection"}
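As an illustration, expanding an alias at the start of a dot-separated key could look like the sketch below. The expand_alias function and the key "short.parameter" are hypothetical; the assumption here is that an alias stands for the first component of a key, and the package may resolve aliases differently.

```python
def expand_alias(key, aliases):
    """Replace a leading alias in a dot-separated key by the path it stands for."""
    head, sep, rest = key.partition(".")
    if head in aliases:
        return aliases[head] + sep + rest
    return key


aliases = {"short": "some.deeply.nested.subsection"}
full_key = expand_alias("short.parameter", aliases)
```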

Application#

The principal section, at the root of the configuration tree, is the Application. As a subclass of Section, it can hold directly all your parameters and nested subsections. It will also be responsible for gathering the parameters from configuration files and the command line, and more.

Here is a simple example:

from data_assistant.config import ApplicationBase, Section
from traitlets import Bool, Float, Int, List, Unicode


class App(ApplicationBase):

    class computation(Section):
        parallel = Bool(False, help="Conduct computation in parallel if true.")
        n_cores = Int(1, help="Number of cores to use for computation.")

    class physical(Section):
        threshold = Float(2.5, help="Threshold for some computation.")
        data_name = Unicode("SST")
        years = List(
            Int(),
            default_value=[2000, 2001, 2008],
            min_length=1,
            help="Years to do the computation on."
        )

>>> app = App()
>>> app.physical.years = [2023, 2024]

Starting the application#

By default, when the application is instantiated it executes its starting sequence with the start() method. It will:

  • Parse command line arguments

  • Read parameters from configuration files

  • Instantiate all subsections with the obtained parameters

This can be controlled with __init__ arguments start, ignore_cli, and instantiate.

Note

Even though some features are still available if the subsections are not instantiated (since the subsection classes contain information about the parameters), instantiating them is necessary to fully validate the parameters.

Logging#

The base application contains some parameters to easily log information. A logger instance is available at ApplicationBase.log that will log to the console (stderr), and can be configured via the (trait) parameters log_level, log_format, and log_datefmt.

The configuration of the logging setup is kept minimal. Users needing to configure it further may look into ApplicationBase._get_logging_config().

Note

The logger is named after the application class's full name (module + class name), so the usual logging inheritance rules apply.
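The effect of such a name can be sketched with the standard logging module alone. The App class below is a hypothetical placeholder, and the exact way the package builds the logger name is an assumption.

```python
import logging


class App:
    """Hypothetical application class, used only to build a logger name."""


# Assumed naming scheme: module name, a dot, then the class name.
log = logging.getLogger(f"{App.__module__}.{App.__name__}")

# A level set on the parent logger (the module) is inherited by the child,
# as long as the child does not define its own level.
logging.getLogger(App.__module__).setLevel(logging.DEBUG)
effective_level = log.getEffectiveLevel()
```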

Accessing parameters#

As explained above, the value of parameters can be accessed (or changed) just like attributes of the section that contains them. This allows for deeply nested access:

app.some.deeply.nested.trait = 2

Note

This benefits from the features of traitlets: type checking, value validation, “on-change” callbacks, dynamic default value generation. This can ensure that a configuration stays valid. Refer to the traitlets documentation for more details on how to use these features.

Tip

It is possible to only show subsections and configurable traits in autocompletion. Set the class attribute Section._attr_completion_only_traits to True.

Sections also implement the MutableMapping interface and most of the dict interface. Parameters can be accessed with a single key of dot-separated attributes. This still benefits from all the features of traitlets.

app["some.deeply.nested.trait"] = 2
# or
app["some"]["deeply.nested.trait"] = 2

By default, Section.keys(), Section.values() and Section.items() do not list subsection objects or aliases, but this can be altered. They also return flat output; to obtain a nested dictionary, pass nest=True.
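The difference between flat and nested output can be illustrated with plain dictionaries. The nest function below is a hypothetical helper, not part of the package; it only shows the shape of the two outputs.

```python
def nest(flat):
    """Turn {'a.b.c': 1} into {'a': {'b': {'c': 1}}}."""
    nested = {}
    for key, value in flat.items():
        *parents, leaf = key.split(".")
        node = nested
        for name in parents:
            # Create intermediate dictionaries as needed.
            node = node.setdefault(name, {})
        node[leaf] = value
    return nested


flat = {
    "computation.n_cores": 1,
    "physical.threshold": 2.5,
    "physical.data_name": "SST",
}
nested = nest(flat)
```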

Important

The omission of subsections and aliases is done to allow a straightforward conversion with dict(section). Similarly, len and iter do not account for subsections and aliases.

However, other methods such as “get”, “set” and “contains” will allow subsection keys and aliases:

>>> "subsection" in section
True
>>> section["subsection"]  # No KeyError

Sections have an update() method that modifies the section from a mapping of several parameters (or from another section instance):

app.update({"computation.n_cores": 10, "physical.threshold": 5.})

Like Section.setdefault(), it can add new traits to the section when given some specific input; see the docstring for details.

Warning

Adding traits to a Section instance (via add_trait(), update(), or setdefault()) internally creates a new class and modifies in-place the section instance; something along the lines of:

section.__class__ = type("NewClass", (section.__class__,), ...)

References to section classes necessary to operate the nested structure are updated accordingly, but this is a possibly dangerous operation and it would be preferred to set traits statically.
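The class swap can be reproduced on an ordinary object. This is only a sketch with a hypothetical PlainSection class and plain attributes instead of traits; it leaves out the bookkeeping the package does for nested sections.

```python
class PlainSection:
    """Plain stand-in class; the real Section is a traitlets-based class."""

    threshold = 2.5


section = PlainSection()
original_class = section.__class__

# Build a subclass carrying the new attribute, then swap the instance's class.
# Note the trailing comma: the bases must be given as a tuple.
new_class = type("NewSection", (original_class,), {"n_cores": 1})
section.__class__ = new_class
```

Only this instance is affected: other instances of the original class do not gain the new attribute.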

Obtaining subsets of all parameters#

Using Section.select() we can select only some of the parameters by name:

>>> app.select("physical.threshold", "computation.n_cores")
{
    "physical.threshold": 2.5,
    "computation.n_cores": 1
}

Some parameters may be destined for a specific function. It is possible to select those by name as shown above, or one could tag the target traits during definition like so:

some_parameter = Bool(True).tag(for_this_function=True)

These traits can then automatically be retrieved using the metadata argument of many methods such as keys() or select().

Section.trait_values_from_func_signature() will find the parameters that share the same name as arguments from a function signature.
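The idea of matching parameters against a function signature can be sketched with the standard inspect module. The values_for_function helper and the compute function are hypothetical; how the real method handles name collisions or what it returns exactly is not covered here.

```python
import inspect


def values_for_function(func, parameters):
    """Keep the parameters whose last key component matches an argument of func."""
    argument_names = set(inspect.signature(func).parameters)
    selected = {}
    for key, value in parameters.items():
        name = key.rsplit(".", 1)[-1]
        if name in argument_names:
            selected[name] = value
    return selected


def compute(threshold, n_cores=1):
    return threshold * n_cores


flat = {
    "physical.threshold": 2.5,
    "computation.n_cores": 4,
    "physical.data_name": "SST",
}
kwargs = values_for_function(compute, flat)
result = compute(**kwargs)
```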

Input parameters#

The ApplicationBase class can retrieve the values of parameters from configuration files or from command line arguments (CLI) when ApplicationBase.start() is called. It first parses command line arguments (unless deactivated) and then reads values from the specified configuration files. Each time parameters are loaded from any kind of source, the parameters meant for the application object itself are immediately applied to it, since they can alter the rest of the process.

The configuration values are retrieved by ConfigLoader objects adapted to each source. The output of a loader is a flat dictionary mapping keys to ConfigValue objects. Aliases are expanded so that each key is unique.

Note

The ConfigValue class stores more information about the value: its origin, the original string and parsed value if applicable, and a priority value used when merging configs. To obtain the value, use ConfigValue.get_value().

Parameters obtained from configuration files and from CLI are merged. Parameters are stored in file_conf, cli_conf and conf.

Finally, the application will recursively instantiate all sections while passing the configuration values. Unspecified values will take the trait default value. All values will undergo validation from traitlets.

From configuration files#

The application can retrieve parameters from configuration files by invoking ApplicationBase.load_config_files(). It will load the file (or files) specified in ApplicationBase.config_files. If multiple files are specified, the parameters from one file replace those from the previous files in the list. The resulting configuration will be stored in the file_conf attribute.
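The replacement order between several files can be pictured with flat dictionaries. This is a simplified sketch: merge_configs and the two dictionaries are hypothetical, and the package stores ConfigValue objects with priorities rather than bare values.

```python
def merge_configs(*configs):
    """Merge flat configurations; later ones replace values from earlier ones."""
    merged = {}
    for config in configs:
        merged.update(config)
    return merged


# Contents of two hypothetical configuration files, in the order of the list.
defaults_file = {"computation.n_cores": 1, "physical.threshold": 2.5}
machine_file = {"computation.n_cores": 16}

file_conf = merge_configs(defaults_file, machine_file)
```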

Note

The config_files attribute is a trait, which makes it possible to select configuration files from the command line. To specify it from your script, use:

class App(ApplicationBase):
    pass

App.config_files.default_value = ...

or if you do not need to change the value using command line arguments:

class App(ApplicationBase):
    config_files = ...

Different file formats require specific subclasses of FileLoader. A loader is selected by looking at the config file extension. As some loaders have external dependencies, loaders are only imported when needed, according to the import string in ApplicationBase.file_loaders.

File extensions   Class                Library
toml              toml.TomlkitLoader   tomlkit
py, ipy           python.PyLoader
yaml, yml         yaml.YamlLoader      ruamel
json              json.JsonLoader      json

File loaders can implement FileLoader.write() to generate a valid configuration file of the corresponding format, following the values present in its config attribute. This makes it possible to generate lengthy configuration files, with different amounts of additional information in comments. The end user can simply use ApplicationBase.write_config(), which automatically deals with an existing configuration file that may need to be updated, while keeping its current values (or not).

This package supports and recommends TOML configuration files. The format is both easily readable and unambiguous. Despite allowing nested configuration, it can be written without indentation, which makes it possible to add long comments for each parameter. The tomllib builtin module does not support writing, so we use (for both reading and writing) one of the recommended replacements: tomlkit.

The package also supports Python scripts as configuration files, similarly to traitlets. To load a configuration file, the file loader PyLoader creates a PyConfigContainer object. That object is bound to the c variable in the script/configuration file. It allows arbitrarily nested attribute setting, so that the following syntax is valid:

c.section.subsection.parameter = 5
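An object accepting this kind of nested assignment can be sketched in a few lines. NestedContainer is a hypothetical class written for illustration; it is not the actual PyConfigContainer, whose behaviour may differ.

```python
class NestedContainer:
    """Container creating sub-containers on first attribute access."""

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. the attribute is not set yet.
        if name.startswith("_"):
            raise AttributeError(name)
        child = NestedContainer()
        object.__setattr__(self, name, child)
        return child

    def as_flat_dict(self, prefix=""):
        """Return the stored values as a flat {'dotted.key': value} dict."""
        flat = {}
        for name, value in vars(self).items():
            key = prefix + name
            if isinstance(value, NestedContainer):
                flat.update(value.as_flat_dict(key + "."))
            else:
                flat[key] = value
        return flat


c = NestedContainer()
# Execute the content of a (one-line) configuration script with c bound.
exec("c.section.subsection.parameter = 5", {"c": c})
values = c.as_flat_dict()
```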

Important

Remember that this script will be executed, so arbitrary code can be run inside, maybe changing some value depending on the OS, the hostname, or more advanced logic.

Of course, running arbitrary code dynamically is a security liability: do not load parameters from a Python script unless you trust it.

The loader does not support the traitlets feature of configuration file inheritance via (in the config file) load_subconfig("some_other_script.py"). This would be doable, but for the moment we recommend instead that you specify multiple configuration files in ApplicationBase.config_files, remembering that each configuration file replaces the values of the previous one in the list.

YAML is supported via YamlLoader and the third-party module ruamel.yaml.

Despite not being easily readable, the JSON format is also supported via JsonLoader and the builtin module json. The decoder and encoder class can be customized.

From the command line#

Parameters can be set by parsing command line arguments, although this can be skipped by setting either the ApplicationBase.ignore_cli attribute or the ignore_cli argument to ApplicationBase.start(). The configuration obtained is stored in the cli_conf attribute and takes priority over parameters from configuration files.

The keys are indicated following one or two hyphens. Any subsequent hyphen is replaced by an underscore, so -computation.n_cores and --computation.n-cores are equivalent. Parameter keys are dot-separated paths leading to a trait. Aliases can be used for brevity.

Note

This can be changed with attributes of the corresponding loader class: CLILoader.allow_kebab and CLILoader.prefix.
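The equivalence between the two spellings can be sketched as a small normalization function. normalize_key is hypothetical; its prefix and allow_kebab arguments merely echo the loader attributes named above, and their exact semantics in CLILoader are assumed.

```python
def normalize_key(argument, prefix="-", allow_kebab=True):
    """Turn '-a.b_c' or '--a.b-c' into the parameter key 'a.b_c'."""
    if not argument.startswith(prefix):
        raise ValueError(f"{argument!r} is not a parameter name")
    # Drop one or two leading prefix characters.
    key = argument[len(prefix):]
    if key.startswith(prefix):
        key = key[len(prefix):]
    # Remaining hyphens are kebab-case separators.
    if allow_kebab:
        key = key.replace("-", "_")
    return key


short_form = normalize_key("-computation.n_cores")
long_form = normalize_key("--computation.n-cores")
```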

Command line arguments need to be parsed. The corresponding trait object will deal with the parsing, using its from_string or from_string_list (for containers) methods.

Note

Nested container parameters (a list of lists, for example) are not currently supported.

Note

The list of command line arguments is obtained by ApplicationBase.get_argv(). It tries to detect whether Python was launched from IPython or Jupyter, in which case it ignores the arguments before the first --.

List arguments#

For every parameter, the argument action is “append”, with type str (since the parsing is left to traitlets) and nargs="*", meaning that any parameter can receive any number of values. To indicate multiple values, for a List trait for instance, use the following syntax:

--physical.years 2015 2016 2017

and not as is the case with vanilla traitlets:

--physical.years 2015 --physical.years 2016 ...

The latter form will raise an error, since duplicates are forbidden to avoid possible mistakes in user input.
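The combination of “append” and nargs="*" can be tried out with argparse alone. In the sketch below, get_values is a hypothetical helper and the duplicate check is an assumption about how such a rule could be enforced, not the package's actual code.

```python
import argparse

parser = argparse.ArgumentParser(prog="script.py", allow_abbrev=False)
parser.add_argument("--physical.years", action="append", type=str, nargs="*")


def get_values(argv, key="physical.years"):
    """Parse argv and return the raw strings for key, rejecting duplicates."""
    # With action="append" and nargs="*", each occurrence adds one list.
    occurrences = vars(parser.parse_args(argv))[key] or []
    if len(occurrences) > 1:
        raise ValueError(f"parameter {key!r} was specified more than once")
    return occurrences[0] if occurrences else []


values = get_values(["--physical.years", "2015", "2016", "2017"])
```

The values stay strings at this stage; converting them is left to the trait.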

Extra parameters#

Extra parameters to the argument parser can be added with the class method ApplicationBase.add_extra_parameters(). This will add traits to a section named “extra”, created if needed. This is useful, for instance, when parameters are needed for a single script. If in our script we write:

App.add_extra_parameters(threshold=Float(5.0))

we can then pass the parameter on the command line with --extra.threshold and retrieve it with app.extra.threshold.

Autocompletion#

Autocompletion for parameters is available via argcomplete. Install argcomplete and either register the scripts you need or activate global completion. In both cases you will need to add # PYTHON_ARGCOMPLETE_OK to the beginning of your scripts.

Note

Completion is not available when using IPython, as it shadows our application. I do not know if this is fixable.

From a dictionary#

The loader DictLoader can transform any nested mapping into a proper configuration.

Note

The loaders TomlkitLoader, YamlLoader and JsonLoader are based on it, as they return a nested mapping.
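Turning a nested mapping into flat, dot-separated keys can be sketched as below. The flatten function is a hypothetical helper: it treats every mapping as a subsection, whereas a real loader would also have to recognize mappings that are the value of a Dict trait, and it returns bare values rather than ConfigValue objects.

```python
from collections.abc import Mapping


def flatten(nested, prefix=""):
    """Turn {'a': {'b': 1}} into {'a.b': 1}."""
    flat = {}
    for name, value in nested.items():
        key = f"{prefix}{name}"
        if isinstance(value, Mapping):
            # Descend into what is assumed to be a subsection.
            flat.update(flatten(value, prefix=key + "."))
        else:
            flat[key] = value
    return flat


nested = {
    "computation": {"n_cores": 10},
    "physical": {"threshold": 5.0, "years": [2023, 2024]},
}
flat = flatten(nested)
```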