register#

Registering datasets.

Several architectures are possible for organizing dataset definitions.

Single file#

All of your DataManager classes are defined in a single file or module. You can then import any of them directly from that module, or retrieve them through the store. This layout raises no particular issue.
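As a minimal sketch of the single-file layout, the module below defines a store and two datasets side by side. The `DataManager` and `DatasetStore` classes here are simplified stand-ins written for this example, and the `register` decorator and the use of the class name as key are assumptions, not the library's actual API.

```python
# Hypothetical stand-ins: the library's real DataManager and DatasetStore
# classes may differ in names, signatures and behavior.
class DataManager:
    """Stand-in base class shared by all datasets."""


class DatasetStore(dict):
    """Stand-in store: a mapping from dataset name to DataManager class."""

    def register(self, cls):
        # Assumed registration API, usable as a class decorator.
        # The class name is used as the key (an assumption).
        self[cls.__name__] = cls
        return cls


store = DatasetStore()


@store.register
class SST(DataManager):
    pass


@store.register
class Histogram(DataManager):
    pass


# Both access paths lead to the same class object.
same_class = store["SST"] is SST
```

Since everything lives in one module, the datasets are registered as soon as that module is imported, and there is no import ordering to worry about.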

Separated files, no store#

You can split your DataManagers across several modules, for any number of reasons. This is efficient, since each module is imported only when you need it.

If you want to use the store, however, things get more complicated: you may quickly run into circular imports. Each module also needs to be imported for the store to register the datasets it defines.

Separated files, with store#

To avoid these problems, I propose a three-part structure:

  • A first module, let’s call it data. It defines the store variable, and possibly some project-wide items, such as a project-default DataManager class so that all datasets share common functions.

  • Any number of other modules that define all the datasets needed. They can be placed anywhere: in submodules, or even outside the project if you’re feeling daring. They import whatever they need from data, including the store object, which they use to register their datasets.

  • To use the store, we need to actually import those modules so that the datasets inside get registered. Otherwise the store will not know about them. One way to do this is to define a third module, let’s call it datalist, that imports all the dataset modules as well as the store. It can be written like so:

    # datalist: importing each dataset module registers its datasets
    import sst
    import histogram.dataset
    from data import store
    ...
    

Once this structure is in place, you can simply import the store from datalist.
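The three-part structure can be tried out with the sketch below, which writes the three modules into a temporary package and imports them. The package name `myproject`, the `ProjectDataManager` class, and the `DatasetStore.register` decorator are assumptions made for this illustration; the `DatasetStore` here is a plain `dict` subclass standing in for the library class.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Hypothetical file contents. DatasetStore is a stand-in for the library
# class, and its register() method is an assumed API.
FILES = {
    "data.py": """
        class DatasetStore(dict):
            def register(self, cls):
                self[cls.__name__] = cls
                return cls

        store = DatasetStore()

        class ProjectDataManager:
            '''Project-wide defaults shared by every dataset.'''
    """,
    "sst.py": """
        from myproject.data import ProjectDataManager, store

        @store.register
        class SST(ProjectDataManager):
            pass
    """,
    "datalist.py": """
        import myproject.sst  # importing the module registers its datasets
        from myproject.data import store
    """,
}

# Write the package to a temporary directory and make it importable.
root = Path(tempfile.mkdtemp())
package = root / "myproject"
package.mkdir()
(package / "__init__.py").write_text("")
for name, source in FILES.items():
    (package / name).write_text(textwrap.dedent(source))
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Importing only `data` leaves the store empty: nothing has registered yet.
data = importlib.import_module("myproject.data")
empty_before = len(data.store) == 0

# Importing `datalist` pulls in every dataset module as a side effect.
datalist = importlib.import_module("myproject.datalist")
registered = list(datalist.store)
```

Because `data` never imports the dataset modules, the dependency only goes one way (datasets depend on `data`, and `datalist` depends on both), which is what keeps circular imports out of the picture.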

Note that this has the disadvantage of importing every dataset module, which may add some overhead for datasets that end up unused. The store also “hides” the type of the DataManager you get, which can be annoying when using static type checking.
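To illustrate the type-checking point, here is a small sketch of what a static checker sees and two possible workarounds. The store is modelled as a plain dictionary, and the `SST` class and its `mean_temperature` method are hypothetical; this shows one way to cope with the issue, not something the library prescribes.

```python
from typing import cast


class DataManager:
    """Stand-in base class; the library's real class may differ."""


class SST(DataManager):
    def mean_temperature(self) -> float:
        # Hypothetical dataset-specific method.
        return 0.0


# Stand-in store: a type checker only knows the values are
# "some DataManager subclass", not which one.
store: dict[str, type[DataManager]] = {"SST": SST}

manager = store["SST"]()
# manager.mean_temperature()  # works at runtime, but a static type
#                             # checker reports an unknown attribute

# Option 1: import the class directly where static types matter.
typed_direct = SST()

# Option 2: keep the store lookup and state the expected type explicitly.
typed_cast = cast(type[SST], store["SST"])()
value = typed_cast.mean_temperature()
```

Note that `cast` performs no runtime check, so the second option relies on the key and the stated type being kept in sync by hand.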

Classes

DatasetStore(*args)

Mapping of registered Datasets.