WriterAbstract

class WriterAbstract(params=None, **kwargs)

Bases: Generic[T_Source, T_Data], Module

Abstract class of Writer plugin.

Manages metadata that can optionally be added to data before writing.

Parameters:

params (t.Any | None)

__weakref__

list of weak references to the object

add_git_metadata(script, meta)

Add git information to meta dictionary.

Parameters:

check_directories(calls)

Check if directories are missing, and create them if necessary.

Parameters:

calls (Sequence[tuple[T_Source, T_Data]])

check_directory(call)

Check if directory is missing, and create it if necessary.

Parameters:

call (tuple[T_Source, T_Data])
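The directory check described above can be illustrated with a minimal, standalone sketch. It assumes the first element of a call is a filename string, which the source does not confirm (T_Source is generic), and `ensure_parent_directory` is a hypothetical helper, not part of the documented class.

```python
import tempfile
from pathlib import Path
from typing import Any


def ensure_parent_directory(call: tuple[str, Any]) -> Path:
    """Create the parent directory of the call's target if it is missing.

    Assumption: the first element of the call is a filename string.
    """
    target, _ = call
    parent = Path(target).parent
    # parents=True creates intermediate directories,
    # exist_ok=True makes the call a no-op when the directory exists.
    parent.mkdir(parents=True, exist_ok=True)
    return parent


with tempfile.TemporaryDirectory() as tmp:
    demo_call = (str(Path(tmp) / "sub" / "dir" / "file.nc"), None)
    created = ensure_parent_directory(demo_call)
    created_exists = created.is_dir()
```

Applying the same helper to each element of a sequence of calls would give the behaviour described for check_directories.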

check_overwriting_calls(calls)

Check if some calls have the same filename.

Parameters:

calls (Sequence[tuple[T_Source, T_Data]])
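One way to detect calls that share a filename is to count the targets. The sketch below assumes string targets, and `find_duplicate_targets` is a hypothetical name. How the actual method reports a conflict (exception, warning, or otherwise) is not stated in this page.

```python
from collections import Counter
from collections.abc import Sequence
from typing import Any


def find_duplicate_targets(calls: Sequence[tuple[str, Any]]) -> list[str]:
    """Return the targets that appear in more than one call.

    Assumption: the first element of each call is a filename string.
    """
    counts = Counter(target for target, _ in calls)
    return [target for target, n in counts.items() if n > 1]


demo_calls = [("out/a.nc", 1), ("out/b.nc", 2), ("out/a.nc", 3)]
duplicates = find_duplicate_targets(demo_calls)
```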

get_metadata(add_dataset_params=True, add_commit=True)

Set some dataset attributes with information on how it was created.

Attributes are:

  • written_as_dataset: name of the dataset class.

  • created_by: hostname and filename of the Python script used.

  • created_with_params: a string representing the parameters.

  • created_on: date of creation.

  • created_at_commit: if found, the HEAD commit hash.

  • git_diff_short: if the working directory is dirty, a list of modified files.

  • git_diff_long: if the working directory is dirty, the full diff, truncated at metadata_max_diff_lines lines.

Parameters:
  • add_dataset_params (bool) – If True (default), add the parent dataset parameter values to metadata. Parameters “as dict” are serialized using json, falling back to str() if that fails.

  • add_commit (bool) – If True (default), try to find the current commit hash of the directory containing the script called.

Return type:

dict[str, Any]
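As a rough illustration of how such a metadata dictionary could be assembled, the standalone sketch below uses only the standard library. The helper names (`serialize_params`, `find_commit`, `build_metadata`), the value formats, and the use of `git rev-parse HEAD` are assumptions made for this example. The actual implementation may differ.

```python
import json
import socket
import subprocess
from datetime import datetime
from typing import Any, Optional


def serialize_params(params: Any) -> str:
    """Serialize parameters with json, falling back to str() on failure."""
    try:
        return json.dumps(params)
    except (TypeError, ValueError):
        return str(params)


def find_commit(directory: str) -> Optional[str]:
    """Return the HEAD commit hash of `directory`, or None if unavailable."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            cwd=directory,
            capture_output=True,
            text=True,
            check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or the directory is not a repository
        return None
    return result.stdout.strip()


def build_metadata(script: str, params: Any, directory: str = ".") -> dict[str, Any]:
    """Build a metadata dictionary (the value formats are assumptions)."""
    meta: dict[str, Any] = {
        "created_by": f"{socket.gethostname()}:{script}",
        "created_with_params": serialize_params(params),
        "created_on": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
    }
    commit = find_commit(directory)
    if commit is not None:
        meta["created_at_commit"] = commit
    return meta


meta = build_metadata("process.py", {"region": "north", "year": 2020})
```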

metadata_git_ignore: abc.Sequence[str] = []

Files and folders to ignore when creating git diff.

metadata_max_diff_lines = 30

Maximum number of lines to include in diff.
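A line-based truncation of this kind can be sketched as follows. `truncate_diff` is a hypothetical helper, and whether the real implementation adds a truncation marker is not documented here.

```python
def truncate_diff(diff: str, max_lines: int = 30) -> str:
    """Keep at most `max_lines` lines of a diff."""
    lines = diff.splitlines()
    return "\n".join(lines[:max_lines])


long_diff = "\n".join(f"+ line {i}" for i in range(100))
short_diff = truncate_diff(long_diff, max_lines=30)
```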

metadata_params_exclude: abc.Sequence[str] = ['dask.', 'log_']

Prefixes of parameters to exclude from metadata attribute.
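Prefix-based exclusion could work along these lines. The sketch assumes parameters are available as a flat mapping keyed by dotted names, which is an assumption of this example, and `filter_params` is a hypothetical helper.

```python
from collections.abc import Mapping, Sequence
from typing import Any


def filter_params(params: Mapping[str, Any], exclude: Sequence[str]) -> dict[str, Any]:
    """Drop parameters whose name starts with one of the excluded prefixes."""
    return {
        name: value
        for name, value in params.items()
        if not any(name.startswith(prefix) for prefix in exclude)
    }


demo_params = {"dask.workers": 4, "log_level": "INFO", "region": "north"}
kept = filter_params(demo_params, ["dask.", "log_"])
```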

send_calls(calls, **kwargs)

Send multiple calls serially.

Check beforehand for filename conflicts between calls, and make sure the necessary (sub)directories are created if they do not already exist.

Parameters:
Return type:

list[Any]
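The flow described for send_calls (check for conflicts, create directories, then execute each call in turn) can be sketched with a small standalone class. `SketchWriter` is a hypothetical stand-in that does not inherit from WriterAbstract. It assumes string targets, writes plain text, and raises ValueError on a conflict, none of which is confirmed by this page.

```python
import tempfile
from collections.abc import Sequence
from pathlib import Path
from typing import Any

Call = tuple[str, Any]  # assumption: (filename, data)


class SketchWriter:
    """Hypothetical stand-in illustrating a serial send_calls flow."""

    def check_overwriting_calls(self, calls: Sequence[Call]) -> None:
        targets = [target for target, _ in calls]
        if len(set(targets)) != len(targets):
            # Raising here is an assumption of this sketch.
            raise ValueError("Some calls share the same filename.")

    def check_directories(self, calls: Sequence[Call]) -> None:
        for target, _ in calls:
            Path(target).parent.mkdir(parents=True, exist_ok=True)

    def send_single_call(self, call: Call, **kwargs: Any) -> Any:
        # A plugin subclass would put its own writing logic here.
        target, data = call
        Path(target).write_text(str(data))
        return target

    def send_calls(self, calls: Sequence[Call], **kwargs: Any) -> list[Any]:
        self.check_overwriting_calls(calls)
        self.check_directories(calls)
        # Calls are executed one after the other, in order.
        return [self.send_single_call(call, **kwargs) for call in calls]


with tempfile.TemporaryDirectory() as tmp:
    demo = [
        (str(Path(tmp) / "a" / "x.txt"), 1),
        (str(Path(tmp) / "b" / "y.txt"), 2),
    ]
    results = SketchWriter().send_calls(demo)
    first_content = Path(demo[0][0]).read_text()
```

Keeping the checks in send_calls and the actual writing in send_single_call means a subclass only has to override the latter.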

send_single_call(call, **kwargs)

Execute a single call.

Not implemented:

implement in plugin subclass.

Parameters:
Return type:

Any

write(data, target=None, **kwargs)

Write data to file or store.

Not implemented:

implement in plugin subclass.

Parameters:
Return type:

Any