XarrayWriter#

class XarrayWriter(params=None, **kwargs)#

Bases: WriterAbstract[str, Dataset]

Write Xarray dataset.

Parameters:

params (t.Any | None)

TO_NETCDF_KWARGS: dict[str, t.Any] = {}#

Arguments passed to the writing function for netCDF files.

TO_ZARR_KWARGS: dict[str, t.Any] = {}#

Arguments passed to the writing function for zarr stores.

add_metadata(ds, add_dataset_parameters=True, add_commit=True)#

Set some dataset attributes with information on how it was created.

Wrapper around get_metadata(). Attributes already present will not be overwritten.

Parameters:
  • ds (Dataset) – Dataset to add global attributes to. This is not done in-place.

  • add_dataset_parameters (bool) – If True (default) and if parameters is not a string, add the parent dataset parameter values to the serialization. The parent parameters won't overwrite the values of parameters.

  • add_commit (bool) – If True (default), try to find the current commit hash of the directory containing the called script.

Return type:

Dataset
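
The non-overwriting merge performed by add_metadata() can be sketched with a plain dict standing in for ds.attrs (merge_attrs is a hypothetical helper, not part of the documented API):

```python
# Sketch of the attribute merge semantics of add_metadata(): new metadata is
# added only for keys not already present, and the original mapping is left
# untouched (the operation is not in-place). 'existing' stands in for
# ds.attrs; 'metadata' for the output of get_metadata().

def merge_attrs(existing: dict, metadata: dict) -> dict:
    """Return a new attrs dict; metadata keys are added only if absent."""
    merged = dict(existing)            # copy, so the original is untouched
    for key, value in metadata.items():
        merged.setdefault(key, value)  # attributes already present win
    return merged

attrs = {"title": "my dataset"}
meta = {"title": "auto-generated", "commit": "abc123"}
print(merge_attrs(attrs, meta))  # existing "title" kept, "commit" added
```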

send_calls_together(calls, client, chop=None, format=None, **kwargs)#

Send multiple calls together.

If Dask is correctly configured, the writing calls will be executed in parallel.

For all calls within a group, a list of delayed writing calls is constructed, then computed all at once with client.compute(delayed). Because we only care about the side effect of writing to file, not the computed result, each future is discarded as soon as it completes. This avoids a blow-up in memory:

for future in distributed.as_completed(client.compute(delayed)):
    log.debug("future completed: %s", future)
Parameters:
  • client (Client) – Dask Client instance.

  • chop (int | None) – If None (default), all calls are sent together. If chop is an integer, groups of calls of size chop (at most) will be sent one after the other, calls within each group being run in parallel.

  • kwargs – Passed to the writing function. Overwrites the defaults from TO_NETCDF_KWARGS, whatever the value of format.

  • calls (abc.Sequence[CallXr])

  • format (t.Literal['nc', 'zarr', None])
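
The chopping strategy can be illustrated with the standard library's concurrent.futures in place of dask.distributed (the as_completed idiom is the same). write_one() and send_calls_in_groups() are hypothetical stand-ins, not part of the documented API:

```python
# Sketch of the chop behaviour of send_calls_together(): split calls into
# groups of size `chop`, run each group in parallel, and discard each future
# as soon as it completes so no results linger in memory.
from concurrent.futures import ThreadPoolExecutor, as_completed

def write_one(target: str) -> str:
    # stand-in for the side effect of writing one dataset to disk
    return f"wrote {target}"

def send_calls_in_groups(targets, chop=None, max_workers=4):
    results = []
    chop = chop or max(len(targets), 1)  # chop=None: send everything together
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for start in range(0, len(targets), chop):
            group = targets[start:start + chop]
            futures = [pool.submit(write_one, t) for t in group]
            # drop each future as soon as it completes (side effect only)
            for future in as_completed(futures):
                results.append(future.result())
    return results

print(sorted(send_calls_in_groups(["a.nc", "b.nc", "c.nc"], chop=2)))
```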

send_single_call(call: CallXr, format: Literal[None] = None, compute: Literal[True] = True, **kwargs) None | BaseStore#
send_single_call(call: CallXr, format: Literal['nc'], compute: Literal[True], **kwargs) None
send_single_call(call: CallXr, format: Literal['zarr'], compute: Literal[True], **kwargs) BaseStore
send_single_call(call: CallXr, *, compute: Literal[False], **kwargs) Delayed

Execute a single call.

Parameters:
  • kwargs – Passed to the writing function.

  • call (CallXr)

  • format (t.Literal['nc', 'zarr', None])

  • compute (bool)

Return type:

Delayed | None | BaseStore
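
The format-based dispatch suggested by the overloads above can be sketched as a merge of class-level defaults with call-site kwargs (call-site wins). The default values below are illustrative assumptions only, and resolve_write_kwargs() is a hypothetical helper:

```python
# Sketch: 'zarr' selects the TO_ZARR_KWARGS defaults, anything else the
# TO_NETCDF_KWARGS defaults; explicit kwargs always override the defaults.
TO_NETCDF_KWARGS = {"engine": "h5netcdf"}  # assumed default, illustrative only
TO_ZARR_KWARGS = {"mode": "w"}             # assumed default, illustrative only

def resolve_write_kwargs(format=None, **kwargs):
    """Merge class-level defaults with call-site kwargs (call-site wins)."""
    defaults = TO_ZARR_KWARGS if format == "zarr" else TO_NETCDF_KWARGS
    return {**defaults, **kwargs}

print(resolve_write_kwargs("nc", engine="netcdf4"))
```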

write(data, target=None, client=None, **kwargs)#

Write datasets to multiple targets.

Each dataset is written to its corresponding target (filename or store location). Directories will automatically be created if necessary. Metadata is added to the dataset.

Parameters:
  • data (xr.Dataset | abc.Sequence[xr.Dataset]) – Dataset or Sequence of datasets to write.

  • target (str | abc.Sequence[str] | None) – If None (default), target location(s) are automatically obtained via DataManagerBase.get_source().

  • client (Client | None) – Dask distributed.Client instance. If given, multiple write calls will be sent in parallel; see send_calls_together() for details. If None, the write calls are sent serially.

  • kwargs – Passed to the function that writes to disk (xarray.Dataset.to_netcdf() or xarray.Dataset.to_zarr()).

Return type:

t.Any
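
The target handling described for write() can be sketched with pathlib: accept a single item or a sequence, and create parent directories before writing. normalize_targets() is a hypothetical helper, not part of the documented API:

```python
# Sketch: normalize a single target or a sequence of targets into Path
# objects, auto-creating any missing parent directories.
from pathlib import Path
import tempfile

def normalize_targets(target):
    if isinstance(target, (str, Path)):
        target = [target]
    paths = [Path(t) for t in target]
    for p in paths:
        p.parent.mkdir(parents=True, exist_ok=True)  # auto-create directories
    return paths

with tempfile.TemporaryDirectory() as tmp:
    out = normalize_targets(f"{tmp}/nested/dir/data.nc")
    print(out[0].parent.is_dir())  # the nested directory now exists
```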