The generic class that supports data storage is called Storage. To be
usable with Dataset, any child of this class must implement the methods
load_table() and determine_field_names(), and may optionally implement
write_dataset(). Method load_table() takes as arguments the name of the
table as table_name and, optionally, a list of attributes as column_names,
a parameter lowercase that forces all names to lowercase, and an id_name
by which the loaded data are sorted (currently supported only by SQL-based
storages). It returns a dictionary with the column names as keys. This
method is called from the Dataset method load_dataset().
Method determine_field_names() takes load_resources as its
argument and returns a list of the attribute names found in the storage.
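The two required methods can be sketched as follows. This is a minimal illustration, not the opus_core implementation: the Storage class here is a stand-in base class, InMemoryStorage is a hypothetical child, the default lowercase=True is assumed, and load_resources is assumed to be dict-like with the table name stored under a hypothetical key "in_table_name".

```python
class Storage:
    """Stand-in for the generic base class (not the real opus_core class)."""

    def load_table(self, table_name, column_names=None, lowercase=True, id_name=None):
        raise NotImplementedError

    def determine_field_names(self, load_resources):
        raise NotImplementedError


class InMemoryStorage(Storage):
    """Hypothetical child class that keeps its tables in a dict of dicts."""

    def __init__(self, tables):
        # tables has the form {table_name: {column_name: list_of_values}}
        self._tables = tables

    def load_table(self, table_name, column_names=None, lowercase=True, id_name=None):
        table = self._tables[table_name]
        if column_names is None:
            column_names = list(table)
        result = {}
        for name in column_names:
            key = name.lower() if lowercase else name
            result[key] = list(table[name])
        # id_name is accepted but ignored, as in storages that do not sort.
        return result

    def determine_field_names(self, load_resources):
        # Assumption: the table name is stored under "in_table_name".
        return list(self._tables[load_resources["in_table_name"]])


storage = InMemoryStorage(
    {"households": {"Household_ID": [1, 2, 3], "Income": [40.0, 55.5, 12.0]}}
)
data = storage.load_table("households", column_names=["Income"])
fields = storage.determine_field_names({"in_table_name": "households"})
```

Here data is a dictionary keyed by the (lowercased) column names, and fields lists the names as they are stored.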
Method write_dataset() writes data to the specified location. It takes as
its argument an object of class Resources, called write_resources. Dataset
calls this method from its own method write_dataset(), where write_resources
is automatically filled with the following entries: attributes (a list of
attribute names to be written), out_table_name (the table, directory or file
to which the data should be written), values (a dictionary with one entry per
attribute, whose values are arrays of the data), id_name (the name of the
unique identifier), attrtype (a dictionary giving the type of each attribute),
and valuetypes (a dictionary of numpy array types, such as float32 or int16).
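To make the shape of write_resources concrete, the sketch below builds one by hand and passes it to a hypothetical storage. Several things are assumptions made for illustration: a plain dict stands in for the Resources object, plain lists stand in for numpy arrays, the valuetypes are given as type names in strings, the attrtype values are placeholders, and MemoryWriter is not an opus_core class.

```python
class MemoryWriter:
    """Hypothetical storage that 'writes' tables into a dict."""

    def __init__(self):
        self.tables = {}

    def write_dataset(self, write_resources):
        names = write_resources["attributes"]
        values = write_resources["values"]
        id_name = write_resources["id_name"]
        # Put the identifier column first, then the other attributes
        # in the order in which they were listed.
        ordered = [id_name] + [n for n in names if n != id_name]
        table = {n: list(values[n]) for n in ordered}
        self.tables[write_resources["out_table_name"]] = table


# A plain dict standing in for the Resources object filled by Dataset.
write_resources = {
    "attributes": ["income", "household_id"],
    "out_table_name": "households_out",
    "values": {"household_id": [1, 2, 3], "income": [40.0, 55.5, 12.0]},
    "id_name": "household_id",
    # Placeholder values; the real attribute types are not shown here.
    "attrtype": {"household_id": "placeholder", "income": "placeholder"},
    # Type names as strings; real code would hold numpy array types.
    "valuetypes": {"household_id": "int32", "income": "float32"},
}

writer = MemoryWriter()
writer.write_dataset(write_resources)
```

A real storage would also use attrtype and valuetypes, for example to choose column types in a database table; this sketch ignores them.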
The predefined storage classes in opus_core are dict_storage,
tab_storage, mysql_storage and flt_storage, each implemented in a
module of the same name in opus_core.store. Their instances can be
created using the method get_storage(type, resources,...) of class
StorageFactory, which passes resources (of type Resources) to the
constructor of the class given by type. The method
build_storage_for_dataset(location, type, ...) is a wrapper around
get_storage(): it creates resources and adds to it an entry
storage_location (with the value of location). This entry is used
by most of the predefined storage classes.
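The relationship between the two factory methods can be sketched as below. This is an illustration of the pattern only: StorageFactorySketch and DictStorageStandIn are hypothetical names, a plain dict stands in for Resources, and the explicit registry is an assumption (how the real StorageFactory locates the class for a given type is not shown here).

```python
class DictStorageStandIn:
    """Hypothetical storage class whose constructor receives resources."""

    def __init__(self, resources):
        # Most predefined storages are described as reading this entry.
        self.location = resources.get("storage_location")


class StorageFactorySketch:
    """Stand-in factory; the registry lookup is an assumption."""

    def __init__(self, registry):
        # registry maps a type name to the class implementing it
        self._registry = registry

    def get_storage(self, type, resources):
        # Pass resources on to the constructor of the class given by type.
        return self._registry[type](resources)

    def build_storage_for_dataset(self, location, type):
        # Wrapper: create resources, add storage_location, then delegate.
        resources = {"storage_location": location}
        return self.get_storage(type, resources)


factory = StorageFactorySketch({"dict_storage": DictStorageStandIn})
storage = factory.build_storage_for_dataset(location="/tmp/cache", type="dict_storage")
```

With this layout a caller that only knows a location and a type name never has to build resources by hand.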