https://en.wikipedia.org/wiki/Hierarchical_Data_Format
https://www.h5py.org/
https://docs.h5py.org/en/stable/
HDF5 for Python
https://www.pytables.org/
Andrew Collette, Python and HDF5, O'Reilly, 2013.
Hierarchical Data Format (HDF) is a set of file formats (HDF4, HDF5)
designed to store and organize large amounts of data
(typically bigger than available memory).
HDF5 uses two major types of object:
(1) Datasets, which are multidimensional arrays of a homogeneous type
[similar to 'files'],
(2) Groups, which are containers that can hold datasets
and other groups [similar to 'folders'].
The h5py package is a Pythonic interface to the HDF5 binary data format.
Groups work like dictionaries, and datasets work like NumPy arrays.
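As a minimal sketch of that analogy, the example below indexes a group by name like a dictionary and slices a dataset like a NumPy array. The file name `example_groups.hdf5` and the names `measurements` and `temperature` are made up for illustration, and the file is written to a temporary directory.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file in a temporary directory, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "example_groups.hdf5")

# Create one group containing one dataset.
with h5py.File(path, mode="w") as f:
    grp = f.create_group("measurements")             # a group, like a folder
    grp.create_dataset("temperature", data=np.arange(10.0))

with h5py.File(path, mode="r") as f:
    group_names = list(f.keys())                     # dictionary-style listing
    dset = f["measurements"]["temperature"]          # dictionary-style lookup
    same = f["measurements/temperature"]             # equivalent path-style lookup
    first_three = dset[:3]                           # NumPy-style slice; reads only these values
    shape, dtype = dset.shape, dset.dtype
```

Slicing a dataset reads only the requested part from disk, which is what makes files larger than memory workable.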
Install
  ANACONDA
    conda install h5py
  PIP (in a virtual environment)
    pip install h5py
    import h5py
    f = h5py.File('mytestfile.hdf5', mode='r')
    list(f.keys())           # ['mydataset']
    list(f.attrs.keys())     # metadata for the root group
    dset = f['mydataset']    # get an HDF5 dataset
    dset.shape
    dset.dtype
    list(dset.attrs.keys())  # metadata for the dataset
    #f.flush()               # flush the buffers (when mode="w")
    f.close()
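The reading example assumes that a file with a dataset named 'mydataset' already exists. One possible way to produce such a file is sketched below. The attribute names `description` and `units` and the dataset contents are assumptions chosen for illustration, and the file is written to a temporary directory so that nothing existing is overwritten.

```python
import os
import tempfile

import h5py
import numpy as np

# Temporary location; the attribute names and contents below are illustrative only.
path = os.path.join(tempfile.mkdtemp(), "mytestfile.hdf5")

# The context manager flushes and closes the file on exit,
# so no explicit f.flush() or f.close() is needed.
with h5py.File(path, mode="w") as f:
    f.attrs["description"] = "example file"          # metadata on the root group
    dset = f.create_dataset("mydataset", shape=(100,), dtype="i4")
    dset[:] = np.arange(100)                         # NumPy-style assignment
    dset.attrs["units"] = "counts"                   # metadata on the dataset

# Read back what was written.
with h5py.File(path, mode="r") as f:
    names = list(f.keys())
    units = f["mydataset"].attrs["units"]
    head = f["mydataset"][:3]
```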
PyTables is a package for managing hierarchical datasets, designed to cope efficiently and easily with extremely large amounts of data. It is built on top of the HDF5 library.
The package 'pandas' can import from and export to HDF5 via PyTables [pandas.HDFStore].
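A minimal sketch of that round trip follows, assuming both pandas and PyTables are installed. The file name `frames.h5`, the key `mytable`, and the DataFrame contents are hypothetical.

```python
import os
import tempfile

import pandas as pd

# Hypothetical file and data, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "frames.h5")
df = pd.DataFrame({"x": [1, 2, 3], "y": [0.5, 1.5, 2.5]})

# Export: HDFStore behaves like a dictionary of pandas objects.
with pd.HDFStore(path, mode="w") as store:
    store.put("mytable", df)
    keys = store.keys()                              # stored keys, as absolute paths

# Import: read one stored object back into a DataFrame.
restored = pd.read_hdf(path, key="mytable")
```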
Install
  ANACONDA
    conda install pytables
  PIP (in a virtual environment)
    python3 -m pip install tables
  APT (for the entire computer)
    Debian packages: python3-tables (python-tables on older releases), python-tables-data, and dependencies.
    import tables
    f = tables.open_file("mytestfile.hdf5", mode="r")
    #f.flush()               # flush the buffers (when mode="w")
    f.close()
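The snippet above only opens and closes a file. As a rough sketch of what a PyTables session might do in between, the example below creates a group and an array, then reads the array back. The file name `pytables_example.h5` and the node names `mygroup` and `myarray` are assumptions for illustration.

```python
import os
import tempfile

import numpy as np
import tables

# Hypothetical file in a temporary directory, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "pytables_example.h5")

# Write: create a group under the root and store a NumPy array in it.
with tables.open_file(path, mode="w") as f:
    group = f.create_group("/", "mygroup")
    f.create_array(group, "myarray", obj=np.arange(5))

# Read: look the node up by its path and load its contents.
with tables.open_file(path, mode="r") as f:
    node = f.get_node("/mygroup/myarray")
    values = node[:]                                 # NumPy-style slice of the stored array
    node_paths = [n._v_pathname for n in f.walk_nodes("/")]
```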