Using HDF5 files

https://en.wikipedia.org/wiki/Hierarchical_Data_Format

HDF5 for Python (h5py)
https://www.h5py.org/
https://docs.h5py.org/en/stable/

https://www.pytables.org/

Andrew Collette, Python and HDF5, O'Reilly, 2013.

INTRODUCTION

Hierarchical Data Format (HDF) is a set of file formats (HDF4, HDF5) designed to store and organize large amounts of data, often more than fits in available memory. HDF5 uses two major types of object:
(1) Datasets, which are multidimensional arrays of a homogeneous type [similar to 'files'],
(2) Groups, which are containers that can hold datasets and other groups [similar to 'folders'].
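The folder/file analogy can be sketched with h5py (described in the next section). This is a minimal illustration only: the file name, group names and dataset names are made up, and the file is written to a temporary directory.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file name and layout, chosen only for illustration.
path = os.path.join(tempfile.mkdtemp(), "hierarchy_demo.hdf5")

with h5py.File(path, "w") as f:
    # Groups nest like folders; intermediate groups are created as needed.
    run1 = f.create_group("experiment/run1")
    # A dataset is a homogeneous N-dimensional array stored inside a group.
    run1.create_dataset("temperature", data=np.arange(6, dtype="f8").reshape(2, 3))
    f.create_dataset("experiment/run2/pressure", data=np.zeros(4, dtype="i4"))

names = []
with h5py.File(path, "r") as f:
    # visit() calls the function with the path of every object below the root.
    f.visit(names.append)

print(names)
```

Every object is addressed by a POSIX-style path such as 'experiment/run1/temperature', which is what makes the comparison with folders and files natural.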

H5PY

The h5py package is a Pythonic interface to the HDF5 binary data format.

Groups work like dictionaries, and datasets work like NumPy arrays.
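A short sketch of what "like dictionaries" and "like NumPy arrays" means in practice. The names 'measurements' and 'values' are hypothetical, and the file lives in a temporary directory.

```python
import os
import tempfile

import h5py
import numpy as np

path = os.path.join(tempfile.mkdtemp(), "dictlike_demo.hdf5")

with h5py.File(path, "w") as f:
    grp = f.create_group("measurements")

    # Dictionary-style access on a group.
    grp["values"] = np.arange(10)      # item assignment creates a dataset
    has_values = "values" in grp       # membership test
    member_names = list(grp)           # iteration yields member names

    # NumPy-style access on a dataset.
    dset = grp["values"]
    dset[0:3] = [100, 101, 102]        # slice assignment writes to the file
    first_five = dset[:5]              # slicing reads just that part as a NumPy array
    total = int(dset[...].sum())       # dset[...] reads the whole dataset

print(member_names, first_five, total)
```

Slicing is the important part for large files: only the requested region is read from disk, so a dataset does not have to fit in memory to be used.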

Install

ANACONDA
conda install h5py

PIP (in a virtual environment)
pip install h5py

import h5py

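# Open an existing file read-only (the file must already exist).
f = h5py.File('mytestfile.hdf5', mode='r')
list(f.keys())   # names in the root group, e.g. ['mydataset']
list(f.attrs.keys())   # attribute (metadata) names of the root group
dset = f['mydataset']   # get an HDF5 dataset
dset.shape
dset.dtype
list(dset.attrs.keys())   # attribute (metadata) names of the dataset
#f.flush()   # flush the buffers to disk (only relevant when writing, e.g. mode="w")
f.close()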
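The read example assumes that 'mytestfile.hdf5' already exists and contains 'mydataset'. One possible way to create such a file is sketched here. The helper name make_test_file, the attribute names and the dataset contents are assumptions made for illustration; the demo call writes to a temporary directory.

```python
import os
import tempfile

import h5py
import numpy as np


def make_test_file(path="mytestfile.hdf5"):
    """Create a small HDF5 file with one dataset named 'mydataset'."""
    # mode "w" creates the file, truncating it if it already exists.
    with h5py.File(path, "w") as f:
        f.attrs["description"] = "example file"   # hypothetical root-group attribute
        dset = f.create_dataset("mydataset", data=np.arange(100, dtype="i4"))
        dset.attrs["units"] = "counts"            # hypothetical dataset attribute
    # Leaving the 'with' block flushes and closes the file.
    return path


demo_path = make_test_file(os.path.join(tempfile.mkdtemp(), "mytestfile.hdf5"))

with h5py.File(demo_path, "r") as f:
    keys = list(f.keys())
    shape = f["mydataset"].shape
    dtype = f["mydataset"].dtype
    units = f["mydataset"].attrs["units"]

print(keys, shape, dtype, units)
```

Calling make_test_file() with no argument writes 'mytestfile.hdf5' to the current directory. Using 'with' instead of an explicit f.close() ensures the file is closed even if an exception is raised.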

PYTABLES

PyTables is a package for managing hierarchical datasets, designed to cope efficiently and easily with extremely large amounts of data. It is built on top of the HDF5 library.

The package 'pandas' can import from and export to HDF5 via PyTables [pandas.HDFStore].
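A minimal round trip through pandas, assuming both pandas and PyTables are installed. The file name, the key 'table1' and the column names are made up for this sketch.

```python
import os
import tempfile

import pandas as pd

path = os.path.join(tempfile.mkdtemp(), "pandas_demo.h5")
df = pd.DataFrame({"x": [1, 2, 3], "y": [0.5, 1.5, 2.5]})

# Export: each DataFrame is stored under a key inside the HDF5 file.
df.to_hdf(path, key="table1", mode="w")

# Import: read the DataFrame back by key.
df_back = pd.read_hdf(path, key="table1")

# HDFStore gives dictionary-like access to everything stored in the file.
with pd.HDFStore(path, mode="r") as store:
    store_keys = store.keys()

print(store_keys)
print(df_back)
```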

Install

ANACONDA
conda install pytables

PIP (in a virtual environment)
python3 -m pip install tables

APT (for the entire computer)
Debian packages: python3-tables (python-tables on older, Python 2 based releases), python-tables-data, and dependencies.

import tables

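# Open an existing file read-only (the file must already exist).
f = tables.open_file("mytestfile.hdf5", mode="r")
print(f)   # show the object tree of the file
#f.flush()   # flush the buffers to disk (only relevant when writing, e.g. mode="w")
f.close()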
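For comparison with the h5py section, here is a small sketch of writing and reading an array with PyTables. The file name, the group 'results' and the array 'numbers' are hypothetical, and the file is written to a temporary directory.

```python
import os
import tempfile

import numpy as np
import tables

path = os.path.join(tempfile.mkdtemp(), "pytables_demo.h5")

with tables.open_file(path, mode="w", title="Demo file") as f:
    # create_group(parent, name, title) adds a group under the root.
    group = f.create_group("/", "results", "Example group")
    # create_array(parent, name, data, title) stores a NumPy array.
    f.create_array(group, "numbers", np.arange(5), "Example array")

with tables.open_file(path, mode="r") as f:
    # walk_nodes() iterates recursively over the nodes of the file.
    node_paths = [node._v_pathname for node in f.walk_nodes("/")]
    # get_node() looks up a node by path; read() returns its data.
    numbers = f.get_node("/results/numbers").read()

print(node_paths)
print(numbers)
```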