Transparent optimized reading of n-dimensional Blosc2 slices for h5py
b2h5py provides h5py with transparent, automatic optimized reading of n-dimensional slices of Blosc2-compressed datasets. This optimized slicing leverages direct chunk access (skipping the slow HDF5 filter pipeline) and 2-level partitioning into chunks and then smaller blocks (so that less data is actually decompressed).
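To see why the second partitioning level reduces the amount of data decompressed, the following sketch counts which chunks and which blocks a step-1 slice overlaps. It is only an illustration and not b2h5py's implementation: the function name, the shapes and the slice are made up, and it assumes the chunk shape is an exact multiple of the block shape so that blocks can be indexed on one global grid.

```python
from itertools import product
from math import prod


def overlapping_parts(start, stop, part_shape):
    """Return grid indices of the partitions touched by a step-1 slice.

    `start` and `stop` are per-dimension slice bounds (stop exclusive);
    `part_shape` is the shape of one partition (a chunk or a block).
    """
    ranges = []
    for lo, hi, size in zip(start, stop, part_shape):
        if hi <= lo:  # empty slice along this dimension
            return []
        # First and last partition index touched along this dimension.
        ranges.append(range(lo // size, (hi - 1) // size + 1))
    return list(product(*ranges))


# Hypothetical layout: 100x100 chunks, each split into 10x10 blocks.
chunk_shape = (100, 100)
block_shape = (10, 10)

# Hypothetical request: dataset[95:105, 40:50]
start, stop = (95, 40), (105, 50)

chunks = overlapping_parts(start, stop, chunk_shape)
blocks = overlapping_parts(start, stop, block_shape)

requested = prod(hi - lo for lo, hi in zip(start, stop))
via_chunks = len(chunks) * prod(chunk_shape)  # whole chunks decompressed
via_blocks = len(blocks) * prod(block_shape)  # only touched blocks

print(chunks, blocks, requested, via_chunks, via_blocks)
```

In this made-up case the slice straddles two chunks, so decompressing whole chunks would process 20000 items to deliver 100, while decompressing only the touched blocks would process 200.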
Benchmarks of this technique show 2x-5x speed-ups compared with normal filter-based access. Comparable results are obtained with a similar technique in PyTables; see "Optimized Hyper-slicing in PyTables with Blosc2 NDim".
Usage
This optimized access works for slices with step 1 on Blosc2-compressed datasets using the native byte order. It is enabled by monkey-patching the h5py.Dataset class to extend the slicing operation. This is done on module import, so the only thing you need to do is:
import b2h5py
After that, optimization will be attempted for any slicing of a dataset (of the form dataset[...] or dataset.__getitem__(...)). If the optimization is not possible in a particular case, normal h5py slicing code will be used (which performs HDF5 filter-based access, backed by hdf5plugin to support Blosc2).
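The patch-with-fallback pattern described above can be sketched in general terms as follows. This is not b2h5py's code: `Dataset` here is a toy stand-in for h5py.Dataset, and `fast_path` and `patch` are hypothetical names. The "fast" and "slow" labels exist only to make visible which path served each request.

```python
class Dataset:
    """Toy stand-in for h5py.Dataset (hypothetical, for illustration)."""

    def __init__(self, data):
        self._data = list(data)

    def __getitem__(self, key):
        # Plays the role of the normal, filter-based slicing code.
        return ("slow", self._data[key])


def fast_path(dataset, key):
    """Serve step-1 slices; signal anything else as unsupported."""
    if isinstance(key, slice) and key.step in (None, 1):
        return ("fast", dataset._data[key])
    raise NotImplementedError("optimization not applicable")


def patch(cls):
    """Wrap cls.__getitem__ so the fast path is tried first."""
    original = cls.__getitem__

    def patched(self, key):
        try:
            return fast_path(self, key)
        except NotImplementedError:
            # Fall back to the unmodified slicing code.
            return original(self, key)

    cls.__getitem__ = patched


patch(Dataset)  # b2h5py does its patching on module import
ds = Dataset(range(10))

print(ds[2:5])  # step-1 slice, taken by the fast path
print(ds[::2])  # step 2, falls back to the original code
```

Because the fallback calls the original method, callers get the same result either way; only the route taken differs.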
Even if the module is imported and the Dataset class is patched, you may still force-disable the optimization by setting BLOSC2_FILTER=1 in the environment.
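One way to disable the optimization for a single run without touching the parent environment is to start the program in a child process with the variable set. The sketch below assumes nothing about when b2h5py reads BLOSC2_FILTER (the text above does not say), so it sets the variable before the child starts. The child command is a placeholder that only echoes the variable; a real script that imports b2h5py would take its place.

```python
import os
import subprocess
import sys

# Placeholder for a real script that imports b2h5py and slices datasets.
child_code = "import os; print(os.environ.get('BLOSC2_FILTER', 'unset'))"

# Copy the current environment and add the variable for the child only.
child_env = dict(os.environ, BLOSC2_FILTER="1")

result = subprocess.run(
    [sys.executable, "-c", child_code],
    env=child_env,
    capture_output=True,
    text=True,
    check=True,
)
child_value = result.stdout.strip()
print(child_value)
```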
Building
Just install PyPA build (e.g. pip install build), enter the source code directory and run pyproject-build to get a source tarball and a wheel under the dist directory.
Installing
Either install the wheel from the previous section, or enter the source code directory and run pip install . from there. There are no published wheels yet.
Running tests
If you have installed b2h5py, just run python -m unittest discover b2h5py.tests.
Otherwise, just enter its source code directory and run python -m unittest.