Dask + Delta Table
Project description
Dask Deltatable Reader
Reads a Delta table from a directory using the Dask engine.
To try out the package:
pip install dask-deltatable
Features:
- Reads the parquet files referenced by the Delta logs in parallel using the Dask engine
- Supports cloud filesystems such as s3, azurefs, and gcsfs
- Supports some Delta features:
  - Time travel
  - Schema evolution
  - Parquet filters
    - row filter
    - partition filter
- Query Delta commit info (history)
- Vacuum old/unused parquet files
- Load different versions of the data using a datetime
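Parquet-style row and partition filters, as listed above, are commonly expressed as lists of `(column, op, value)` tuples. The tiny evaluator below is NOT part of dask-deltatable; it is a hypothetical sketch that only illustrates the shape of such filter expressions:

```python
# Hypothetical illustration: parquet-style filters are often written as a
# list of (column, op, value) tuples, e.g. [("year", "==", 2021)].
# This helper is not part of dask-deltatable.
import operator

OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
    "in": lambda a, b: a in b,
}

def row_matches(row, filters):
    """Return True if the row satisfies every (column, op, value) predicate."""
    return all(OPS[op](row[col], val) for col, op, val in filters)
```

For example, `row_matches({"year": 2021}, [("year", ">=", 2020)])` evaluates to True.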
Usage:
import dask_deltatable as ddt

# read a Delta table
ddt.read_delta_table("delta_path")

# read a specific version of a Delta table
ddt.read_delta_table("delta_path", version=3)

# read a Delta table as of a specific datetime
ddt.read_delta_table("delta_path", datetime="2018-12-19T16:39:57-08:00")

# read the complete Delta history
ddt.read_delta_history("delta_path")

# read the Delta history up to a given limit
ddt.read_delta_history("delta_path", limit=5)

# vacuum: delete the old/unused parquet files
ddt.vacuum("delta_path", dry_run=False)

# can read from S3, Azure, GCS, etc.
# please ensure the credentials are properly configured as environment variables
# or in ~/.aws/credentials
ddt.read_delta_table("s3://bucket_name/delta_path", version=3)
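The datetime string in the time-travel example above is an ISO 8601 timestamp with a UTC offset. As a quick sanity check (an illustration using the standard library, not part of the package API), such a string can be validated before passing it in:

```python
from datetime import datetime, timedelta

# Parse the ISO 8601 timestamp used in the time-travel example above.
ts = datetime.fromisoformat("2018-12-19T16:39:57-08:00")

# The "-08:00" suffix parses to a UTC offset of minus eight hours.
print(ts.utcoffset() == timedelta(hours=-8))  # True
```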