dask-databricks
Cluster tools for running Dask on Databricks multi-node clusters.
Quickstart
To launch a Dask cluster on Databricks you need to create an init script with the following contents and configure your multi-node cluster to use it.
#!/bin/bash
# Install Dask + Dask Databricks
/databricks/python/bin/pip install --upgrade "dask[complete]" git+https://github.com/jacobtomlinson/dask-databricks.git@main
# Start Dask cluster components
dask databricks run
Then, from your Databricks notebook, you can quickly connect a Dask Client to the scheduler running on the Spark driver node.
import dask_databricks
client = dask_databricks.get_client()
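To confirm that the workers have joined the scheduler before submitting work, a small check along these lines may help. This is a minimal sketch: `summarize_workers` and `main` are hypothetical helper names, not part of dask-databricks, and the check relies only on the standard `Client.nthreads()` method from `distributed`.

```python
def summarize_workers(client):
    """Return (worker_count, total_threads) for a connected Dask client."""
    # Client.nthreads() maps each worker address to its number of threads.
    threads_by_worker = client.nthreads()
    return len(threads_by_worker), sum(threads_by_worker.values())


def main():
    # Assumption: this runs in a Databricks notebook on a cluster that was
    # started with the init script above, so dask_databricks is importable.
    import dask_databricks

    client = dask_databricks.get_client()
    workers, threads = summarize_workers(client)
    print(f"{workers} workers, {threads} threads in total")
```

Calling `main()` in a notebook cell prints the counts; a worker count of zero would suggest the init script did not start the worker processes.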
Now you can submit work from your notebook to the multi-node Dask cluster.
def inc(x):
    return x + 1
x = client.submit(inc, 10)
x.result()
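The same pattern extends to many inputs at once. The sketch below assumes a connected client like the one above and uses the standard `Client.map` and `Client.gather` methods from `distributed`; `run_batch` and `main` are hypothetical helper names introduced only for illustration.

```python
def inc(x):
    return x + 1


def run_batch(client, func, values):
    """Apply func to every value on the cluster and collect the results."""
    # Client.map submits one task per input and returns a list of futures.
    futures = client.map(func, values)
    # Client.gather blocks until all futures finish and returns their results.
    return client.gather(futures)


def main():
    # Assumption: this runs in a Databricks notebook on a cluster that was
    # started with the init script above, so dask_databricks is importable.
    import dask_databricks

    client = dask_databricks.get_client()
    print(run_batch(client, inc, range(5)))
```

Gathering once at the end, rather than calling `result()` on each future in turn, lets the scheduler run the tasks in parallel across the workers.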
Dashboard
You can access the Dask dashboard via the Databricks driver-node proxy. The link can be found in the Client or DatabricksCluster repr, or via client.dashboard_link.
>>> print(client.dashboard_link)
https://dbc-dp-xxxx.cloud.databricks.com/driver-proxy/o/xxxx/xx-xxx-xxxx/8087/status