Monorepo for Databricks Zerobus Ingest SDKs.
GA: This SDK is generally available and supported for production use cases. Minor and patch version updates will not contain breaking changes. Major version updates may include breaking changes.
We're keen to hear your feedback. Please file issues and we will address them.
Zerobus is a high-throughput streaming service for direct data ingestion into Databricks Delta tables, optimized for real-time data pipelines and high-volume workloads.
| Language | Directory | Package |
|---|---|---|
| Rust | `rust/` | `databricks-zerobus-ingest-sdk` |
| Python | `python/` | `databricks-zerobus-ingest-sdk` |
| Go | `go/` | `github.com/databricks/zerobus-sdk/go` |
| TypeScript | `typescript/` | `@databricks/zerobus-ingest-sdk` |
| Java | `java/` | `com.databricks:zerobus-ingest-sdk` |
We try to provide prebuilt native binaries for the following platforms:
| Platform | Architecture |
|---|---|
| Linux | x86_64 |
| Linux | aarch64 |
| Windows | x86_64 |
| macOS | x86_64 |
| macOS | aarch64 (Apple Silicon) |
Note: We do not currently have macOS CI runners, so macOS binaries are built locally and may not be available for every SDK or release. If your platform is not supported or you encounter compatibility issues, you can build from source or file an issue.
Before using any SDK, you need the following: your workspace URL and workspace ID, a target Delta table, and a service principal with OAuth credentials and permissions on that table.
After logging into your Databricks workspace, look at the browser URL:

```
https://<databricks-instance>.cloud.databricks.com/o=<workspace-id>
```

- Workspace URL: the part before `/o=` (e.g., `https://dbc-a1b2c3d4-e5f6.cloud.databricks.com`)
- Workspace ID: the part after `/o=` (e.g., `1234567890123456`)

Note: The examples above show AWS endpoints (`.cloud.databricks.com`). For Azure deployments, the workspace URL will be `https://<databricks-instance>.azuredatabricks.net`.
Create a table using Databricks SQL:
```sql
CREATE TABLE <catalog_name>.default.<table_name> (
  device_name STRING,
  temp INT,
  humidity BIGINT
)
USING DELTA;
```

Replace `<catalog_name>` with your catalog name (e.g., `main`).
- Navigate to Settings > Identity and Access in your Databricks workspace
- Click Service principals and create a new service principal
- Generate a new secret for the service principal and save it securely
- Grant the following permissions:
  - `USE_CATALOG` on the catalog (e.g., `main`)
  - `USE_SCHEMA` on the schema (e.g., `default`)
  - `MODIFY` and `SELECT` on the table
Grant permissions using SQL:
```sql
-- Grant catalog permission
GRANT USE CATALOG ON CATALOG <catalog_name> TO `<service-principal-application-id>`;

-- Grant schema permission
GRANT USE SCHEMA ON SCHEMA <catalog_name>.default TO `<service-principal-application-id>`;

-- Grant table permissions
GRANT SELECT, MODIFY ON TABLE <catalog_name>.default.<table_name> TO `<service-principal-application-id>`;
```

The service principal's Application ID is your OAuth Client ID, and the generated secret is your Client Secret.
All SDKs support two serialization formats:
- JSON - Simple, schema-free ingestion. Pass a JSON string or native object (dict, map, etc.) and the SDK serializes it. No compilation step required. Good for getting started or dynamic schemas.
- Protocol Buffers - Strongly-typed, schema-validated ingestion. More efficient over the wire. Recommended for production workloads.
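For the JSON format, a record for the example table above (`device_name STRING, temp INT, humidity BIGINT`) is just a plain object; a minimal Python sketch (the exact ingestion call is SDK-specific and omitted here):

```python
import json

# A record matching the example table's columns:
# device_name STRING, temp INT, humidity BIGINT.
record = {"device_name": "sensor-01", "temp": 21, "humidity": 45}

# With the JSON format the SDK serializes native objects for you;
# passing an equivalent pre-serialized JSON string also works.
payload = json.dumps(record)
```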
Use proto2 syntax with optional fields to correctly represent nullable Delta table columns.
| Delta Type | Proto2 Type |
|---|---|
| TINYINT, BYTE, INT, SMALLINT, SHORT | int32 |
| BIGINT, LONG | int64 |
| FLOAT | float |
| DOUBLE | double |
| STRING, VARCHAR | string |
| BOOLEAN | bool |
| BINARY | bytes |
| DATE | int32 |
| TIMESTAMP, TIMESTAMP_NTZ | int64 |
| ARRAY<type> | repeated type |
| MAP<key, value> | map<key, value> |
| STRUCT<fields> | nested message |
| VARIANT | string (JSON string) |
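Applying the mapping above to the example table (`device_name STRING, temp INT, humidity BIGINT`) yields a schema along these lines (the message and field names are illustrative):

```protobuf
syntax = "proto2";

// Hypothetical schema for the example table.
// optional fields represent nullable Delta columns.
message SensorReading {
  optional string device_name = 1;  // STRING
  optional int32 temp = 2;          // INT
  optional int64 humidity = 3;      // BIGINT
}
```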
Instead of writing .proto files by hand, each SDK ships a tool to generate protobuf schemas directly from an existing Unity Catalog table. See the individual SDK READMEs for language-specific usage.
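One subtlety of the mapping table: DATE and TIMESTAMP travel as plain integers. In Delta (as in Parquet) these are conventionally days since the Unix epoch and microseconds since the Unix epoch, respectively; verify the exact convention against your SDK's README. A minimal sketch of the conversions:

```python
from datetime import date, datetime, timezone

EPOCH = date(1970, 1, 1)

def date_to_days(d: date) -> int:
    """DATE as int32: days since the Unix epoch."""
    return (d - EPOCH).days

def timestamp_to_micros(ts: datetime) -> int:
    """TIMESTAMP as int64: microseconds since the Unix epoch (UTC)."""
    return int(ts.timestamp() * 1_000_000)
```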
All SDKs support HTTP CONNECT proxies via environment variables, following gRPC core conventions. The first variable found (in order) is used:
| Proxy | No-proxy |
|---|---|
| `grpc_proxy` / `GRPC_PROXY` | `no_grpc_proxy` / `NO_GRPC_PROXY` |
| `https_proxy` / `HTTPS_PROXY` | `no_proxy` / `NO_PROXY` |
| `http_proxy` / `HTTP_PROXY` | |
The no_proxy value is a comma-separated list of hostnames (suffix-matched) or * to bypass the proxy entirely.
```sh
export https_proxy=http://my-proxy:8080
export no_proxy=localhost,127.0.0.1
```

The SDK establishes a plaintext HTTP CONNECT tunnel through the proxy, then performs a TLS handshake end-to-end with the Databricks server. The proxy never sees decrypted traffic.
See CONTRIBUTING.md. Each SDK also has its own contributing guide with language-specific setup instructions.
This project is licensed under the Databricks License. See LICENSE for the full text.