databricks/zerobus-sdk
Zerobus SDKs

Monorepo for Databricks Zerobus Ingest SDKs.

Disclaimer

GA: This SDK is generally available and supported for production use cases. Minor and patch version updates will not contain breaking changes. Major version updates may include breaking changes.

We are keen to hear feedback from you. Please file issues, and we will address them.

What is Zerobus?

Zerobus is a high-throughput streaming service for direct data ingestion into Databricks Delta tables, optimized for real-time data pipelines and high-volume workloads.

SDKs

| Language | Directory | Package |
|---|---|---|
| Rust | rust/ | databricks-zerobus-ingest-sdk |
| Python | python/ | databricks-zerobus-ingest-sdk |
| Go | go/ | github.com/databricks/zerobus-sdk/go |
| TypeScript | typescript/ | @databricks/zerobus-ingest-sdk |
| Java | java/ | com.databricks:zerobus-ingest-sdk |

Platform Support

We try to provide prebuilt native binaries for the following platforms:

| Platform | Architecture |
|---|---|
| Linux | x86_64 |
| Linux | aarch64 |
| Windows | x86_64 |
| macOS | x86_64 |
| macOS | aarch64 (Apple Silicon) |

Note: We do not currently have macOS CI runners, so macOS binaries are built locally and may not be available for every SDK or release. If your platform is not supported or you encounter compatibility issues, you can build from source or file an issue.

Prerequisites

Before using any SDK, you need the following:

1. Workspace URL and Workspace ID

After logging into your Databricks workspace, look at the browser URL:

https://<databricks-instance>.cloud.databricks.com/o=<workspace-id>
  • Workspace URL: The part before /o= (e.g., https://dbc-a1b2c3d4-e5f6.cloud.databricks.com)
  • Workspace ID: The part after /o= (e.g., 1234567890123456)

Note: The examples above show AWS endpoints (.cloud.databricks.com). For Azure deployments, the workspace URL will be https://<databricks-instance>.azuredatabricks.net.
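As a quick sketch, the split described above can be done in a few lines of Python (the helper name is illustrative, not part of any SDK):

```python
def split_workspace_url(browser_url: str) -> tuple[str, str]:
    """Split a Databricks browser URL into (workspace URL, workspace ID)."""
    # Everything before "/o=" is the workspace URL; everything after is the ID.
    base, _, workspace_id = browser_url.partition("/o=")
    return base, workspace_id

url, ws_id = split_workspace_url(
    "https://dbc-a1b2c3d4-e5f6.cloud.databricks.com/o=1234567890123456"
)
# url   -> "https://dbc-a1b2c3d4-e5f6.cloud.databricks.com"
# ws_id -> "1234567890123456"
```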

2. Create a Delta Table

Create a table using Databricks SQL:

CREATE TABLE <catalog_name>.default.<table_name> (
    device_name STRING,
    temp INT,
    humidity BIGINT
)
USING DELTA;

Replace <catalog_name> with your catalog name (e.g., main).

3. Create a Service Principal

  1. Navigate to Settings > Identity and Access in your Databricks workspace
  2. Click Service principals and create a new service principal
  3. Generate a new secret for the service principal and save it securely
  4. Grant the following permissions:
    • USE_CATALOG on the catalog (e.g., main)
    • USE_SCHEMA on the schema (e.g., default)
    • MODIFY and SELECT on the table

Grant permissions using SQL:

-- Grant catalog permission
GRANT USE CATALOG ON CATALOG <catalog_name> TO `<service-principal-application-id>`;

-- Grant schema permission
GRANT USE SCHEMA ON SCHEMA <catalog_name>.default TO `<service-principal-application-id>`;

-- Grant table permissions
GRANT SELECT, MODIFY ON TABLE <catalog_name>.default.<table_name> TO `<service-principal-application-id>`;

The service principal's Application ID is your OAuth Client ID, and the generated secret is your Client Secret.

Serialization Formats

All SDKs support two serialization formats:

  • JSON - Simple, schema-free ingestion. Pass a JSON string or native object (dict, map, etc.) and the SDK serializes it. No compilation step required. Good for getting started or dynamic schemas.
  • Protocol Buffers - Strongly-typed, schema-validated ingestion. More efficient over the wire. Recommended for production workloads.
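For JSON ingestion, a record for the example table from the Prerequisites section could be built from a native dict like this (the serialization step only; the actual ingestion call differs per SDK, so it is not shown here):

```python
import json

# One row matching the example table:
# device_name STRING, temp INT, humidity BIGINT
record = {"device_name": "sensor-01", "temp": 22, "humidity": 45}

# SDKs accept either a JSON string or the native object directly.
payload = json.dumps(record)
```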

Protocol Buffers

Use proto2 syntax with optional fields to correctly represent nullable Delta table columns.

Delta → Protobuf Type Mappings

| Delta Type | Proto2 Type |
|---|---|
| TINYINT, BYTE, INT, SMALLINT, SHORT | int32 |
| BIGINT, LONG | int64 |
| FLOAT | float |
| DOUBLE | double |
| STRING, VARCHAR | string |
| BOOLEAN | bool |
| BINARY | bytes |
| DATE | int32 |
| TIMESTAMP, TIMESTAMP_NTZ | int64 |
| ARRAY<type> | repeated type |
| MAP<key, value> | map<key, value> |
| STRUCT<fields> | nested message |
| VARIANT | string (JSON string) |
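Applying these mappings to the example table from the Prerequisites section, a hand-written proto2 schema might look like the following (the message and field numbering are illustrative; each SDK's generator tool may produce different names):

```protobuf
syntax = "proto2";

// Maps the example table: device_name STRING, temp INT, humidity BIGINT.
// optional fields represent nullable Delta columns.
message Record {
  optional string device_name = 1;
  optional int32 temp = 2;
  optional int64 humidity = 3;
}
```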

Schema Generation

Instead of writing .proto files by hand, each SDK ships a tool to generate protobuf schemas directly from an existing Unity Catalog table. See the individual SDK READMEs for language-specific usage.

HTTP Proxy Support

All SDKs support HTTP CONNECT proxies via environment variables, following gRPC core conventions. The first variable found (in order) is used:

| Proxy | No-proxy |
|---|---|
| grpc_proxy / GRPC_PROXY | no_grpc_proxy / NO_GRPC_PROXY |
| https_proxy / HTTPS_PROXY | no_proxy / NO_PROXY |
| http_proxy / HTTP_PROXY | |

The no_proxy value is a comma-separated list of hostnames (suffix-matched) or * to bypass the proxy entirely.

export https_proxy=http://my-proxy:8080
export no_proxy=localhost,127.0.0.1
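The lookup order and no_proxy matching described above can be sketched as follows (this mirrors the documented convention, not the SDKs' internal code):

```python
import os

# Checked in order; within each pair, lowercase takes precedence.
PROXY_VARS = ["grpc_proxy", "https_proxy", "http_proxy"]

def resolve_proxy(environ=os.environ):
    """Return the first proxy variable that is set, or None."""
    for name in PROXY_VARS:
        for key in (name, name.upper()):
            value = environ.get(key)
            if value:
                return value
    return None

def bypass_proxy(host: str, no_proxy: str) -> bool:
    """True if host should bypass the proxy: '*' or a suffix match."""
    entries = [e.strip() for e in no_proxy.split(",") if e.strip()]
    return "*" in entries or any(host.endswith(e) for e in entries)
```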

The SDK establishes a plaintext HTTP CONNECT tunnel through the proxy, then performs a TLS handshake end-to-end with the Databricks server. The proxy never sees decrypted traffic.

Contributing

See CONTRIBUTING.md. Each SDK also has its own contributing guide with language-specific setup instructions.

License

This project is licensed under the Databricks License. See LICENSE for the full text.