Commit dd05da0

doc: Improve readme introduction (#58)
1 parent 86810d0 commit dd05da0

3 files changed

Lines changed: 1944 additions & 7 deletions

README.md

Lines changed: 31 additions & 7 deletions
@@ -2,15 +2,39 @@

 [![Github Actions Status](https://github.com/QuantStack/Arbalister/workflows/Build/badge.svg)](https://github.com/QuantStack/Arbalister/actions/workflows/build.yml)

-This viewer lets you double click on many file types supported in the Apache Arrow ecosystem
-to automatically view it as tabular data (Csv, Parquet, Avro, Orc, Ipc).
+A JupyterLab extension for viewing tabular data files.
+Double-click to open Parquet, Avro, ORC, SQLite, and other Arrow-compatible formats directly in JupyterLab without writing code.

-For library authors, the server extension serves files in the Arrow IPC stream format.
-It can be reused to provide other type of application specific viewers (*e.g.* as time series, ...).
+![A Parquet file opened with Arbalister](assets/arbalister.png)

-This extension is composed two packages both called `arbalister`:
-- A Python server extension available on PyPI;
-- A Typescript client extension available on NPM.
+## Features
+
+**Existing**:
+- 🗂️ **Supported formats**: Parquet, CSV, Avro, ORC, SQLite, Arrow IPC
+- ⚡ **Lazy loading**: Streams chunks of data on demand, handles files larger than memory
+- ⏱️ **Prefetching**: Loads the next chunk in the background for smooth scrolling
+- ⚙️ **Reading options**: Interactive toolbar for CSV delimiters, SQLite table selection, etc.
+- 🔌 **Extensible**: Server extension provides Arrow IPC streams for building custom viewers
+
+**Planned (contributions welcome)**:
+- ☁️ **S3 and data lakes**: Support for Apache Iceberg, Delta Lake, and other cloud-native table formats over object storage
+- 🌐 **Database viewer**: Non-file (URL) database viewer
+- 💻 **WASM/JupyterLite support**: Run Arbalister in the browser without a Python backend
+- 📈 **Alternative clients**: Custom non-default viewers for time-series and geospatial data
+- 🔎 **Filters**: Search and filter data with ease
+
+## Architecture
+
+Data is divided into chunks across rows and columns.
+The client requests the chunks needed for the current viewport.
+The server reads the relevant portion using DataFusion and returns it in the Arrow IPC stream format.
+Background prefetching ensures smooth scrolling.
+
+![Arbalister client-server architecture](assets/architecture.svg)
+
+This extension is composed of two packages both called `arbalister`:
+- A Python server extension available on PyPI
+- A TypeScript client extension available on NPM

 ## Requirements

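The chunked, viewport-driven flow described in the new Architecture section can be illustrated with a small sketch. This is a hypothetical pure-Python model, not code from Arbalister: the chunk sizes (`ROWS_PER_CHUNK`, `COLS_PER_CHUNK`), the `Viewport` type, and the `chunks_for_viewport` and `with_prefetch` helpers are assumptions made for illustration. It only shows how a client might map the visible region to chunk indices and add the next row chunk for prefetching; the server-side DataFusion read and Arrow IPC serialization are not shown.

```python
from dataclasses import dataclass

# Hypothetical chunk dimensions, chosen for illustration only
# (not taken from Arbalister).
ROWS_PER_CHUNK = 1000
COLS_PER_CHUNK = 50


@dataclass(frozen=True)
class Viewport:
    """Visible grid region as half-open [start, end) row and column ranges."""
    row_start: int
    row_end: int
    col_start: int
    col_end: int


def chunks_for_viewport(vp: Viewport, n_rows: int, n_cols: int) -> list[tuple[int, int]]:
    """Return the (row_chunk, col_chunk) indices overlapping the viewport."""
    # Clamp the viewport to the table bounds.
    row_end = min(vp.row_end, n_rows)
    col_end = min(vp.col_end, n_cols)
    if vp.row_start >= row_end or vp.col_start >= col_end:
        return []  # nothing visible
    first_r, last_r = vp.row_start // ROWS_PER_CHUNK, (row_end - 1) // ROWS_PER_CHUNK
    first_c, last_c = vp.col_start // COLS_PER_CHUNK, (col_end - 1) // COLS_PER_CHUNK
    return [
        (r, c)
        for r in range(first_r, last_r + 1)
        for c in range(first_c, last_c + 1)
    ]


def with_prefetch(needed: list[tuple[int, int]], n_rows: int) -> list[tuple[int, int]]:
    """Add the row chunk just below the viewport so it can load in the background."""
    if not needed:
        return []
    n_row_chunks = -(-n_rows // ROWS_PER_CHUNK)  # ceiling division
    next_r = max(r for r, _ in needed) + 1
    if next_r >= n_row_chunks:
        return list(needed)  # already at the last row chunk
    cols = sorted({c for _, c in needed})
    return list(needed) + [(next_r, c) for c in cols]


# Usage: rows 950..1039 straddle the first two row chunks.
vp = Viewport(row_start=950, row_end=1040, col_start=0, col_end=20)
needed = chunks_for_viewport(vp, n_rows=5000, n_cols=120)
print(needed)                              # [(0, 0), (1, 0)]
print(with_prefetch(needed, n_rows=5000))  # [(0, 0), (1, 0), (2, 0)]
```

Prefetching a single row chunk below the viewport assumes mostly vertical scrolling; a real client could also prefetch horizontally or skip chunks it has already cached.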
assets/arbalister.png

427 KB