|
2 | 2 |
|
3 | 3 | [](https://github.com/QuantStack/Arbalister/actions/workflows/build.yml) |
4 | 4 |
|
5 | | -This viewer lets you double click on many file types supported in the Apache Arrow ecosystem |
6 | | -to automatically view it as tabular data (Csv, Parquet, Avro, Orc, Ipc). |
| 5 | +A JupyterLab extension for viewing tabular data files. |
| 6 | +Double-click to open Parquet, Avro, ORC, SQLite, and other Arrow-compatible formats directly in JupyterLab without writing code. |
7 | 7 |
|
8 | | -For library authors, the server extension serves files in the Arrow IPC stream format. |
9 | | -It can be reused to provide other type of application specific viewers (*e.g.* as time series, ...). |
| 8 | + |
10 | 9 |
|
11 | | -This extension is composed two packages both called `arbalister`: |
12 | | -- A Python server extension available on PyPI; |
13 | | -- A Typescript client extension available on NPM. |
| 10 | +## Features |
| 11 | + |
| 12 | +**Existing**: |
| 13 | +- **Supported formats**: Parquet, CSV, Avro, ORC, SQLite, Arrow IPC
| 14 | +- **Lazy loading**: Streams chunks of data on demand and handles files larger than memory
| 15 | +- **Prefetching**: Loads the next chunk ahead of time for smooth scrolling
| 16 | +- **Reading options**: Interactive toolbar for CSV delimiters, SQLite table selection, etc.
| 17 | +- **Extensible**: Server extension provides Arrow IPC streams for building custom viewers
| 18 | + |
| 19 | +**Planned (contributions welcome)**: |
| 20 | +- **S3 and data lakes**: Support for Apache Iceberg, Delta Lake, and other cloud-native table formats over object storage
| 21 | +- **Database viewer**: Viewer for databases addressed by URL rather than opened as files
| 22 | +- **WASM/JupyterLite support**: Run Arbalister in the browser without a Python backend
| 23 | +- **Alternative clients**: Custom, non-default viewers for time-series and geospatial data
| 24 | +- **Filters**: Search and filter data with ease
| 25 | + |
| 26 | +## Architecture |
| 27 | + |
| 28 | +Data is divided into chunks across rows and columns. |
| 29 | +The client requests the chunks needed for the current viewport. |
| 30 | +The server reads the relevant portion using DataFusion and returns it in the Arrow IPC stream format.
| 31 | +Background prefetching keeps scrolling smooth.
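
The chunk selection described above can be illustrated with a small sketch. This is not Arbalister's actual code: the `Viewport` type, the function names, the chunk sizes, and the "prefetch one row of chunks below the viewport" policy are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Viewport:
    """Half-open ranges of the rows and columns currently visible (hypothetical type)."""
    row_start: int
    row_stop: int
    col_start: int
    col_stop: int


def chunks_for_viewport(view, chunk_rows, chunk_cols):
    """Return (row_chunk, col_chunk) indices of every chunk overlapping the viewport."""
    if view.row_stop <= view.row_start or view.col_stop <= view.col_start:
        return []  # empty viewport, nothing to request
    first_row = view.row_start // chunk_rows
    last_row = (view.row_stop - 1) // chunk_rows
    first_col = view.col_start // chunk_cols
    last_col = (view.col_stop - 1) // chunk_cols
    return [
        (r, c)
        for r in range(first_row, last_row + 1)
        for c in range(first_col, last_col + 1)
    ]


def prefetch_candidates(visible, total_row_chunks):
    """Assumed policy: prefetch the row of chunks just below the visible ones."""
    if not visible:
        return []
    next_row = max(r for r, _ in visible) + 1
    if next_row >= total_row_chunks:
        return []  # already at the end of the file
    return [(next_row, c) for c in sorted({c for _, c in visible})]


if __name__ == "__main__":
    # Chunk sizes here are arbitrary illustration values.
    view = Viewport(row_start=900, row_stop=1100, col_start=0, col_stop=10)
    visible = chunks_for_viewport(view, chunk_rows=512, chunk_cols=24)
    print("request now:", visible)
    print("prefetch:", prefetch_candidates(visible, total_row_chunks=4))
```

Keying chunks by a `(row_chunk, col_chunk)` pair means a client can cache responses and only request the chunks it has not yet seen as the viewport moves.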
| 32 | + |
| 33 | + |
| 34 | + |
| 35 | +This extension is composed of two packages, both called `arbalister`:
| 36 | +- A Python server extension available on PyPI |
| 37 | +- A TypeScript client extension available on NPM |
14 | 38 |
|
15 | 39 | ## Requirements |
16 | 40 |
|
|