Skip to content

takaebato/sql-insight

Repository files navigation

sql-insight

A utility for SQL query analysis, formatting, and transformation. Built on sqlparser-rs, it works across every SQL dialect sqlparser-rs supports.

Crates.io Docs.rs Rust codecov License: MIT

Features

  • Table-level Operation Extraction: identify which tables a statement reads, which it writes, and the lineage between sources and targets.
  • Column-level Operation Extraction: the same at column granularity — track lineage from individual source columns to target columns, distinguishing pure forwarding from value-changing expressions.
  • Optional Catalog: pass column schemas to tighten column resolution and pair INSERT values with target columns by position. Best-effort without one.
  • Table Extraction: flat list of tables a statement touches — the lightest extraction when you don't need a read/write split or lineage.
  • CRUD Table Extraction: tables bucketed by Create / Read / Update / Delete role, for CRUD-style access analysis.
  • SQL Formatting: emit a query in a consistent layout (single-line by default, multi-line pretty-print on demand).
  • SQL Normalization: collapse structurally identical queries to the same string (placeholder-substitute literals, optionally collapse repetitive shapes), useful for query fingerprinting and deduplication.

Install

[dependencies]
sql-insight = "0.2.0"

Usage

Table-level Operation Extraction

Get the statement kind plus three surfaces — reads (tables read), writes (tables written), and lineage (source → target edges, only for statements that physically move data) — in one call:

use sql_insight::sqlparser::dialect::GenericDialect;
use sql_insight::extractor::{extract_table_operations, StatementKind};

let dialect = GenericDialect {};
let result = extract_table_operations(
    &dialect,
    "INSERT INTO t1 (a) SELECT a FROM t2",
).unwrap();
let ops = result[0].as_ref().unwrap();
assert_eq!(ops.statement_kind, StatementKind::Insert);
assert_eq!(ops.reads.len(), 1);   // t2
assert_eq!(ops.writes.len(), 1);  // t1
assert_eq!(ops.lineage.len(), 1); // t2 → t1

Column-level Operation Extraction

Same surfaces, at column granularity. reads / writes are plain occurrence lists of column references; lineage edges carry a kind (Passthrough vs Transformation) describing how each source reaches its target:

use sql_insight::sqlparser::dialect::GenericDialect;
use sql_insight::extractor::extract_column_operations;

let dialect = GenericDialect {};
let result = extract_column_operations(
    &dialect,
    "INSERT INTO t1 (a, b) SELECT a, LOWER(b) FROM t2",
).unwrap();
let ops = result[0].as_ref().unwrap();
// a → a (Passthrough), b → b (Transformation, via LOWER).
assert_eq!(ops.lineage.len(), 2);

Table Extraction (lightweight)

Flat list of table references touched by a statement:

use sql_insight::sqlparser::dialect::GenericDialect;
use sql_insight::extractor::extract_tables;

let dialect = GenericDialect {};
let extractions = extract_tables(&dialect, "SELECT * FROM catalog.schema.t1").unwrap();
let extraction = extractions[0].as_ref().unwrap();
assert_eq!(extraction.tables.len(), 1);
assert_eq!(extraction.tables[0].to_string(), "catalog.schema.t1");

CRUD Table Extraction

Bucket tables by create / read / update / delete role:

use sql_insight::sqlparser::dialect::GenericDialect;
use sql_insight::extractor::extract_crud_tables;

let dialect = GenericDialect {};
let result = extract_crud_tables(&dialect, "INSERT INTO t1 (a) SELECT a FROM t2").unwrap();
let crud = result[0].as_ref().unwrap();
assert_eq!(crud.create_tables.len(), 1); // t1
assert_eq!(crud.read_tables.len(), 1);   // t2
assert!(crud.update_tables.is_empty());
assert!(crud.delete_tables.is_empty());

SQL Formatting

use sql_insight::sqlparser::dialect::GenericDialect;

let dialect = GenericDialect {};
let formatted = sql_insight::formatter::format(
    &dialect, "SELECT * \n from t1   WHERE a = 1"
).unwrap();
assert_eq!(formatted, ["SELECT * FROM t1 WHERE a = 1"]);

format_with_options + FormatterOptions::pretty switches to sqlparser's multi-line pretty-print.

SQL Normalization

Substitute literals with placeholders so structurally identical queries hash to the same shape:

use sql_insight::sqlparser::dialect::GenericDialect;

let dialect = GenericDialect {};
let normalized = sql_insight::normalizer::normalize(
    &dialect, "SELECT * \n from t1   WHERE a = 1"
).unwrap();
assert_eq!(normalized, ["SELECT * FROM t1 WHERE a = ?"]);

normalize_with_options adds three opt-in collapses: IN (1, 2, 3)IN (...), VALUES (1, 2, 3), (4, 5, 6)VALUES (...), and INSERT INTO t (c, b, a) VALUES (1, 2, 3)INSERT INTO t (a, b, c) VALUES (...).

Options: catalog & casing

Each extractor has an _with_options twin taking ExtractorOptions — a catalog and/or an identifier-casing override:

use sql_insight::sqlparser::dialect::GenericDialect;
use sql_insight::catalog::Catalog;
use sql_insight::extractor::{extract_column_operations_with_options, ExtractorOptions};

let catalog = Catalog::from_ddl(
    &GenericDialect {},
    "CREATE TABLE users (id INT, name TEXT)",
).unwrap();
let options = ExtractorOptions::new().with_catalog(&catalog);
let result = extract_column_operations_with_options(
    &GenericDialect {},
    "SELECT id FROM users",
    options,
).unwrap();

An optional Catalog makes resolution strict: a column the schema doesn't list surfaces as ResolutionKind::Unresolved, and a column-list-less INSERT pairs its positional values with the catalog's target columns. Build one with Catalog::from_ddl or the CatalogTable builder. Identifier casing is dialect-derived by default; with_casing overrides it (e.g. to model a deployment-specific collation). Every extractor also works catalog-free in best-effort mode.

Limitations

See the Limitations section of the crate docs.

JSON output

Enable the serde feature to derive Serialize on the result types, so they (and their references / diagnostics) can be emitted as JSON or any serde format:

sql-insight = { version = "0.2.0", features = ["serde"] }
let json = serde_json::to_string(&ops)?;

Examples

See sql-insight/examples/ for runnable samples covering table-level operations, column-level lineage, the catalog path, and casing overrides. Run with cargo run --example <name> -p sql-insight.

Supported SQL Dialects

sql-insight supports a comprehensive range of SQL dialects through sqlparser-rs:

  • Generic
  • MySQL
  • PostgreSQL
  • Hive
  • SQLite
  • Snowflake
  • Redshift
  • Microsoft SQL Server
  • ClickHouse
  • BigQuery
  • ANSI
  • DuckDB
  • Databricks
  • Oracle

See the sqlparser-rs documentation for dialect-specific details.

Contributing

Contributions to sql-insight are welcome! Whether it's adding new features, fixing bugs, or improving documentation, feel free to fork the repository and submit a pull request.

License

MIT — see LICENSE.txt.

About

A utility for SQL query analysis, formatting, and transformation, supporting various dialects

Topics

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages