feat(iceberg): PostgreSQL compatibility via backend capability model #659
Merged
Generalize DuckLakeMode into a per-backend capability model and add an Iceberg-via-Lakekeeper preset so PostgreSQL clients get a stable, documented compatibility subset. Reconciled onto main's catalog-identity model (current_database() reports the real catalog; no logical-database masking).
- transpiler capability model (StorageBackend + BackendCapabilities); DuckLake preset reproduces prior DuckLakeMode behavior exactly
- Iceberg preset: full DDL/DML policy, plus public-schema rewrite disabled (Iceberg's physical schema is "public") and three-part refs left untouched
- canonical Iceberg->PostgreSQL type map (nested -> jsonb) feeding REST metadata; hide __duckgres_iceberg_column_metadata from information_schema.tables
- DDL/DML: ON CONFLICT ON CONSTRAINT and ALTER COLUMN TYPE ... USING rejected with 0A000 via transform.CodedError (conn.go honors the carried SQLSTATE)
- error normalization: unwrap Flight/gRPC envelopes so worker errors classify to proper PostgreSQL SQLSTATEs instead of XX000; map "Not implemented" to 0A000
- docs/iceberg-pg-compat.md compatibility contract
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
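For readers unfamiliar with the pattern, the SQLSTATE-carrying error and the "Not implemented" fallback described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the `CodedError` fields, the `sqlstateFor` helper, and the sample error messages are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// CodedError is a hypothetical stand-in for transform.CodedError: an error
// that carries the SQLSTATE the PostgreSQL client should receive.
type CodedError struct {
	Code string // five-character SQLSTATE, e.g. "0A000"
	Msg  string
}

func (e *CodedError) Error() string { return e.Msg }

// Sample errors used in main; the message texts are illustrative only.
var (
	errRejected = fmt.Errorf("transform: %w", &CodedError{
		Code: "0A000",
		Msg:  "ON CONFLICT ON CONSTRAINT is not supported",
	})
	errWorker = errors.New("rpc error: code = Unknown desc = Not implemented Error: example")
	errOther  = errors.New("unexpected failure")
)

// sqlstateFor chooses the SQLSTATE to report for err (hypothetical helper).
func sqlstateFor(err error) string {
	var coded *CodedError
	if errors.As(err, &coded) {
		return coded.Code // an explicitly carried SQLSTATE wins
	}
	if strings.Contains(err.Error(), "Not implemented") {
		return "0A000" // feature_not_supported
	}
	return "XX000" // internal_error fallback
}

func main() {
	for _, err := range []error{errRejected, errWorker, errOther} {
		fmt.Printf("%s -> %q\n", sqlstateFor(err), err.Error())
	}
}
```

Using `errors.As` rather than a type assertion means the code survives wrapping by intermediate layers, which is presumably why the connection handler can honor it.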
…plit
Restructures the Iceberg PG-compatibility work:
- transpiler capability model → typed backend profiles (transpiler/backend)
- canonical Iceberg→PG type map → server/pgtypes (ForIceberg)
- session identity split into client database vs physical catalog (server/sessioncatalog.Selection); current_database()/pg_catalog surfaces expose the client-visible database while execution routes to the physical DuckDB catalog
Fix (regression caught by the integration suite): rewriteDirectQuery now maps `USE <client-database-name>` to the physical catalog's default schema (ducklake.main / iceberg.public). Without it the common round-trip of reading current_database() and issuing `USE` on it emitted a bogus `USE <client-db>` that DuckDB rejected, stranding the session on the wrong catalog. Adds unit coverage in direct_query_rewrite_test.go.
Verified: unit/server/controlplane green; integration suite 933/0; live Lakekeeper Iceberg smoke (attach, identity, 0A000 rejections, introspection).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Recent main (#651) defaults sniRoutingMode to "enforce" when unset, so unresolvable managed hostnames are rejected on the configStore-backed multi-tenant path. The iceberg refactor dropped this default; restore it.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adhere to main (#651): the startup `database` param is pure catalog selection, so only "" / "ducklake" / "iceberg" are accepted; arbitrary names fail closed (3D000). current_database()/pg_catalog surfaces de-mask to the real attached catalog rather than echoing the startup name.
- configstore: reject non-catalog database names (CatalogValid=false)
- control plane + standalone conn: de-mask the session identity to the resolved physical catalog before installing session metadata
- drop the now-dead `USE <client-db>` rewrite case (current_database() is again the physical catalog, so the existing `USE ducklake`/`USE iceberg` cases cover the round-trip) and its unit test
- restore tests/integration/catalog_demask_test.go to main's de-masking contract
Verified: server + controlplane green; integration 933/0; live Lakekeeper smoke (current_database() de-masks to "iceberg").
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
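The fail-closed selection rule in this commit is small enough to sketch. The following is an illustration under stated assumptions, not the configstore code: `resolveCatalog`, `catalogError`, and the idea that the empty string falls back to a caller-supplied default catalog are all hypothetical.

```go
package main

import "fmt"

// catalogError is a hypothetical error type carrying SQLSTATE 3D000
// (invalid_catalog_name).
type catalogError struct{ name string }

func (e *catalogError) Error() string {
	return fmt.Sprintf("database %q does not exist", e.name)
}

func (e *catalogError) SQLState() string { return "3D000" }

// resolveCatalog treats the startup database parameter purely as a catalog
// selector. Assumption: "" means "use the deployment's default catalog".
func resolveCatalog(startupDatabase, defaultCatalog string) (string, error) {
	switch startupDatabase {
	case "":
		return defaultCatalog, nil
	case "ducklake", "iceberg":
		return startupDatabase, nil
	default:
		// Fail closed: arbitrary names are never echoed back as an identity.
		return "", &catalogError{name: startupDatabase}
	}
}

func main() {
	for _, db := range []string{"", "iceberg", "analytics"} {
		catalog, err := resolveCatalog(db, "ducklake")
		if err != nil {
			fmt.Printf("%q rejected: %v\n", db, err)
			continue
		}
		fmt.Printf("%q -> %s\n", db, catalog)
	}
}
```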
- restore accurate comments describing the startup `database` param as pure catalog selection (not a client-visible database name), matching the fail-closed behavior
- drop the vestigial PostgresConnectionResolution.ClientDatabase field (it was captured then immediately de-masked)
- inline the probeAttachedCatalogs helper back to main's two direct probes
- revert the session-default helpers to plain string params (no Selection), dropping the sessioncatalog import from session_search_path
No behavior change; the shared sessioncatalog.ResolveSelection (used by the standalone server path too) is retained.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…eberg compat
The session-catalog abstraction (sessioncatalog.Selection / ResolveSelection, the ClientDatabase-vs-PhysicalCatalog split) was built for a client-database identity model that was reverted to match main. With that gone, the abstraction was dead weight, so remove it and restore main's inline catalog resolution:
- delete server/sessioncatalog package
- control.go / conn.go: restore main's inline resolveEffectiveCatalog + de-mask-to-real-catalog; the only addition is SetConnectionPhysicalCatalog, which feeds the transpiler's backend-profile selection (the actual iceberg hook)
- sessionmeta: restore the catalog-string signature and main's metadata views; keep the genuine iceberg type-map additions (udt_name column + __duckgres_iceberg_column_metadata filter)
- configstore/session_search_path/tests/k8s: restored to main
controlplane diff vs main is now ~3 lines.
Verified: build (both tags) + server/controlplane/transpiler suites green; integration 933/0.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
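To make the hook concrete, here is a rough sketch of selecting a backend profile from the resolved physical catalog. The `Profile` fields and `profileFor` are hypothetical simplifications. Only the iceberg/ducklake differences stated in this PR are modeled, and the memory preset's values are placeholders rather than the real policy.

```go
package main

import "fmt"

// Profile is a hypothetical, much-reduced backend profile.
type Profile struct {
	Name                string
	RewritePublicToMain bool // rewrite the PostgreSQL "public" schema to "main"
	NoOpUnsupportedDDL  bool // acknowledge CREATE INDEX, VACUUM, etc. without executing
	ConflictToMerge     bool // convert INSERT ... ON CONFLICT (cols) into MERGE
}

// profileFor picks a preset from the physical catalog recorded on the
// connection. Assumption: unknown catalogs fall back to the memory preset.
func profileFor(physicalCatalog string) Profile {
	switch physicalCatalog {
	case "ducklake":
		return Profile{Name: "ducklake", RewritePublicToMain: true, NoOpUnsupportedDDL: true, ConflictToMerge: true}
	case "iceberg":
		// Same DDL/DML policy as DuckLake, but the physical schema is
		// literally "public", so the schema rewrite stays off.
		return Profile{Name: "iceberg", RewritePublicToMain: false, NoOpUnsupportedDDL: true, ConflictToMerge: true}
	default:
		return Profile{Name: "memory"} // placeholder values, not the real preset
	}
}

func main() {
	for _, catalog := range []string{"ducklake", "iceberg", "memory"} {
		fmt.Printf("%+v\n", profileFor(catalog))
	}
}
```

Keeping selection in one function keyed on the catalog string is what lets the control-plane change stay at a single call that records the catalog on the connection.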
Exhaustive Iceberg DML/DDL QA surfaced two statements that skipped the DDL
transform because Classify's substring list didn't mention them — when they were
the only DDL trigger word in the statement, FlagDDL was never set and the
statement reached DuckDB unhandled:
- ANALYZE <table>: parses as a VacuumStmt but lacked an "ANALYZE" trigger, so it
was not no-op'd and DuckDB rejected it ("Vacuum is only implemented for DuckDB
tables"). Add "ANALYZE" to Classify and give it its own command tag
(distinguished from VACUUM via VacuumStmt.IsVacuumcmd).
- CREATE TABLE ... col GENERATED ALWAYS AS (...) STORED: with no other trigger
word, the generated column was not stripped and DuckDB rejected the STORED
generated column. Add "GENERATED" to Classify.
Unit tests added for both (ANALYZE no-op tag; GENERATED stripped when sole
trigger). Verified: transpiler/server/controlplane suites green; integration
933/0; live Lakekeeper exhaustive DDL+DML QA all pass.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
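The failure mode in this commit is easy to reproduce with a toy substring classifier. The sketch below is illustrative only: `classify`, `ddlTriggers`, and the flag layout are hypothetical, and the trigger list is a small sample rather than the real one. It shows why a statement whose only trigger word is missing from the list never gets the DDL flag.

```go
package main

import (
	"fmt"
	"strings"
)

// Flags is a hypothetical bit set of transforms a statement needs.
type Flags uint8

const (
	FlagDDL Flags = 1 << iota
)

// ddlTriggers is a sample trigger list. Before the fix described above, the
// last two entries would have been absent.
var ddlTriggers = []string{"CREATE INDEX", "VACUUM", "ANALYZE", "GENERATED"}

// classify sets FlagDDL when any trigger substring occurs in the statement.
func classify(sql string) Flags {
	upper := strings.ToUpper(sql)
	var flags Flags
	for _, trigger := range ddlTriggers {
		if strings.Contains(upper, trigger) {
			flags |= FlagDDL
			break
		}
	}
	return flags
}

func main() {
	statements := []string{
		"ANALYZE orders",
		"CREATE TABLE t (a int, b int GENERATED ALWAYS AS (a * 2) STORED)",
		"SELECT 1",
	}
	for _, sql := range statements {
		fmt.Printf("ddl=%v  %s\n", (classify(sql)&FlagDDL) != 0, sql)
	}
}
```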
What
Makes Duckgres present a stable, documented PostgreSQL-compatible surface when the backing catalog is Iceberg via Lakekeeper (executed by DuckDB). Generalizes the prior `DuckLakeMode` machinery into a reusable typed backend-profile model and adds an Iceberg profile. Unsupported PostgreSQL semantics return predictable PG-shaped errors / safe command tags rather than raw DuckDB/Flight failures. Not full PG emulation.

Catalog identity follows main unchanged: the startup `database` param is pure catalog selection (`""` / `ducklake` / `iceberg`; anything else → `3D000`), and `current_database()` / `pg_catalog` / `information_schema` report the real attached catalog. There is no logical-database masking. The PR keeps the control-plane diff to ~3 lines (one hook); it does not change routing/identity.

Core model
- Typed backend profiles (`transpiler/backend`): a `Profile` bundles catalog/DDL/DML/metadata policies, selected per session from the resolved physical catalog. Presets for memory, ducklake, iceberg; the DuckLake preset reproduces prior behavior exactly. Iceberg mirrors DuckLake's DDL/DML policy but keeps the physical schema `public`.
- `server.SetConnectionPhysicalCatalog` records the resolved catalog on the connection so `newTranspiler` picks the right profile. This is the only control-plane addition.

Changes
- Type map (`server/pgtypes.ForIceberg`): canonical Iceberg→PG type map (nested → `jsonb`) applied at REST-metadata load time; `udt_name` populated; internal `__duckgres_iceberg_column_metadata` hidden from `information_schema.tables`.
- … `DEFAULT now()`) and `GENERATED` columns, no-op DDL (CREATE/DROP INDEX, VACUUM, ANALYZE, REINDEX, CLUSTER, GRANT/REVOKE, COMMENT, REFRESH MATVIEW), `DROP … CASCADE` → `RESTRICT`, split multi-command `ALTER TABLE`.
- `INSERT … ON CONFLICT (cols) DO UPDATE/NOTHING` → `MERGE` (requires an explicit column list). `ON CONFLICT ON CONSTRAINT` and `ALTER COLUMN TYPE … USING` rejected with `0A000` via a SQLSTATE-carrying `transform.CodedError`.
- `public` → `main` rewrite disabled for Iceberg (its physical schema is literally `public`).
- `ANALYZE` and `GENERATED` are now detected as DDL triggers. Previously a statement whose only trigger word was one of these skipped the DDL transform and reached DuckDB unhandled (`ANALYZE` errored; STORED `GENERATED` columns errored).
- Error normalization: unwrap Flight/gRPC envelopes so worker errors classify to proper PostgreSQL SQLSTATEs instead of `XX000`; map `Not implemented` → `0A000`.

Compatibility summary
- … `information_schema`, `pg_catalog`, `current_database()`, JDBC).
- … `0A000`: ON CONFLICT ON CONSTRAINT, ALTER COLUMN TYPE … USING, DML RETURNING via extended-query Describe; FOR UPDATE/SHARE stripped.

Testing
- `go build ./...` + `-tags kubernetes`, `go vet ./...`: clean
- … `0A000` rejections, INSERT/UPDATE/DELETE, `ON CONFLICT`→MERGE, MERGE: all pass. This QA found and fixed the ANALYZE/GENERATED Classify bugs above.

Known gaps / out of scope
- … is already outdated. Please restart your transaction, and ALTER commits can hit a REST `Conflict_409`. Clients must retry (a `40001`-style mapping/retry is a follow-up).
- `ON CONFLICT`→MERGE requires an explicit column list; `INSERT INTO t VALUES (…) ON CONFLICT …` (no column list) is not converted.
- `RETURNING` on Iceberg INSERT is rejected by DuckDB itself (`RETURNING clause not yet supported for insertion into Iceberg table`).
- … `text` (`information_schema` correctly reports `jsonb`).
- … `setIcebergDefault` (multitenant worker path does); full data-path validation in CI is the `tests/k8s` lane against real cloud storage.
- `LOCK TABLE` / advisory locks / `SET TRANSACTION` normalization; metadata perf, full client-compat matrix, observability metrics, and staged rollout are not implemented.

🤖 Generated with Claude Code
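
Until the `40001`-style mapping lands, the "clients must retry" gap above could be handled with a retry wrapper along these lines. This is only a sketch of one possible client-side approach, not part of the PR: `withRetry` and `isCommitConflict` are hypothetical, and matching on message substrings is an assumption based on the two error fragments quoted above.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
	"time"
)

// errOutdated mimics the truncated conflict message quoted in the known gaps;
// the leading "..." stands for text not shown in the PR.
var errOutdated = errors.New("... is already outdated. Please restart your transaction")

// isCommitConflict reports whether err looks like a retryable commit conflict.
func isCommitConflict(err error) bool {
	if err == nil {
		return false
	}
	msg := err.Error()
	return strings.Contains(msg, "Please restart your transaction") ||
		strings.Contains(msg, "Conflict_409")
}

// withRetry runs op up to attempts times, doubling the backoff between
// conflict failures. Non-conflict errors are returned immediately.
func withRetry(attempts int, backoff time.Duration, op func() error) error {
	var err error
	for i := 0; i < attempts; i++ {
		err = op()
		if !isCommitConflict(err) {
			return err // success, or an error that retrying will not fix
		}
		if i < attempts-1 {
			time.Sleep(backoff)
			backoff *= 2
		}
	}
	return err
}

func main() {
	calls := 0
	err := withRetry(3, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errOutdated // simulate two conflicting commits
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err)
}
```

The whole transaction, not just the failing statement, would need to run inside `op`, since the conflict message asks the client to restart the transaction.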