fix(secrets): worker persistent-secret hygiene + DROP passthrough#658
Merged
Conversation
A user hit DuckDB "secret occurs in multiple storage backends" errors that persisted for weeks. Root cause: a CREATE PERSISTENT SECRET written days/weeks ago landed in DuckDB's default secret store ($HOME/.duckdb/stored_secrets), which survives across worker restarts on any non-ephemeral disk. Each activation re-creates the same-named in-memory secret, so the name then exists in both local_file and memory → every reference and every DROP is ambiguous. Worse, the fix DuckDB itself suggests (DROP <PERSISTENT|TEMPORARY> SECRET ...) was unrunnable: those forms don't start with "DROP SECRET", so they fell through to the PG transpiler and died with `syntax error at or near "PERSISTENT"`. The user could not clean up the state through duckgres at all. Three changes: 1. server/conn.go: add DROP PERSISTENT/TEMPORARY SECRET (and IF EXISTS forms) to the DuckDB utility-command passthrough so they reach DuckDB untouched. 2. server/server.go: pin secret_directory to <DataDir>/secrets in ConfigureMainDB instead of DuckDB's $HOME default. Deterministic, per-worker, and wipeable. As a side benefit, redirecting the directory means stale secrets in the old default path stop being loaded, so they can no longer collide with the in-memory secrets re-created at activation. 3. duckdbservice/service.go: wipe the secret directory at worker startup in Warmup (duckdb-service mode only). A shared-warm worker is single-org-bound for its whole life, so process start == recycle; this guarantees a CREATE PERSISTENT SECRET from a prior incarnation cannot resurface or leak across tenants. Standalone mode is untouched. Adds TestIsDuckDBUtilityCommand covering the DROP variants, quoting, comments, case-insensitivity, and word-boundary guards. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Extract the Warmup wipe into wipePersistedSecrets and add a unit-level test that exercises the full recycle contract against real DuckDB: 1. CREATE PERSISTENT SECRET -> lands on disk under the pinned secret_directory 2. plain reopen -> secret reloads (the bug; also proves the test is non-vacuous) 3. wipePersistedSecrets (the exact Warmup call) -> directory gone 4. reopen after recycle -> secret is gone Done here rather than tests/k8s on purpose: worker DataDir is an EmptyDir, so a pod-kill test would pass even with the wipe removed (pod replacement already yields a fresh empty /data). The boundary the wipe actually guards is a process restart over a surviving DataDir — container restart within a pod, or a persistent volume / warm node — which this test models directly. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Follow-up to the same incident: the DROP-based cleanup workaround (DROP SECRET ... FROM local_file) is unreliable — once a persistent secret's file has been removed out of band but the secret is still in DuckDB's in-memory registry (loaded at startup), DuckDB throws IO Error: Failed to remove secret file '.../<name>.duckdb_secret', the file may have been removed by another duckdb instance. which is exactly the state the user is in. Relocating secret_directory alone also doesn't stop a user from re-creating the landmine. Structural fix, scoped to workers: - server: add Config.DisablePersistentSecrets; ConfigureMainDB applies `SET allow_persistent_secrets = false`. Verified against real DuckDB that this both (a) refuses CREATE PERSISTENT SECRET and (b) does not load pre-existing on-disk secrets — so the ambiguity class is impossible. - duckdbservice: OpenDuckDBPair (the worker-only DB factory; standalone uses server.openBaseDB) sets the flag for every worker. Standalone is left able to use persistent secrets. - duckdbservice: wipePersistedSecrets now also clears the legacy <DataDir>/.duckdb/stored_secrets path (where prod files actually live, since the worker's HOME is its DataDir), not just the pinned secret_directory. Tests: server/persistent_secrets_test.go (disable refuses create + skips loading + still allows in-memory secrets; SecretDirectory derivation); extends TestWipePersistedSecretsOnRecycle to assert the legacy path is wiped. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…errcheck - ConfigureMainDB now pins secret_directory only when DisablePersistentSecrets is set (i.e. workers). Standalone keeps DuckDB's $HOME default, so upgrading no longer silently relocates an existing standalone user's persistent secrets. - secret_recycle_test pins secret_directory directly (it needs persistent secrets enabled to create one to wipe, which the worker config disables). - persistent_secrets_test: check db.Close() in defer to satisfy errcheck. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…y, more tests
Follow-ups from review (none change behavior):
- conn.go: clarify in the comment that the DROP SECRET / DROP PERSISTENT SECRET
prefixes are order-independent ("DROP SECRET" is not a string-prefix of
"DROP PERSISTENT SECRET"), pre-empting a misread that the ordering is load-bearing.
- server.go: extract LegacySecretDirectory(cfg) colocated with SecretDirectory
so the recycle wipe's two targets can't drift; wipePersistedSecrets and the
recycle test now use it. Covered in TestSecretDirectory.
- persistent_secrets_test: also assert CREATE OR REPLACE PERSISTENT SECRET is
rejected when persistent secrets are disabled (the variant most callers write).
Left as-is by design: SET secret_directory interpolates DataDir unescaped,
consistent with the sibling extension_directory/temp_directory/cache SETs — it's
trusted server config, and a SQL-escaping helper would be a separate
codebase-wide hardening change, not part of this incident fix.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e ATTACH CI caught a regression: disabling persistent secrets unregisters DuckDB's local_file secret storage backend, and the DuckLake ATTACH path depends on it: Failed to attach DuckLake: Invalid Input Error: ... Unknown secret storage found: 'local_file' (controlplane-tests TestDuckLakeBooleanPredicatesAndMergeInControlPlane and k8s-integration-tests both failed.) Reproduced in isolation: with allow_persistent_secrets=false, any reference to local_file storage errors. Revert the disable; keep the safe, sufficient pieces: - secret_directory is pinned to <DataDir>/secrets on workers (rename the gating field DisablePersistentSecrets -> PinSecretDirectory to match). Pinning leaves local_file storage registered (just relocated), so DuckLake ATTACH is unaffected, while stale secrets in the old $HOME default stop being loaded. - the recycle wipe (Warmup) still clears both the pinned and legacy dirs. - the DROP PERSISTENT/TEMPORARY SECRET passthrough fix is unchanged. Tests: replace TestDisablePersistentSecrets with TestPinSecretDirectory (a persistent secret still works and lands under the pinned dir) and TestPinSecretDirectoryIgnoresLegacyDefault (a secret in the legacy location is not loaded once pinned away). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Problem
A user hit DuckDB errors that persisted for weeks:
Root cause. They ran
CREATE PERSISTENT SECRET ...once, days/weeks ago. A bareCREATE OR REPLACE SECRETdefaults to a temporary (memory) secret, andOR REPLACEonly replaces within the same storage backend — so the persistent copy was never touched. It was written under the worker's DataDir (/data/.duckdb/stored_secrets, since the worker'sHOMEis/data), which survives across worker incarnations on a persistent/long-lived/data. Every activation re-creates the same-named in-memory secret, so the name ends up in bothlocal_fileandmemory→ every reference and everyDROPis ambiguous.Two compounding bugs made cleanup impossible:
DROP <PERSISTENT|TEMPORARY> SECRET ...— was unrunnable: those forms don't start with"DROP SECRET", so they fell through to the PG transpiler and died withsyntax error at or near "PERSISTENT".DROP SECRET ... FROM local_fileis unreliable too: once the file is gone but the secret is still in DuckDB's startup-loaded registry, DuckDB throwsIO Error: Failed to remove secret file ...(reproduced against real DuckDB).Changes
Passthrough fix (
server/conn.go) — addDROP PERSISTENT SECRET/DROP TEMPORARY SECRET(+IF EXISTS) to the DuckDB utility-command allowlist so they reach DuckDB instead of the PG parser.Pin the secret directory on workers (
server/server.go+duckdbservice/duckdb_pair.go) — newConfig.PinSecretDirectory, applied inConfigureMainDBasSET secret_directory = '<DataDir>/secrets', set for every worker viaOpenDuckDBPair(the worker-only DB factory; standalone usesopenBaseDBand is untouched). Redirecting away from DuckDB's$HOMEdefault means stale secrets in the old location are no longer loaded, so they can't collide with the in-memory secret re-created at activation — and the location is now deterministic and wipeable.Wipe persisted secrets on worker recycle (
duckdbservice/service.go) —Warmupwipes both<DataDir>/secrets(pinned) and the legacy<DataDir>/.duckdb/stored_secretson worker startup (== recycle; a shared-warm worker is single-org-bound for life). Physically clears the historical files that caused the incident so they don't linger on a persistent DataDir.Why not
allow_persistent_secrets = falseAn earlier revision disabled persistent secrets outright. CI caught that it breaks the DuckLake
ATTACHpath (controlplane-tests+k8s-integration-tests): disabling unregisters DuckDB'slocal_filesecret storage backend, which DuckLake depends on (Unknown secret storage found: 'local_file'). Reverted. Pinning leaveslocal_fileregistered (just relocated), so DuckLake is unaffected, while the redirect + recycle wipe still keep workers clean.Tests
server/conn_test.go::TestIsDuckDBUtilityCommand— DROP variants, quoting, comments, case, word-boundary guards.server/persistent_secrets_test.go—TestPinSecretDirectory(a persistent secret still works and lands under the pinned dir);TestPinSecretDirectoryIgnoresLegacyDefault(a secret in the legacy location is not loaded once pinned away);TestSecretDirectory/LegacySecretDirectoryderivation.duckdbservice/secret_recycle_test.go::TestWipePersistedSecretsOnRecycle— full recycle contract against real DuckDB, non-vacuous control phase (secret survives a plain reopen), asserts both pinned and legacy dirs are wiped.go test ./server/+./duckdbservice/,go vet, andjust lint(CI-pinned golangci-lint) clean on touched files.Why no tests/k8s unit-test for the wipe
Worker
DataDir(/data) is anEmptyDir(controlplane/k8s_pool.go:827), destroyed on pod deletion — a pod-kill test would pass even with the wipe removed. The boundary the wipe guards is a process restart over a surviving DataDir (container restart, persistent volume, warm node), modeled directly by the unit test. (Note: the real DuckLake activation path is still covered by the existingcontrolplane/k8sintegration suites, which is what caught theallow_persistent_secretsregression.)Behavior changes to note
PinSecretDirectory, which only workers set. Standalone keeps DuckDB's$HOMEdefault and full persistent-secret behavior, so upgrading doesn't relocate existing secrets.<DataDir>/secretsand are wiped on recycle.CREATE/OR REPLACE` (in-memory) secrets — what duckgres and the user's scripts use — are unaffected.Immediate user remediation (pre-merge)
The DROP loop won't recover from the partially-deleted state. Clear the files and recycle the worker:
If
/datais a persistent volume, deleting the pod alone won't help. After this ships, a worker recycle does this automatically. The user can also drop theDROP ... FROM local_fileloop from their setup script.🤖 Generated with Claude Code