test(e2e-mw-dev): port the kind k8s suite into the real-cluster harness; retire tests/k8s #674
Merged
d3d4a38 to d9b9c46
…ss; retire tests/k8s

The kind-based tests/k8s/ suite tested the multi-tenant activation pipeline against a fake cluster, so it could not exercise the layers where this quarter's bugs actually lived (real Cilium, Crossplane ducklings, cnpg-shard + external RDS metadata, per-org Lakekeeper). The per-PR mw-dev harness can. Port every cluster test into harness.sh, implement the remaining TODOs, and remove the kind suite + its CI job.

harness.sh (now runs kubectl in-cluster via the Job's SA token):
- wire/query: SELECT 1 + 5 concurrent distinct connections
- activation: DuckLake + Iceberg attach + R/W on cnpg AND ext backends
- extension forks: bundled ducklake/httpfs are the PostHog forks, not upstream
- worker pods: labels, securityContext (non-root/uid1000/no-priv-esc), Downward-API POD_NAME/NODE_NAME env, no ambient SA-token mount
- resilience: worker-pod kill crash recovery; DuckLake durability across a worker restart; concurrent writers (fork conflict-retry)
- isolation: cnpg vs ext see distinct catalogs, cross-tenant read denied
- lifecycle: deprovision -> wait Duckling CR --for=delete -> re-provision the SAME org id -> R/W again (the stranded-cnpg-role regression net, now reliable because it waits on the CR's finalizer cascade instead of warehouse=deleted)

run.sh: add a `janitor` subcommand; e2e-mw-dev.yml gains a 6h `schedule` trigger that runs only the janitor (reaps stale duckgres-ci-pr-* namespaces + their ducklings/cnpg-role/PIA/bindings). NAMESPACE no longer required for janitor.

tests/k8s removal:
- delete the kind Go suite + its testdata + CLAUDE.md (cluster tests ported; test-only harness helpers retired with it)
- the RBAC + network-policy static-manifest asserts (the only unit tests over real shipped k8s/ config) move to tests/manifests/ and run in `go test ./...`
- remove the k8s-integration-tests job from ci.yml

The supporting k8s/ scripts/manifests + Dockerfiles are kept for now (a later cleanup PR removes the now-dangling `just test-k8s-integration` recipe).

Deliberately not ported: warm-pool activation + version-reaper (per-PR CP runs warm-target=0), physical S3-prefix isolation (no list creds from the Job), Cilium egress probing (needs a stable in-worker exec). Documented in README.
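
As a rough illustration of the lifecycle step listed above (deprovision, wait for the Duckling CR to be deleted, re-provision the same org id), the sequence could be sketched as follows. This is not the harness's actual code: the `duckling/<org-id>` resource reference, the `KUBECTL` override, the default timeout, and the `deprovision_org` / `provision_org` helpers are all hypothetical names used for illustration. Only `kubectl wait --for=delete` is taken from the description.

```sh
#!/bin/sh
# Hypothetical sketch of the deprovision -> wait -> re-provision sequence.
# Assumptions: the CR is addressable as "duckling/<org-id>", and
# deprovision_org / provision_org stand in for whatever the harness calls.

KUBECTL="${KUBECTL:-kubectl}"   # overridable so the flow can be dry-run

# wait_duckling_gone ORG_ID [TIMEOUT]
# Blocks until the Duckling CR for ORG_ID is gone, which only happens after
# its finalizers have completed. Returns non-zero if the wait fails.
wait_duckling_gone() {
  org_id="$1"
  timeout="${2:-300s}"
  "$KUBECTL" wait --for=delete "duckling/${org_id}" --timeout="${timeout}"
}

# recreate_org ORG_ID
# Re-provisions only after the CR has actually disappeared, instead of
# polling a status field that can flip before cleanup has finished.
recreate_org() {
  org_id="$1"
  deprovision_org "$org_id" || return 1
  wait_duckling_gone "$org_id" || {
    echo "duckling ${org_id} still present after wait" >&2
    return 1
  }
  provision_org "$org_id"
}
```

The point of the ordering is that re-provisioning is gated on the CR's deletion rather than on `warehouse=deleted`, so the cnpg role and database dropped by the finalizer cascade are gone before the same org id is created again.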
d9b9c46 to 493ebac
What
The kind-based `tests/k8s/` suite tested the multi-tenant activation pipeline against a fake cluster, so it could not exercise the layers where this quarter's bugs actually lived (real Cilium, Crossplane ducklings, cnpg-shard + external-RDS metadata, per-org Lakekeeper). The per-PR mw-dev harness (`tests/e2e-mw-dev/`, merged in #657) can. This PR ports every cluster test into the shell harness, implements the remaining harness TODOs, and removes the kind suite + its CI job.

harness.sh: ported coverage (runs `kubectl` in-cluster via the Job's SA token)

- wire/query: `SELECT 1` + 5 concurrent distinct connections (ported from `TestK8sBasicQuery`, `TestK8sMultipleConcurrentConnections`)
- activation: DuckLake + Iceberg attach + R/W on cnpg and ext backends
- extension forks: bundled `ducklake`/`httpfs` are the PostHog forks, not upstream (ported from `*IsBundledFork`)
- worker pods: labels, securityContext (non-root/uid1000/no-priv-esc), Downward-API `POD_NAME`/`NODE_NAME`, no SA-token mount (ported from `TestK8sWorkerPodCreation`, `*SecurityContext`, `*StampedWithPodAndNode`, `*DoNotMountServiceAccountToken`)
- resilience: worker-pod kill crash recovery, DuckLake durability across a worker restart, concurrent writers (ported from `*WorkerCrashRecovery`, `*DurabilityAcrossWorkerRestart`, `*ConcurrentWriters`)
- isolation: cnpg vs ext see distinct catalogs, cross-tenant read denied (ported from `*DifferentTenantsSeeDistinctCatalogs`)
- lifecycle: deprovision → wait Duckling CR `--for=delete` → re-provision the same org id → R/W again

The lifecycle/recreate check is reliable now because it waits on the Crossplane Duckling CR's finalizer cascade (which drops the cnpg role+db) instead of racing on `warehouse=deleted`, the bug the old harness comment flagged as un-portable.

TODOs resolved
- Lifecycle/recreate: Duckling CR `--for=delete` wait. `drop_cnpg_role` stays as an idempotent backstop.
- `run.sh e2e-cleanup` + a 6h `schedule` trigger in `e2e-mw-dev.yml` reap stale `duckgres-ci-pr-*` namespaces (+ ducklings, cnpg role+db, Pod Identity association, cross-ns bindings). Renamed away from "janitor" to avoid colliding with duckgres's own control-plane janitor.

tests/k8s removal
- Deleted the kind Go suite + its `testdata` + `CLAUDE.md`. The cluster tests are ported; the remaining unit tests covered test-only kind-harness helpers (port-forward state machine, transient-DB/pod-gone detection, setup loader) that retire with the harness.
- The RBAC + network-policy static-manifest asserts (the only unit tests over real shipped `k8s/` config) move to `tests/manifests/` and run in the normal `go test ./...` lane.
- Removed the `k8s-integration-tests` job from `ci.yml` (and its iceberg OIDC/STS wiring; that path is now in the e2e harness).

Kept for now
The supporting `k8s/` scripts/manifests + `Dockerfile*` stay (per request). The `just test-k8s-integration` recipe is now a dangling reference (its `./tests/k8s/...` target is gone); a later cleanup PR removes it.

Deliberately not ported (documented in README)
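- Warm-pool activation + version-reaper: the per-PR CP runs `DUCKGRES_K8S_SHARED_WARM_TARGET=0` (no idle warm workers); stay covered by `controlplane/` unit tests.
- Physical S3-prefix isolation: no list creds from the Job.
- Cilium egress probing: needs a stable in-worker exec.
- … `tests/manifests/`.

Validation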
- `go test ./tests/manifests/` ✅, `gofmt` clean, `sh -n` / `bash -n` on the scripts ✅ locally.
- `e2e-mw-dev` run against real mw-dev.
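
For reference, the local `sh -n` / `bash -n` pass mentioned above can be reproduced with a small helper along these lines. This is only a sketch: the `check_syntax` function name and the shebang-based choice between `sh` and `bash` are assumptions, not part of the PR.

```sh
#!/bin/sh
# check_syntax FILE...
# Parses each script without executing it (-n). Picks bash when the first
# line of the file mentions bash, plain sh otherwise. Returns non-zero if
# any file fails to parse.
check_syntax() {
  rc=0
  for f in "$@"; do
    first_line="$(head -n 1 "$f")"
    case "$first_line" in
      *bash*) shell=bash ;;
      *)      shell=sh ;;
    esac
    if "$shell" -n "$f"; then
      echo "ok   $f"
    else
      echo "FAIL $f"
      rc=1
    fi
  done
  return "$rc"
}
```

It would be called with the script paths as arguments, for example `check_syntax harness.sh run.sh` from the directory that holds them. A syntax-only pass catches parse errors but says nothing about runtime behaviour, which is what the real mw-dev run covers.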