diff --git a/docs/operations/cutover.md b/docs/operations/cutover.md index e4ba670..6e8c9e3 100644 --- a/docs/operations/cutover.md +++ b/docs/operations/cutover.md @@ -122,10 +122,11 @@ fresh private-storage bucket. Every counter should be zero in the orphan + inconsistent categories. If anything is flagged, **stop** and investigate before T-0. -9. Deploy the rewrite to production with `helm upgrade --install` per - [deploy.md](deploy.md). The pod will boot against the just-imported - data + bucket but receive no public traffic yet (ingress hostname not - pointed yet). +9. Deploy the rewrite to production via the production GitOps repo (a + sibling to [`cfp-sandbox-cluster`](https://github.com/CodeForPhilly/cfp-sandbox-cluster) + — see [deploy.md](deploy.md)). The pod will boot against the + just-imported data + bucket but receive no public traffic yet (Gateway + hostname not pointed at the prod LoadBalancer yet). 10. Smoke-test the production hostname through `/etc/hosts` or via direct cluster IP: hit `/api/health`, `/api/people/`, diff --git a/docs/operations/deploy.md b/docs/operations/deploy.md index afb19b3..6a1ef86 100644 --- a/docs/operations/deploy.md +++ b/docs/operations/deploy.md @@ -1,26 +1,26 @@ -# Deploying codeforphilly-rewrite +# Deploying codeforphilly-ng -This guide covers the artifacts in [`deploy/`](../../deploy/), the boot sequence -inside the container, and the operational expectations of the staging and -production environments. The authoritative architectural contract is -[specs/architecture.md](../../specs/architecture.md#deploy); this document is -the runbook that implements it. +This guide covers the deploy surface and the boot sequence inside the +container. The authoritative architectural contract is +[specs/architecture.md](../../specs/architecture.md#deploy); this document +is the runbook that implements it. -> See also: [secrets.md](secrets.md) for the secret contract, [runbook.md](runbook.md) -> for incident response. 
+> See also: [sandbox-deploy.md](sandbox-deploy.md) for the manual sandbox +> bring-up procedure, [secrets.md](secrets.md) for the secret contract, +> [runbook.md](runbook.md) for incident response. ## TL;DR — anatomy ``` +----------------------+ -| GitHub Actions CI | deploy-staging.yml / deploy-production.yml +| GitHub Actions CI | ci.yml (build + test on PR / main) +----------+-----------+ - | docker build / push + | docker build / push (manual today) v +----------------------+ -| GHCR image | ghcr.io/codeforphilly/codeforphilly-ng: +| GHCR image | ghcr.io/codeforphilly/codeforphilly-ng: +----------+-----------+ - | helm upgrade --install + | kubectl apply -k (via GitOps below) v +----------------------+ | k8s Deployment | 1 replica, Recreate strategy, PVC + Secrets + ConfigMap @@ -28,8 +28,8 @@ the runbook that implements it. +----------+-----------+ | /api/* v /* (fallthrough) -+---------------+ +-----------------------+ -| Fastify routes | | apps/web/dist (SPA) | ++----------------+ +-----------------------+ +| Fastify routes | | apps/web/dist (SPA) | +----------------+ +-----------------------+ ``` @@ -37,23 +37,56 @@ The image holds **both** the API and the built SPA. There is no separate web container. The single replica is a hard architectural constraint ([specs/architecture.md](../../specs/architecture.md#process-model)). +## Manifests + +Kustomize base + per-environment overlays at +[`deploy/kustomize/`](../../deploy/kustomize/). The base lives in +`deploy/kustomize/base/`; environment overlays under +`deploy/kustomize/overlays//`. + +The base ships everything the cluster needs in any environment: +`Deployment`, `Service`, `ConfigMap`, `PersistentVolumeClaim`s, `Gateway` + +`HTTPRoute` (per-env hostname patched in the overlay), `ServiceAccount`. +Sealed `Secret`s live only in overlays (sealed against the target cluster's +sealed-secrets controller). 
+ +Cluster-level deploys are driven by the +[`cfp-sandbox-cluster`](https://github.com/CodeForPhilly/cfp-sandbox-cluster) +GitOps repo, which pulls the workload from this repo's main branch, composes +its own per-cluster Gateway/HTTPRoute (under `_gateways/codeforphilly-ng.yaml`) +and SealedSecrets (under `codeforphilly-ng.secrets/`), and applies on merge. +Production stand-up will follow the same pattern under a `cfp-prod-cluster` +repo. + +For a one-shot manual apply (useful pre-GitOps or for an offline cluster): + +```bash +kubectl apply -k deploy/kustomize/overlays/sandbox +``` + ## Image ### Build ```bash -docker build -t ghcr.io/codeforphilly/codeforphilly-ng:dev . +docker build --platform=linux/amd64 \ + -t ghcr.io/codeforphilly/codeforphilly-ng:dev . ``` -Three stages — `deps` (full install), `build` (compile both workspaces, prune -dev deps), `runtime` (alpine + git + ca-certificates + tini). Final image runs -as `node` (uid 1000) per the `securityContext` in the Helm chart. +Three stages — `deps` (full install), `build` (compile shared, api, web — in +that order, since web/api consume shared's compiled output), `runtime` +(alpine + git + ca-certificates + tini). Final image runs as `node` +(uid 1000) per the `securityContext` in `deploy/kustomize/base/deployment.yaml`. + +`--platform=linux/amd64` is required on Apple Silicon hosts — the cluster +nodes are amd64 and won't pull an arm64-only manifest. ### Run (local smoke test) ```bash docker run --rm -p 3001:3001 \ - -e CFP_DATA_REMOTE=https://github.com/CodeForPhilly/codeforphilly-data-snapshot.git \ + -e CFP_DATA_REMOTE=https://github.com/CodeForPhilly/codeforphilly-data.git \ + -e CFP_DATA_BRANCH=fixture \ -e STORAGE_BACKEND=filesystem \ -e CFP_PRIVATE_STORAGE_PATH=/app/private-storage \ -e CFP_JWT_SIGNING_KEY="$(openssl rand -base64 48)" \ @@ -68,127 +101,61 @@ curl http://localhost:3001/ # SPA index.html ## Boot sequence -The container entrypoint (`deploy/docker/entrypoint.sh`) does, in order: - -1. 
Validate `CFP_DATA_REPO_PATH` is set. -2. If `CFP_DATA_REMOTE` is set: - - If the target is already a git repo, `git fetch` + `git reset --hard origin/`. - - Otherwise `git clone --depth=1 --branch `. -3. Configure git author identity on the local repo (so any commit the API - makes carries `GIT_AUTHOR_NAME`/`GIT_AUTHOR_EMAIL`). -4. `exec node apps/api/dist/index.js`. - -Inside node, `buildApp()` then registers plugins in order -([apps/api/src/app.ts](../../apps/api/src/app.ts)): env validation → CORS → -cookies → trace IDs → error mapper → **store boot (loads public + private into -memory)** → services (FTS) → rate limit → idempotency → session middleware → -swagger → routes → static SPA. The Fastify `listen()` call doesn't fire until -all of those resolve, so by the time `/api/health/ready` can be hit, both -stores have loaded. - -This matches the boot-order section of [deploy.md plan](../../plans/deploy.md). - -## Helm chart - -Chart lives at [`deploy/charts/codeforphilly/`](../../deploy/charts/codeforphilly/). 
-Three values files: - -- `values.yaml` — defaults (1 replica, Recreate, PVC for the data repo, S3 backend, ingress with cert-manager) -- `values.staging.yaml` — staging host, filesystem private store, scrubbed-snapshot data remote -- `values.production.yaml` — production hosts, S3 private store, real data remote, SSH deploy key - -### Install - -```bash -# Staging (first time) -kubectl create namespace codeforphilly-staging -kubectl -n codeforphilly-staging apply -f path/to/staging-secrets.yaml # see secrets.md -helm upgrade --install codeforphilly-staging \ - deploy/charts/codeforphilly \ - --namespace codeforphilly-staging \ - -f deploy/charts/codeforphilly/values.staging.yaml \ - --set image.tag=sha- - -# Production (first time) -kubectl create namespace codeforphilly -kubectl -n codeforphilly apply -f path/to/production-secrets.yaml -helm upgrade --install codeforphilly \ - deploy/charts/codeforphilly \ - --namespace codeforphilly \ - -f deploy/charts/codeforphilly/values.production.yaml \ - --set image.tag=v -``` - -### What the chart provisions - -| Resource | Purpose | -|----------|---------| -| `Deployment` | 1 replica, `Recreate` strategy, mounts PVC at `/app/data` | -| `Service` (ClusterIP) | Fronts the pod on port 80 → container 3001 | -| `Ingress` | nginx + cert-manager; staging + production hosts | -| `PersistentVolumeClaim` (data) | Working tree for the gitsheets data repo (5Gi default) | -| `PersistentVolumeClaim` (private, staging) | Local jsonl store when `storage.backend=filesystem` | -| `ConfigMap` | Non-secret env (`NODE_ENV`, paths, `CFP_DATA_REMOTE`, etc.) | -| `ServiceAccount` | Empty default — no in-cluster API access needed | - -Secrets are **not** templated in the chart. They are created out-of-band — see -[secrets.md](secrets.md). - -### Probes - -- **Liveness** — `GET /api/health` every 10s. The pod is killed only after - three consecutive failures (~30s). 
+The container entrypoint (`deploy/docker/entrypoint.sh`) reconciles the +data-repo working tree with origin before exec'ing the API. See the +"Smart entrypoint reconciliation" commit message in `git log +deploy/docker/entrypoint.sh` for the full state machine; in short: + +- in sync → no-op +- behind → fast-forward +- ahead → push (push daemon retries on failure) +- diverged + clean rebase → rebase + push +- diverged + conflicts → push a `conflicts/` branch to origin + and hard-reset local to origin + +Then `exec node apps/api/dist/index.js`. Inside node, `buildApp()` registers +plugins ([apps/api/src/app.ts](../../apps/api/src/app.ts)) in order: env → +CORS → cookies → trace IDs → error mapper → **store** (loads public + +private into memory) → **push daemon** (starts pushing transact'd commits to +`CFP_DATA_REMOTE`) → services (FTS) → rate limit → idempotency → session +middleware → swagger → routes → static SPA. Fastify's `listen()` doesn't +fire until all of those resolve, so once `/api/health/ready` returns 200 +both stores have loaded. + +## Probes + +- **Liveness** — `GET /api/health` every 30s. The pod is killed only after + three consecutive failures (~90s). - **Readiness** — `GET /api/health/ready` every 5s. Returns 503 until the - store plugins have finished decorating Fastify (gitsheets working tree - cloned + private store loaded). Once green, ingress routes traffic. - -## CI/CD - -Two deploy workflows in `.github/workflows/`: - -- `deploy-staging.yml` — triggered on push to `main`. Builds + pushes the - image tagged `sha-` and `staging-latest`, then `helm upgrade --install` - to `codeforphilly-staging`. Gated by GitHub Environment "staging" (first - run requires manual approval; secrets are scoped per-environment). -- `deploy-production.yml` — triggered on tag push matching `v*.*.*`. Same - build, deploys to namespace `codeforphilly`. Gated by Environment - "production" — every deploy goes through an approval gate. 
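The reconciliation states listed in the new Boot sequence section above can be sketched in POSIX `sh`. The runtime stage is alpine with no bash listed, so its `/bin/sh` is BusyBox ash. This is a minimal illustration under stated assumptions, not the contents of `deploy/docker/entrypoint.sh`. The helper names `classify_sync` and `reconcile` are hypothetical, and the remote is assumed to be called `origin`. The timestamped `conflicts/boot-...` branch name is a placeholder, because the real naming isn't shown here.

```sh
#!/bin/sh
# Hypothetical sketch of the boot-time data-repo reconciliation.
# Not the real deploy/docker/entrypoint.sh; names are illustrative.
set -eu

# classify_sync AHEAD BEHIND
# Prints in-sync, behind, ahead, or diverged for the two commit counts.
classify_sync() {
  if [ "$1" -eq 0 ] && [ "$2" -eq 0 ]; then
    echo in-sync
  elif [ "$1" -eq 0 ]; then
    echo behind
  elif [ "$2" -eq 0 ]; then
    echo ahead
  else
    echo diverged
  fi
}

# reconcile BRANCH -- run from inside the data-repo working tree.
reconcile() {
  branch=$1
  git fetch origin "$branch"

  # rev-list prints "LEFT<tab>RIGHT": LEFT = commits only on HEAD (ahead),
  # RIGHT = commits only on origin/BRANCH (behind).
  counts=$(git rev-list --left-right --count "HEAD...origin/$branch")
  ahead=${counts%%[[:space:]]*}
  behind=${counts##*[[:space:]]}

  case $(classify_sync "$ahead" "$behind") in
    in-sync)
      ;;                                        # nothing to do
    behind)
      git merge --ff-only "origin/$branch" ;;   # fast-forward
    ahead)
      # A failed push is tolerated; the push daemon retries later.
      git push origin "HEAD:$branch" || echo "push deferred to push daemon" >&2 ;;
    diverged)
      if git rebase "origin/$branch"; then
        git push origin "HEAD:$branch" || echo "push deferred to push daemon" >&2
      else
        git rebase --abort
        # Preserve local commits on a side branch (placeholder name).
        # Under set -e, a failed push stops here, before the reset.
        git push origin "HEAD:refs/heads/conflicts/boot-$(date +%s)"
        git reset --hard "origin/$branch"
      fi ;;
  esac
}
```

Call `reconcile` as a plain command from inside the working tree, for example `cd "$CFP_DATA_REPO_PATH" && reconcile "$CFP_DATA_BRANCH"`. If it runs as an `if` condition or on the left of `||`, the shell suspends `set -e` inside the function. The conflict path could then reset without having preserved local work.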
- -Both use `--atomic --wait --timeout 5m` so a failed rollout auto-reverts. - -### GitHub Environment secrets - -| Environment | Secret | Purpose | -|-------------|--------|---------| -| staging | `KUBECONFIG_STAGING` | base64-encoded kubeconfig with rights only in `codeforphilly-staging` | -| production | `KUBECONFIG_PRODUCTION` | base64-encoded kubeconfig with rights only in `codeforphilly` | - -The kubeconfigs should be scoped to the namespace via RBAC — the service -account they reference should not have cluster-admin. + store plugins have finished decorating Fastify. Once green, the Gateway + routes traffic. ## Data repo on disk -In production the API operates on a working tree at `/app/data` backed by a -PVC. On every boot the entrypoint refreshes that tree from `CFP_DATA_REMOTE` -(`git fetch && git reset --hard`). The push daemon then pushes commits made -during the pod's lifetime back to the remote. +The API operates on a working tree at `/app/data` backed by a PVC. The +entrypoint reconciles that tree with `CFP_DATA_REMOTE` on every boot; the +push daemon pushes commits made during the pod's lifetime back to the +remote. Implications: -- **PVC contents are ephemeral.** Killing the pod and recreating it does - *not* lose data because the source of truth is the git remote, not the - PVC. The PVC just avoids re-cloning on every restart. -- **The deploy key matters.** If `CFP_DATA_REMOTE` is SSH (the production - default), the entrypoint relies on `GIT_SSH_COMMAND` (rendered into the +- **PVC contents are durable enough to outlive a single pod**, which lets the + push daemon finish pushing any commits made just before the previous pod terminated. + But the source of truth is the git remote, not the PVC — wiping the PVC + is safe once those commits are pushed (the next boot re-clones). +- **The deploy key matters.** When `CFP_DATA_REMOTE` is SSH (the + default), the entrypoint relies on `GIT_SSH_COMMAND` (set in the ConfigMap) pointing at the mounted private key.
Rotation: replace the - Secret, restart the pod. See [secrets.md](secrets.md#data-repo-deploy-key). + SealedSecret, restart the pod. See + [secrets.md](secrets.md#data-repo-deploy-key) and the rotation procedure + in [sandbox-deploy.md](sandbox-deploy.md#rotating-the-deploy-key). -## Bucket provisioning +## Bucket provisioning (production) Production uses an S3-compatible bucket for private storage ([specs/behaviors/private-storage.md](../../specs/behaviors/private-storage.md)). -The bucket is **not** Helm-managed — it's provisioned out-of-band and the -Helm chart just consumes the credentials. +The bucket is provisioned out-of-band and the manifests consume its +credentials via a SealedSecret. Recommended provider: **Cloudflare R2** (zero egress, pennies per month, S3-compatible API). Backblaze B2 or AWS S3 also work. MinIO inside the @@ -199,32 +166,19 @@ Required bucket configuration: - **Versioning enabled.** Hard requirement per [private-storage.md](../../specs/behaviors/private-storage.md#bucket-requirements). - Every PUT increments the object's version; the previous `.jsonl` is - recoverable. Verify with `aws s3api get-bucket-versioning`. - **Lifecycle rule** deleting non-current versions after 365 days. -- **IAM policy** scoped to the bucket only — `s3:GetObject`, - `s3:PutObject`, `s3:ListBucket`, `s3:GetObjectVersion`. No cross-bucket - access; no console access for the service principal. -- **Endpoint URL** plugged into `S3_ENDPOINT` (Helm `publicEnv.S3_ENDPOINT`). -- **Bucket name** plugged into `S3_BUCKET`. -- **Region** (or a placeholder R2 region) into `S3_REGION`. -- **Access keys** stored in the `codeforphilly-secrets` Secret as - `S3_ACCESS_KEY_ID` and `S3_SECRET_ACCESS_KEY`. - -Two physical surfaces: one bucket for staging, one for production. 
Or one -bucket with two prefixes (`staging/profiles.jsonl`, `prod/profiles.jsonl`) -if cost is tight — the path string is configurable via the private-store -implementation but conventionally we use separate buckets. - -Until a real bucket exists, staging runs on `storage.backend=filesystem` -backed by a PVC — see `values.staging.yaml`. The cutover from filesystem -to S3 is a values change only; the in-memory model is identical. +- **IAM policy** scoped to the bucket only — `s3:GetObject`, `s3:PutObject`, + `s3:ListBucket`, `s3:GetObjectVersion`. No cross-bucket access; no console + access for the service principal. +- **Endpoint URL** → `S3_ENDPOINT` (ConfigMap). +- **Bucket name** → `S3_BUCKET`. +- **Region** → `S3_REGION`. +- **Access keys** → `S3_ACCESS_KEY_ID` + `S3_SECRET_ACCESS_KEY` (Secret). ## Environment variables (reference) -The runtime contract. See [`.env.example`](../../.env.example) for the -exhaustive list with comments; the table below tracks what gets *mounted* -into a production pod. +See [`.env.example`](../../.env.example) for the exhaustive list with +comments. A production pod gets these mounted: | Variable | Source | Notes | |----------|--------|-------| @@ -232,45 +186,29 @@ into a production pod. | `PORT` | ConfigMap | `3001` | | `HOST` | ConfigMap | `0.0.0.0` | | `CFP_DATA_REPO_PATH` | ConfigMap | `/app/data` (PVC mount) | -| `CFP_DATA_REMOTE` | ConfigMap | git URL (ssh in prod, https for snapshot) | -| `CFP_DATA_BRANCH` | ConfigMap | `main` | -| `CFP_WEB_DIST_PATH` | Dockerfile ENV | `/app/apps/web/dist` | -| `STORAGE_BACKEND` | ConfigMap | `s3` (prod) / `filesystem` (staging) | +| `CFP_DATA_REMOTE` | Secret | git URL (ssh in prod) | +| `CFP_DATA_BRANCH` | ConfigMap | e.g.
`fixture` / `main` | +| `CFP_WEB_DIST_PATH` | ConfigMap | `/app/apps/web/dist` | +| `STORAGE_BACKEND` | ConfigMap | `s3` (prod) / `filesystem` (sandbox) | | `CFP_PRIVATE_STORAGE_PATH` | ConfigMap | `/app/private-storage` (when filesystem) | | `S3_ENDPOINT` / `S3_BUCKET` / `S3_REGION` | ConfigMap | Bucket addressing | | `S3_ACCESS_KEY_ID` / `S3_SECRET_ACCESS_KEY` | **Secret** | Bucket credentials | -| `GITHUB_OAUTH_CLIENT_ID` | ConfigMap | OAuth app client ID | +| `GITHUB_OAUTH_CLIENT_ID` | **Secret** | OAuth app client ID | | `GITHUB_OAUTH_CLIENT_SECRET` | **Secret** | OAuth app client secret | | `CFP_JWT_SIGNING_KEY` | **Secret** | HS256 key (`openssl rand -base64 64`) | | `SAML_PRIVATE_KEY` / `SAML_CERTIFICATE` | **Secret** | Slack IdP cert chain | -| `GIT_SSH_COMMAND` | ConfigMap (rendered) | Wires `ssh` to the mounted deploy key | -| `GIT_AUTHOR_NAME` / `GIT_AUTHOR_EMAIL` | ConfigMap | Identity on push-daemon commits | +| `GIT_SSH_COMMAND` | ConfigMap | Wires `ssh` to the mounted deploy key | ## Rollback -```bash -# Roll back to the previous Helm release -helm rollback codeforphilly-staging --namespace codeforphilly-staging - -# Or pin to a specific image -helm upgrade codeforphilly-staging deploy/charts/codeforphilly \ - --namespace codeforphilly-staging \ - --reuse-values \ - --set image.tag=sha- -``` - -Note: because every commit/mutation pushes to the data remote synchronously, -rolling the container back is *not* a data rollback. Data rollback is `git -revert` on the data repo. - -## Known unknowns - -- **Cluster choice.** Plan assumes the existing CFP k8s cluster (`k8s.phl.io`). - If a different cluster is targeted, regenerate `KUBECONFIG_STAGING` / - `KUBECONFIG_PRODUCTION` and update the ingress hosts. -- **First staging stand-up.** Provisioning the namespace + creating the - per-environment Secrets is a one-time human operation. The first - `helm upgrade --install` requires those Secrets to already exist. 
-- **MinIO option.** If the cluster doesn't have an S3 provider available, - add a MinIO subchart under `deploy/charts/codeforphilly/charts/`. Out of - scope for v1. +Two distinct rollback flavors: + +- **Pod / image rollback** — change the image tag in the GitOps repo's + `images:` override (or, for an out-of-band hotfix, `kubectl set image + deployment/codeforphilly ...`). The deployment's `Recreate` strategy + serializes the swap; a few seconds of `503` on the readiness probe is + expected while the new pod boots. +- **Data rollback** — `git revert` (or `git push --force-with-lease` after + a careful local rebase) on the data repo. The entrypoint's reconciliation on + the next pod boot will pick up the change. Don't conflate the two: rolling + the image back does not undo data writes the API has already pushed. diff --git a/docs/operations/monitoring.md b/docs/operations/monitoring.md index 1fa4ed6..ecb1479 100644 --- a/docs/operations/monitoring.md +++ b/docs/operations/monitoring.md @@ -50,7 +50,7 @@ store decorators are missing (broken boot), `/api/health/ready` will return ## 2. Kubernetes liveness + readiness probes -These already exist in the Helm chart — see [deploy.md probes](deploy.md#probes). +These already exist in `deploy/kustomize/base/deployment.yaml` — see [deploy.md probes](deploy.md#probes). They serve a different purpose than the external monitors: they make k8s **act** on a bad pod (restart it) rather than just notify us.
@@ -132,7 +132,7 @@ The cutover lead confirms before T-0: - [ ] UptimeRobot account exists; two monitors above are configured - [ ] UptimeRobot → `#alerts` Slack integration is fired by a test alarm -- [ ] k8s liveness + readiness probes are present in the Helm chart +- [ ] k8s liveness + readiness probes are present in `deploy/kustomize/base/deployment.yaml` - [ ] Log webhook → `#alerts` integration fires on a test `WARN` line - [ ] On-call rotation is set in PagerDuty / Slack handoff doc - [ ] At least one team member can reach `#alerts` outside business hours diff --git a/docs/operations/runbook.md b/docs/operations/runbook.md index ed38794..3944dd9 100644 --- a/docs/operations/runbook.md +++ b/docs/operations/runbook.md @@ -21,11 +21,11 @@ Look for one of the four common boot failures: | Log line excerpt | Cause | Fix | |------------------|-------|-----| -| `[entrypoint] ERROR: CFP_DATA_REMOTE is unset` | The PVC was wiped and the chart isn't providing the remote URL. | Check ConfigMap `-env`; ensure `publicEnv.CFP_DATA_REMOTE` is set in the active values file. | +| `[entrypoint] ERROR: CFP_DATA_REMOTE is unset` | The Secret containing `CFP_DATA_REMOTE` isn't reaching the pod. | Check `kubectl get secret codeforphilly-secrets -o yaml`; verify the SealedSecret in the GitOps repo decrypted successfully (look at the sealed-secrets controller logs). | | `fatal: could not read Username for 'https://...'` or `Permission denied (publickey)` | Bad/missing data-repo credentials. | Verify the `codeforphilly-data-deploy-key` Secret holds a valid `id_ed25519` whose public key has push access to the data repo. See [secrets.md](secrets.md#data-repo-deploy-key). | -| `Failed to open public gitsheets store` | Working tree corrupt or missing `.gitsheets/` configs. | Exec into the pod, inspect `/app/data/.gitsheets/`. Recovery: wipe the PVC and let the entrypoint re-clone (`kubectl delete pvc -data` → recreate via `helm upgrade`). 
| +| `Failed to open public gitsheets store` | Working tree corrupt or missing `.gitsheets/` configs. | Exec into the pod, inspect `/app/data/.gitsheets/`. Recovery: `kubectl delete pvc codeforphilly-data -n `, then trigger a rollout — the entrypoint re-clones from `CFP_DATA_REMOTE`. | | `Failed to load private store (s3)` | Bucket creds wrong, bucket gone, or network ACL blocks egress. | Confirm `S3_*` env in the ConfigMap + Secret. From the pod, `curl $S3_ENDPOINT` to confirm reachability. | -| `environment variable ... is required` | A required env (`CFP_DATA_REPO_PATH`, `STORAGE_BACKEND`, `CFP_JWT_SIGNING_KEY`) is missing. | Helm values regression. Compare against `values.production.yaml`. | +| `environment variable ... is required` | A required env (`CFP_DATA_REPO_PATH`, `STORAGE_BACKEND`, `CFP_JWT_SIGNING_KEY`) is missing. | Manifest regression. Compare against `deploy/kustomize/base/configmap.yaml` + the GitOps repo's SealedSecret. | ### 2. Drop into the pod (if it stays up long enough) @@ -56,15 +56,19 @@ git ls-remote "$CFP_DATA_REMOTE" 2>&1 | head If the cluster state is unrecoverable but the data remote is intact: ```bash -# Roll back to the last-known-good Helm release -helm -n codeforphilly history codeforphilly -helm -n codeforphilly rollback codeforphilly - -# Or pin to a previous image -helm upgrade codeforphilly deploy/charts/codeforphilly \ - --namespace codeforphilly \ - --reuse-values \ - --set image.tag= +# Revert the most recent GitOps deploy (the cluster repo's deploy PR is a +# normal merge commit on `deploys/k8s-manifests`) +gh -R CodeForPhilly/cfp-sandbox-cluster pr list --base deploys/k8s-manifests --state merged +git -C ~/Repositories/cfp-sandbox-cluster revert --mainline 1 +git -C ~/Repositories/cfp-sandbox-cluster push origin deploys/k8s-manifests + +# Or pin to a previous image by editing the GitOps repo's +# .holo/branches/k8s-manifests/codeforphilly-ng/app/manifests.toml's image +# tag, committing on a hotfix branch, and merging 
through the deploy PR. + +# Out-of-band hotfix (bypasses GitOps — fix the repo afterward): +kubectl -n codeforphilly-rewrite-sandbox set image \ + deploy/codeforphilly codeforphilly=ghcr.io/codeforphilly/codeforphilly-ng: ``` Data is **not** in the PVC long-term; it's in the git remote. Deleting the @@ -99,8 +103,8 @@ push the backlog. # Watch a deploy kubectl -n codeforphilly rollout status deploy/codeforphilly -# Last 10 Helm releases -helm -n codeforphilly history codeforphilly +# Last 10 GitOps deploys (merge commits on deploys/k8s-manifests) +gh -R CodeForPhilly/cfp-sandbox-cluster pr list --base deploys/k8s-manifests --state merged --limit 10 # Pod resource use kubectl -n codeforphilly top pod diff --git a/docs/operations/secrets.md b/docs/operations/secrets.md index d3370e2..26a0376 100644 --- a/docs/operations/secrets.md +++ b/docs/operations/secrets.md @@ -22,15 +22,18 @@ how it gets into the cluster, and how to rotate it. ## Where they live in the cluster -The Helm chart consumes secrets from two places: +The Deployment consumes secrets from two Secret objects, both materialized +by the sealed-secrets controller from `SealedSecret` resources committed in +the GitOps repo (`cfp-sandbox-cluster/codeforphilly-ng.secrets/`): -| Secret name (default) | Mount mechanism | Holds | -|-----------------------|-----------------|-------| +| Secret name | Mount mechanism | Holds | +|-------------|-----------------|-------| | `codeforphilly-secrets` | `envFrom: secretRef` (entire Secret becomes env) | All env-var secrets | | `codeforphilly-data-deploy-key` | Volume-mounted, one file | SSH private key for the data repo | -Both names are overridable via Helm values (`secretEnvFrom[].name`, -`deployKey.secretName`). +The Secret names are referenced directly from +`deploy/kustomize/base/deployment.yaml`; changing them means touching the +manifest. 
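The Secret names are hard-coded in the manifest. A quick audit before renaming one is to list every Secret the Deployment references. This is a rough sketch that assumes the block-style layout in the table above: `envFrom` with a `secretRef`, plus a volume using `secretName`. The `list_secret_refs` helper is hypothetical. A line-based `awk` scan is not a YAML parser, so flow-style mappings (`{name: x}`) or quoted names will slip past it.

```sh
#!/bin/sh
# Hypothetical audit helper: list the Secret names a Deployment manifest
# references via envFrom.secretRef or a secret volume's secretName.
# Line-based heuristic, not a YAML parser: assumes block-style YAML with
# unquoted names, one key per line.
set -eu

list_secret_refs() {
  awk '
    /secretRef:/    { want = 1; next }          # envFrom entry; its name follows
    want && /name:/ { print $NF; want = 0; next }
    /secretName:/   { print $NF }               # volume-mounted Secret
  ' "$1" | sort -u
}
```

For example, `list_secret_refs deploy/kustomize/base/deployment.yaml` should print the two names from the table if the manifest follows that layout. Any other name it prints is a Secret that the GitOps repo's SealedSecrets would also need to provide.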
## Inventory @@ -173,18 +176,12 @@ kubectl create secret generic codeforphilly-data-deploy-key \ --dry-run=client -o yaml \ | kubeseal --format yaml > deploy/secrets/staging-deploy-key.sealed.yaml -# 4. Apply -kubectl apply -f deploy/secrets/staging-secrets.sealed.yaml -kubectl apply -f deploy/secrets/staging-deploy-key.sealed.yaml +# 4. Commit the sealed YAMLs into the GitOps repo +# (cfp-sandbox-cluster/codeforphilly-ng.secrets/), open a PR. +# The deploy workflow applies them on merge. # 5. Wipe plaintext shred -u .secrets/* - -# 6. Helm install -helm upgrade --install codeforphilly-staging deploy/charts/codeforphilly \ - --namespace codeforphilly-staging \ - -f deploy/charts/codeforphilly/values.staging.yaml \ - --set image.tag=sha- ``` The sealed `.yaml` files are safe to commit; they can only be decrypted by