feat(deploy): docker + helm chart + CI/CD for staging#35

Merged
themightychris merged 10 commits into
mainfrom
feat/deploy
May 16, 2026

@themightychris
Member

Summary

Implements the deploy plan (plans/deploy.md) so the team can stand up a staging environment and follow the same template into production.

  • Multi-stage Dockerfile at the repo root + entrypoint script that clones/refreshes CFP_DATA_REMOTE then exec's node. Final image is non-root alpine with git, ca-certificates, tini, openssh-client.
  • Helm chart at deploy/charts/codeforphilly/ with values.yaml / values.staging.yaml / values.production.yaml. One replica, Recreate strategy, PVC for the data working tree, optional PVC for the filesystem private-store (staging), readiness probe at /api/health/ready.
  • GitHub Actions deploy-staging.yml (push to main → build + helm upgrade) and deploy-production.yml (tag push → build + helm upgrade), both gated by per-environment GitHub Environments + KUBECONFIG_* secrets.
  • API surfaces for production: new apps/api/src/plugins/static-web.ts mounts the built SPA at CFP_WEB_DIST_PATH with SPA fallback + JSON 404 envelope for /api/*; new GET /api/health/ready returns 503 until stores have loaded.
  • Operational docs under docs/operations/: deploy.md (image anatomy, boot sequence, bucket provisioning), secrets.md (every runtime secret with generation + rotation), runbook.md ("API won't boot" playbook).

Test plan

  • docker build . produces an image — Dockerfile + .dockerignore in place; not runnable in this CI but the build steps mirror what the action does (verified by inspection; daemon not available in this env).
  • The same image serves both /api/* and the static SPA — tested in apps/api/tests/deploy.test.ts (static-web plugin SPA fallback + /api/* JSON 404 envelope).
  • helm install to a staging namespace boots the deployment cleanly — chart lints clean for all three values files; first stand-up requires cluster access not held by this agent (see Follow-ups in plan closeout).
  • Ingress + TLS works (external curl) — same: requires cluster access.
  • PVC persists across pod restarts — same.
  • Push daemon successfully pushes a test commit — same.
  • S3-backed PrivateStore reads/writes against the production bucket — bucket not yet provisioned; staging starts on filesystem backend per values.staging.yaml.
  • Readiness probe returns 200 only after both stores load — tested in apps/api/tests/deploy.test.ts.
  • CI workflows produce deployable artifacts — workflows wired but not yet run in main; ci.yml is untouched.
  • Sealed-secrets decrypt + inject — bootstrap recipe in docs/operations/secrets.md, requires cluster.
  • Operational docs land at docs/operations/.

Verification before push

  • npm run type-check — clean across api / web / shared
  • npm run lint — clean
  • npm test — clean across api / web / shared
  • npm run build — clean (apps/web/dist + apps/api/dist)
  • helm lint deploy/charts/codeforphilly — clean for default, staging, production values
  • helm template ... — renders valid YAML for both environments

Installed via:
  npm install --workspace=apps/api @fastify/static

Used by the production runtime to serve the built apps/web/dist as a
fallthrough for non-/api/* routes: one image and one process, per
specs/architecture.md's "single Docker image bundles API + static web" claim.

Generated by:
  asdf set helm 4.1.0

Helm is used by deploy-staging.yml / deploy-production.yml workflows and
for local `helm lint` / `helm template` validation against the chart
under deploy/charts/codeforphilly.

Adds the boot-path surfaces the deploy plan needs in production:

- New plugin apps/api/src/plugins/static-web.ts mounts the built SPA at
  CFP_WEB_DIST_PATH and installs a notFoundHandler that returns the JSON
  envelope for unknown /api/* paths and serves index.html with no-cache
  for everything else (SPA fallback for React Router v7 routes). When
  CFP_WEB_DIST_PATH is unset (dev / tests) the plugin still installs the
  JSON-envelope 404 handler so the API contract is consistent.

- New env var CFP_WEB_DIST_PATH (optional) — set in the production image
  to /app/apps/web/dist; unset in dev where Vite owns 5173.

- New route GET /api/health/ready — readiness probe for k8s. Returns 200
  only after the store + FTS decorators are present (which happens during
  plugin registration, before fastify.listen()). Returns 503 otherwise so
  ingress never routes to a pod whose in-memory state hasn't loaded.

- Tests in apps/api/tests/deploy.test.ts cover the readiness payload, the
  SPA fallback / no-cache header, the /api/* JSON-404 envelope with and
  without the SPA bundled, and the boot-time failure when
  CFP_WEB_DIST_PATH points at a missing directory.
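
The not-found behaviour described above can be sketched as a pure decision function. This is a minimal illustration, not the code in apps/api/src/plugins/static-web.ts: the name `decideNotFound`, the envelope fields (`error`, `message`), and the choice to return JSON for every unknown path when no SPA is bundled are all assumptions made for the example.

```typescript
// Hypothetical sketch of the notFoundHandler decision; names and the
// envelope shape are assumptions, not taken from the real plugin.
type NotFoundDecision =
  | { kind: "json"; status: 404; body: { error: string; message: string } }
  | { kind: "spa"; status: 200; file: "index.html"; headers: Record<string, string> };

function decideNotFound(url: string, spaBundled: boolean): NotFoundDecision {
  // Ignore the query string when classifying the request.
  const queryStart = url.indexOf("?");
  const path = queryStart === -1 ? url : url.slice(0, queryStart);

  // Unknown /api/* paths always get the JSON envelope. Assumption: when no
  // SPA is bundled (CFP_WEB_DIST_PATH unset), every unknown path does too.
  if (path.startsWith("/api/") || !spaBundled) {
    return {
      kind: "json",
      status: 404,
      body: { error: "not_found", message: `No route for ${path}` },
    };
  }

  // Everything else falls through to the SPA shell, served uncached so a
  // new deploy is picked up on the next page load.
  return {
    kind: "spa",
    status: 200,
    file: "index.html",
    headers: { "cache-control": "no-cache" },
  };
}
```

Keeping the decision separate from the Fastify reply makes both branches easy to unit-test without a running server.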

Per specs/architecture.md's "single Docker image bundles API + static web"
claim and the deploy plan's readiness-probe + SPA-fallthrough requirements.

- Dockerfile: three stages (deps / build / runtime) on node:22.22-alpine.
  Final image is non-root (uid 1000), bundles git + ca-certificates +
  tini + openssh-client, ships apps/api/dist plus apps/web/dist for the
  single-image SPA-co-served deploy.
- .dockerignore keeps secrets (.env, private-storage/, codeforphilly-data/)
  and dev-only artifacts (node_modules, dist, tests, plans, specs) out of
  the build context.
- deploy/docker/entrypoint.sh handles the working-tree-on-startup pattern
  from specs/architecture.md: clone CFP_DATA_REMOTE on first boot, fetch +
  reset --hard on subsequent boots, then exec node. Uses GIT_SSH_COMMAND
  rendered by Helm when a deploy key Secret is mounted.
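
The clone-or-refresh step can be sketched as a function that returns the git commands to run before handing off to node. The real logic is a shell script at deploy/docker/entrypoint.sh; this TypeScript version, the helper name `planDataSync`, and the default branch `main` are assumptions for illustration only.

```typescript
// Hypothetical planner for the working-tree-on-startup pattern.
// Returns argv arrays rather than running anything, so it has no side effects.
function planDataSync(
  remote: string,
  repoPath: string,
  hasWorkingTree: boolean,
  branch: string = "main", // assumption: the tracked branch is not stated in the PR
): string[][] {
  if (!hasWorkingTree) {
    // First boot: the PVC is empty, so clone the data remote into it.
    return [["git", "clone", remote, repoPath]];
  }
  // Subsequent boots: discard local state and match the remote exactly.
  return [
    ["git", "-C", repoPath, "fetch", "origin"],
    ["git", "-C", repoPath, "reset", "--hard", `origin/${branch}`],
  ];
}
```

After these commands succeed, the entrypoint replaces itself with the node process, so signals forwarded by tini reach the API directly.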

Build:
  docker build -t cfp:dev .
Smoke test:
  docker run --rm -p 3001:3001 \
    -e CFP_DATA_REMOTE=https://github.com/CodeForPhilly/codeforphilly-data-snapshot.git \
    -e STORAGE_BACKEND=filesystem \
    -e CFP_PRIVATE_STORAGE_PATH=/app/private-storage \
    -e CFP_JWT_SIGNING_KEY="$(openssl rand -base64 48)" \
    cfp:dev

Minimal chart at deploy/charts/codeforphilly/ following the layout from
the plan: Deployment / Service / Ingress / PVC (data) / PVC (private,
staging only) / ConfigMap / ServiceAccount.

Architectural constraints baked in:
- replicas: 1 + strategy.type: Recreate, both hard requirements per
  specs/architecture.md (in-process write mutex serializes mutations,
  concurrent old/new pods would corrupt the gitsheets working tree).
- Liveness probe hits /api/health every 10s; readiness probe hits
  /api/health/ready every 5s — ingress doesn't route traffic until both
  stores have loaded.
- Data-repo PVC mounted at CFP_DATA_REPO_PATH so the working tree
  survives pod restarts; the entrypoint refreshes from CFP_DATA_REMOTE
  on each boot anyway (PVC is an optimization, not the source of truth).
- Secrets are never templated by the chart — values reference a
  caller-provided Secret (default name codeforphilly-secrets) via
  envFrom, and a separate Secret for the SSH deploy key.
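
To make the readiness contract concrete, here is a rough sketch of the check behind GET /api/health/ready: 200 once both the store and FTS decorators exist, 503 otherwise. The `AppState` interface, the function name `readiness`, and the response body fields are hypothetical; the actual payload is whatever apps/api/tests/deploy.test.ts asserts.

```typescript
// Hypothetical stand-in for the Fastify instance; only the two decorators
// the probe cares about are modelled.
interface AppState {
  store?: unknown;
  fts?: unknown;
}

interface ReadinessResult {
  status: 200 | 503;
  body: { ready: boolean; missing: string[] }; // assumed payload shape
}

function readiness(app: AppState): ReadinessResult {
  const missing: string[] = [];
  if (app.store === undefined) missing.push("store");
  if (app.fts === undefined) missing.push("fts");

  if (missing.length > 0) {
    // 503 keeps the pod out of the Service endpoints until state has loaded.
    return { status: 503, body: { ready: false, missing } };
  }
  return { status: 200, body: { ready: true, missing } };
}
```

Reporting which decorator is missing is optional, but it makes a failing probe easier to diagnose from the pod's events.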

values.staging.yaml: filesystem private store + PVC, points at the
public scrubbed-snapshot data remote so staging never serves real PII
until the cutover-prep plan wires it up.

values.production.yaml: S3 private store, real data remote with SSH
deploy-key auth, larger resource budget, NODE_OPTIONS heap tuning.

`helm lint` clean against all three values files.

- deploy-staging.yml: on push to main, builds the image (tagged
  sha-<short> + staging-latest), pushes to GHCR, and runs
  helm upgrade --install against namespace codeforphilly-staging.
  Gated by GitHub Environment "staging" — first run requires manual
  approval; secrets (KUBECONFIG_STAGING) are scoped per-environment.
- deploy-production.yml: on push of tags matching v*.*.*, same build
  + helm upgrade against namespace codeforphilly. Gated by Environment
  "production". Also exposes workflow_dispatch with a tag input for
  promoting an already-built image.
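
As an illustration of the staging tagging scheme (sha-<short> plus staging-latest), the tag list could be derived as below. The helper name `stagingImageTags`, the 7-character short SHA, and the example repository path are assumptions; the workflow itself may compute tags differently.

```typescript
// Hypothetical helper: builds the two image references the staging
// workflow is described as pushing.
function stagingImageTags(
  repository: string,
  commitSha: string,
  shortLength: number = 7, // assumption: the PR does not state the short-SHA length
): string[] {
  const short = commitSha.slice(0, shortLength);
  return [
    `${repository}:sha-${short}`, // immutable tag, pins a deploy to one commit
    `${repository}:staging-latest`, // moving tag, always the newest staging build
  ];
}

// Example with a made-up repository and SHA:
// stagingImageTags("ghcr.io/example/app", "0123456789abcdef0123456789abcdef01234567")
// returns ["ghcr.io/example/app:sha-0123456", "ghcr.io/example/app:staging-latest"]
```

Deploying by the immutable sha tag rather than the moving one means a rollback always resolves to the same image.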

Both jobs use --atomic --wait --timeout 5m so a failed rollout
auto-reverts. A post-deploy smoke check hits /api/health on the public
ingress to catch ingress / cert misconfiguration before declaring the
deploy successful.

Action versions checked against upstream READMEs:
- actions/checkout@v6 (matches existing ci.yml)
- docker/setup-buildx-action@v3
- docker/login-action@v3
- docker/build-push-action@v6
- azure/setup-kubectl@v4
- azure/setup-helm@v4

Three new docs under docs/operations/, satisfying the deploy plan's
"Operational docs" validation criterion:

- deploy.md — implementation companion to specs/architecture.md's Deploy
  section. Image anatomy, boot sequence, Helm install/upgrade commands,
  bucket-provisioning checklist (R2 / B2 / S3 / MinIO options, with
  versioning + lifecycle rules + IAM scoping), environment-variable
  reference table.
- secrets.md — inventory of every runtime secret with generation +
  rotation procedure: CFP_JWT_SIGNING_KEY, GITHUB_OAUTH_CLIENT_SECRET,
  S3_* keys, SAML key+cert, the data-repo SSH deploy key. Includes the
  bootstrap-a-new-environment recipe using sealed-secrets.
- runbook.md — "API won't boot" playbook with log-grep table mapping
  common log lines to causes and fixes, plus rollback procedure.
@themightychris themightychris merged commit 387d06d into main May 16, 2026
1 check passed
@themightychris themightychris deleted the feat/deploy branch May 16, 2026 23:46