Add policy_stopped workload status and enforce policy gate on restart#4857
Add policy_stopped workload status and enforce policy gate on restart#4857
Conversation
Introduces WorkloadStatusPolicyStopped as a first-class workload status to support enterprise policy enforcement. When non-registry servers are blocked by policy, workloads can be set to this status so the UI and CLI can surface the reason clearly, rather than showing an opaque "error". - Add WorkloadStatusPolicyStopped constant to pkg/container/runtime/types.go - Update enums tag in pkg/core/workload.go to include policy_stopped - Map policy_stopped to BackendUnhealthy in mapWorkloadStatusToVMCPHealth - Call EagerCheckCreateServer in restartSingleWorkload so the policy gate blocks restart of workloads that violate the active policy - Add 🚫 indicator for policy_stopped in CLI list and status output, following the existing⚠️ pattern for unauthenticated workloads - Regenerate swagger docs to include the new status enum value Part of stacklok/stacklok-enterprise-platform#406 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #4857 +/- ##
==========================================
+ Coverage 69.06% 69.07% +0.01%
==========================================
Files 531 531
Lines 55172 55182 +10
==========================================
+ Hits 38104 38117 +13
+ Misses 14140 14138 -2
+ Partials 2928 2927 -1 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add all WorkloadStatus cases to the switches in list and status commands to satisfy the exhaustive linter. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
@claude can you verify if the comments were addressed? |
- Add policy_stopped to workloadStatusResponse enums tag so the swagger spec for the status endpoint includes the new value - Add EagerCheckCreateServer check in maybeSetupContainerWorkload after loadRunnerFromState to cover the path where the outer LoadState call fails on a partial name but succeeds after label-based name resolution - Regenerate swagger docs Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
@claude can you review if the comments have been addressed? |
|
Claude finished @reyortiz3's task in 56s —— View job Review comment verification
Both previous review comments have been addressed. ✅ Comment 1 — Missing
|
| // Check policy gates before restarting — the loaded RunConfig carries the same | ||
| // fields (RegistryAPIURL, RegistryURL, RemoteURL) that the gate evaluates on create. | ||
| if err := runner.EagerCheckCreateServer(ctx, runConfig); err != nil { | ||
| return fmt.Errorf("server restart blocked by policy: %w", err) |
There was a problem hiding this comment.
Both policy gate check sites return an error here but never call SetWorkloadStatus(ctx, name, rt.WorkloadStatusPolicyStopped, ...). The constant is defined, the CLI renders the indicator, and the swagger docs list it, but nothing ever transitions a workload into this status. A blocked restart just leaves the workload as stopped.
Is this intentional -- e.g. the operator or some downstream consumer will set this? If so it might be worth a comment saying so. Otherwise the display code and health mapping for policy_stopped are unreachable.
| return vmcp.BackendUnauthenticated | ||
| case rt.WorkloadStatusPolicyStopped: | ||
| return vmcp.BackendUnhealthy | ||
| default: |
There was a problem hiding this comment.
Mapping PolicyStopped to BackendUnhealthy means vMCP can't distinguish "crashed" from "blocked by policy". Compare with Unauthenticated which gets its own BackendUnauthenticated variant. Is the generic mapping intentional here, or would a dedicated BackendPolicyStopped be worth adding for parity?
There was a problem hiding this comment.
I noticed this, but it's the same pattern as above, for example:
case rt.WorkloadStatusStopped, rt.WorkloadStatusError, rt.WorkloadStatusStopping, rt.WorkloadStatusRemoving:
return vmcp.BackendUnhealthy
| case rt.WorkloadStatusPolicyStopped: | ||
| status = "🚫 " + status | ||
| case rt.WorkloadStatusRunning, rt.WorkloadStatusStopped, rt.WorkloadStatusError, | ||
| rt.WorkloadStatusStarting, rt.WorkloadStatusStopping, rt.WorkloadStatusUnhealthy, |
There was a problem hiding this comment.
Nit: this switch block is duplicated verbatim in status.go. Might be worth extracting a small helper in common.go so future status indicators only need updating in one place.
…ndicator - Call SetWorkloadStatus with WorkloadStatusPolicyStopped in both policy gate failure paths in restartSingleWorkload and maybeSetupContainerWorkload, so a blocked restart transitions the workload into the new status instead of leaving it as stopped - Extract the status indicator switch (⚠️ / 🚫) from list.go and status.go into a single workloadStatusIndicator helper in common.go Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
|
@claude can you review what comments are pending address? |
|
Claude finished @reyortiz3's task in 1m 18s —— View job Pending comment review
All three of @jhrozek's review comments have been addressed. Here's the breakdown: Comment 1 —
|
| Comment | Commenter | Status |
|---|---|---|
SetWorkloadStatus never called for policy_stopped |
jhrozek | ✅ Fixed |
BackendUnhealthy vs dedicated BackendPolicyStopped |
jhrozek | 💬 Author justified; awaiting reviewer acknowledgement |
Duplicated switch block in list.go/status.go |
jhrozek | ✅ Fixed (extracted to common.go) |
The only item that is technically still "open" is the BackendUnhealthy mapping question — but it's a design discussion, not a required change. The thread needs @jhrozek to respond or be dismissed.
jhrozek
left a comment
There was a problem hiding this comment.
Changes look good. The status-setting and deduplication fixes address the review comments well.
Summary
non_registry_serverspolicy on existing workloads.WorkloadStatusPolicyStoppedas a first-class status (following the precedent ofWorkloadStatusUnauthenticated) so the policy layer can set it when stopping a non-compliant workload, giving the UI and CLI a programmatic signal rather than freeform text.EagerCheckCreateServerinrestartSingleWorkloadso the policy gate blocks any restart attempt for workloads that violate the active policy.Changes
pkg/container/runtime/types.goWorkloadStatusPolicyStoppedconstantpkg/core/workload.gopolicy_stoppedtoenumstag for swaggerpkg/workloads/manager.gopolicy_stopped→BackendUnhealthy; callEagerCheckCreateServerin restart pathcmd/thv/app/list.go,status.gopolicy_stoppedin CLI outputdocs/server/swagger.*,docs.goType of change
Test plan
task buildpassestask testpasses (pre-existing failures inpkg/clientandpkg/telemetryare unrelated to this change and reproduce onmain)task docsDoes this introduce a user-facing change?
CLI
listandstatuscommands will show🚫 policy_stoppedfor workloads stopped by policy, making the reason immediately visible rather than showing a plainstoppedorerror.Generated with Claude Code