fix: commitments package refactoring/pipeline config #601
Conversation
- commitment uuid test
- reservation deletion respects allocations and targetHost
- fix pipeline
- set time correctly
- refactor package and controller
- refactor logging
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID:
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 Walkthrough

Refactors the reservations controller into a commitments-focused controller, replaces the external HTTP scheduling flow with a SchedulerClient/Kubernetes-based scheduling path, adds KVM committed-resource pipelines and flavor→pipeline config, extends commitments config and API (enablement, validation, context-aware logging), and updates tests and deletion-priority behavior.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant CRC as CommitmentReservationController
    participant K8s as Kubernetes API
    participant SC as SchedulerClient
    CRC->>K8s: List Hypervisors (listServersByProjectID)
    K8s-->>CRC: Hypervisor objects
    CRC->>CRC: Build eligibleHosts from Hypervisors
    CRC->>CRC: getPipelineForFlavorGroup(flavorGroup)
    Note right of CRC: exact flavor-group → wildcard (*) → PipelineDefault
    CRC->>SC: ScheduleReservation(eligibleHosts, pipeline, request)
    SC-->>CRC: Selected host / decision
    CRC->>K8s: Patch Reservation.Spec.TargetHost
    K8s-->>CRC: Patched
    CRC->>K8s: Patch Reservation.Status.Conditions (Ready)
    K8s-->>CRC: Patched
```
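The lookup step annotated in the diagram (exact flavor-group → wildcard `*` → `PipelineDefault`) can be sketched as follows. This is a minimal illustration under assumed names — `pipelineForFlavorGroup` and the flat `map[string]string` config shape are stand-ins, not the actual cortex API.

```go
package main

import "fmt"

// PipelineDefault is the assumed last-resort pipeline name.
const PipelineDefault = "default"

// pipelineForFlavorGroup resolves a pipeline with the fallback chain
// shown in the sequence diagram: exact match, then wildcard, then default.
func pipelineForFlavorGroup(pipelines map[string]string, flavorGroup string) string {
	if p, ok := pipelines[flavorGroup]; ok {
		return p // exact flavor-group match
	}
	if p, ok := pipelines["*"]; ok {
		return p // wildcard fallback
	}
	return PipelineDefault // nothing configured for this group
}

func main() {
	cfg := map[string]string{"hana": "kvm-hana", "*": "kvm-general"}
	fmt.Println(pipelineForFlavorGroup(cfg, "hana")) // kvm-hana (exact)
	fmt.Println(pipelineForFlavorGroup(cfg, "gpu"))  // kvm-general (wildcard)
	fmt.Println(pipelineForFlavorGroup(nil, "any"))  // default (last resort)
}
```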
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes

Possibly related PRs
Suggested reviewers
Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
🧪 Generate unit tests (beta)
📝 Coding Plan
Actionable comments posted: 4
🧹 Nitpick comments (3)
internal/scheduling/reservations/commitments/api_change_commitments_test.go (1)
1280-1296: Condition Reason should use CamelCase format.

Kubernetes condition reasons typically use CamelCase without spaces (e.g., ReservationInvalid instead of "Reservation invalid"). This is a convention for machine-readable identifiers.

Proposed fix

```diff
 res.Status.Conditions = []metav1.Condition{
 	{
 		Type:               v1alpha1.ReservationConditionReady,
 		Status:             metav1.ConditionFalse,
-		Reason:             "Reservation invalid",
+		Reason:             "ReservationInvalid",
 		Message:            reason,
 		LastTransitionTime: metav1.Now(),
 	},
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/api_change_commitments_test.go` around lines 1280-1296: The Condition Reason in markReservationFailedStatus should be a CamelCase machine-readable identifier instead of the current "Reservation invalid"; update the Reason field in the metav1.Condition created by markReservationFailedStatus to use a CamelCase token such as "ReservationInvalid" (keeping the rest of the condition fields and the env.K8sClient.Status().Update call unchanged).

internal/scheduling/reservations/commitments/controller.go (1)
199-203: Hardcoded requeue interval when flavor not found.

Line 202 uses a hardcoded 5 * time.Minute requeue interval when flavor is not found, while other retry scenarios use r.Conf.RequeueIntervalRetry. Consider using the config value for consistency.

Proposed fix

```diff
 if flavorDetails == nil {
 	logger.Error(errors.New("flavor not found"), "flavor not found in any flavor group", "resourceName", resourceName)
-	return ctrl.Result{RequeueAfter: 5 * time.Minute}, nil
+	return ctrl.Result{RequeueAfter: r.Conf.RequeueIntervalRetry}, nil
 }
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/controller.go` around lines 199-203: The nil-check handling for flavorDetails currently returns a hardcoded ctrl.Result{RequeueAfter: 5 * time.Minute}; replace that literal with the configured retry interval (r.Conf.RequeueIntervalRetry) to match other retry scenarios; update the branch in the reconciliation code where flavorDetails == nil (the logger.Error call and subsequent return) to return ctrl.Result{RequeueAfter: r.Conf.RequeueIntervalRetry} so the controller consistently uses the configured interval.

internal/scheduling/reservations/commitments/config.go (1)
57-65: DefaultConfig does not set a default for Endpoints.NovaExternalScheduler or KeystoneSecretRef.

The DefaultConfig() function doesn't initialize Endpoints.NovaExternalScheduler (empty string) or KeystoneSecretRef (zero value). If these are required for the controller to function, the controller should validate them during initialization and fail fast with a clear error message rather than silently failing later.

Consider adding validation in the controller's Init method or documenting that these must be configured via Helm values.
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/config.go` around lines 57 - 65, DefaultConfig currently leaves Endpoints.NovaExternalScheduler and KeystoneSecretRef unset; add a fail-fast validation in the controller Init method to check Config.Endpoints.NovaExternalScheduler != "" and Config.KeystoneSecretRef is non-zero (e.g., non-empty name/namespace) and return a clear error if missing, so the controller initialization fails early with a descriptive message; alternatively (if desired) set sensible defaults in DefaultConfig for these fields, but ensure Init enforces presence before proceeding.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@internal/scheduling/reservations/commitments/api_change_commitments.go`:
- Around line 366-368: The code appends the original `res` into
`failedReservations` for the `metav1.ConditionFalse` case, while other failure
cases append the freshly fetched `current`, leading to potential stale data;
update the `case metav1.ConditionFalse` branch to append `current` instead of
`res` (or consistently use one source across all branches) so
`failedReservations` always contains the latest reservation state from the
`current` variable (also check other branches in the same switch that append to
`failedReservations` to ensure consistent use of `current` vs `res`).
In `@internal/scheduling/reservations/commitments/controller.go`:
- Around line 430-434: Add validation in the Init() of the commitment
reservation controller to ensure conf.Endpoints.NovaExternalScheduler is
non-empty before creating r.SchedulerClient: check
conf.Endpoints.NovaExternalScheduler == "" and return an error like "nova
external scheduler endpoint must be configured" if empty, otherwise proceed to
call reservations.NewSchedulerClient(conf.Endpoints.NovaExternalScheduler) and
log as before; reference the Init() method and the r.SchedulerClient assignment
to locate where to insert this check.
In `@internal/scheduling/reservations/commitments/reservation_manager.go`:
- Around line 122-139: The current deletion logic picks a reservation with
Spec.TargetHost == "" even if it still has allocations; change it so the hard
requirement is len(res.Spec.CommittedResourceReservation.Allocations) == 0
before a slot is considered deletable, and only use Spec.TargetHost == "" as a
tie-breaker among those zero-allocation slots. Concretely, in
reservation_manager.go adjust the selection loops for reservationToDelete: first
collect or scan for reservations where
len(res.Spec.CommittedResourceReservation.Allocations) == 0 and pick one (prefer
one with res.Spec.TargetHost == "" if multiple), and only if none have zero
allocations consider other fallback behavior; update any modifications to
existing slice accordingly and remove the prior early branch that deleted based
solely on TargetHost == "".
In `@internal/scheduling/reservations/commitments/state.go`:
- Around line 18-19: The current commitmentUUIDPattern is too permissive and
allows uppercase and variable hyphenated slugs that later become invalid
Kubernetes metadata.name and cause reconciliation collisions; replace the
pattern with a strict lowercase UUID matcher (e.g.
/^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/) and enforce it
where CommitmentUUID is accepted so only canonical lowercase UUIDs enter
CommitmentState; update commitmentUUIDPattern accordingly and add/ensure a
validation check in the CommitmentState constructor or the code path that sets
CommitmentUUID (the same place that feeds reservation_manager.go's
metadata.name) to return an error/reject values that do not match the new
pattern rather than attempting to normalize them.
---
Nitpick comments:
In `@internal/scheduling/reservations/commitments/api_change_commitments_test.go`:
- Around line 1280-1296: The Condition Reason in markReservationFailedStatus
should be a CamelCase machine-readable identifier instead of the current
"Reservation invalid"; update the Reason field in the metav1.Condition created
by markReservationFailedStatus to use a CamelCase token such as
"ReservationInvalid" (keeping the rest of the condition fields and the
env.K8sClient.Status().Update call unchanged).
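The CamelCase convention above can be checked mechanically. A small sketch, assuming the apimachinery validation pattern for metav1.Condition.Reason is (approximately) the regex below — no spaces are permitted, which is exactly why "Reservation invalid" fails while "ReservationInvalid" passes:

```go
package main

import (
	"fmt"
	"regexp"
)

// reasonPattern approximates the apimachinery validation applied to
// metav1.Condition.Reason: a letter followed by letters, digits,
// underscores, commas, or colons — no spaces.
var reasonPattern = regexp.MustCompile(`^[A-Za-z]([A-Za-z0-9_,:]*[A-Za-z0-9_])?$`)

func validReason(r string) bool { return reasonPattern.MatchString(r) }

func main() {
	fmt.Println(validReason("Reservation invalid")) // false: contains a space
	fmt.Println(validReason("ReservationInvalid"))  // true: CamelCase token
}
```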
In `@internal/scheduling/reservations/commitments/config.go`:
- Around line 57-65: DefaultConfig currently leaves
Endpoints.NovaExternalScheduler and KeystoneSecretRef unset; add a fail-fast
validation in the controller Init method to check
Config.Endpoints.NovaExternalScheduler != "" and Config.KeystoneSecretRef is
non-zero (e.g., non-empty name/namespace) and return a clear error if missing,
so the controller initialization fails early with a descriptive message;
alternatively (if desired) set sensible defaults in DefaultConfig for these
fields, but ensure Init enforces presence before proceeding.
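A fail-fast check of the kind suggested above could look like the following. This is a sketch under assumed, simplified struct shapes — the real types live in internal/scheduling/reservations/commitments/config.go and may differ:

```go
package main

import (
	"errors"
	"fmt"
)

// SecretRef is a hypothetical stand-in for the keystone secret reference.
type SecretRef struct{ Name, Namespace string }

// Config is a hypothetical, flattened stand-in for the commitments config.
type Config struct {
	NovaExternalScheduler string
	KeystoneSecretRef     SecretRef
}

// Validate fails fast with a descriptive error, instead of letting an empty
// endpoint surface later as a low-level HTTP URL-parsing error.
func (c Config) Validate() error {
	if c.NovaExternalScheduler == "" {
		return errors.New("nova external scheduler endpoint must be configured")
	}
	if c.KeystoneSecretRef.Name == "" || c.KeystoneSecretRef.Namespace == "" {
		return errors.New("keystone secret ref must be configured (name and namespace)")
	}
	return nil
}

func main() {
	fmt.Println(Config{}.Validate()) // descriptive error, not a runtime HTTP failure
	ok := Config{
		NovaExternalScheduler: "http://cortex-scheduler:8080",
		KeystoneSecretRef:     SecretRef{Name: "keystone", Namespace: "cortex"},
	}
	fmt.Println(ok.Validate()) // <nil>
}
```

Calling such a Validate from the controller's Init (or from DefaultConfig consumers) gives the clear startup error the review asks for.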
In `@internal/scheduling/reservations/commitments/controller.go`:
- Around line 199-203: The nil-check handling for flavorDetails currently
returns a hardcoded ctrl.Result{RequeueAfter: 5 * time.Minute}; replace that
literal with the configured retry interval (r.Conf.RequeueIntervalRetry) to
match other retry scenarios; update the branch in the reconciliation code where
flavorDetails == nil (the logger.Error call and subsequent return) to return
ctrl.Result{RequeueAfter: r.Conf.RequeueIntervalRetry} so the controller
consistently uses the configured interval.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: a83cc15a-67bc-45c9-8b4a-2594ef914300
📒 Files selected for processing (14)
- cmd/main.go
- helm/bundles/cortex-nova/templates/pipelines_kvm.yaml
- helm/bundles/cortex-nova/values.yaml
- internal/scheduling/reservations/commitments/api_change_commitments.go
- internal/scheduling/reservations/commitments/api_change_commitments_test.go
- internal/scheduling/reservations/commitments/config.go
- internal/scheduling/reservations/commitments/context.go
- internal/scheduling/reservations/commitments/controller.go
- internal/scheduling/reservations/commitments/controller_test.go
- internal/scheduling/reservations/commitments/reservation_manager.go
- internal/scheduling/reservations/commitments/reservation_manager_test.go
- internal/scheduling/reservations/commitments/state.go
- internal/scheduling/reservations/monitor.go
- internal/scheduling/reservations/monitor_test.go
```go
case metav1.ConditionFalse:
	allChecked = false
	failedReservations = append(failedReservations, res)
case metav1.ConditionUnknown:
```
Inconsistent use of res vs current when appending failed reservations.
At line 367, when ConditionFalse is detected, res (the original from reservationsToWatch) is appended to failedReservations. However, for ConditionTrue with missing host (line 359), current (the freshly fetched state) is appended. This inconsistency could lead to stale data in the failedReservations slice.
Proposed fix for consistency

```diff
 case metav1.ConditionFalse:
-	failedReservations = append(failedReservations, res)
+	failedReservations = append(failedReservations, current)
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```go
case metav1.ConditionFalse:
	allChecked = false
	failedReservations = append(failedReservations, current)
case metav1.ConditionUnknown:
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/scheduling/reservations/commitments/api_change_commitments.go`
around lines 366 - 368, The code appends the original `res` into
`failedReservations` for the `metav1.ConditionFalse` case, while other failure
cases append the freshly fetched `current`, leading to potential stale data;
update the `case metav1.ConditionFalse` branch to append `current` instead of
`res` (or consistently use one source across all branches) so
`failedReservations` always contains the latest reservation state from the
`current` variable (also check other branches in the same switch that append to
`failedReservations` to ensure consistent use of `current` vs `res`).
```go
// Initialize scheduler client
r.SchedulerClient = reservations.NewSchedulerClient(conf.Endpoints.NovaExternalScheduler)
logf.FromContext(ctx).Info("scheduler client initialized for commitment reservation controller", "url", conf.Endpoints.NovaExternalScheduler)
```
🧩 Analysis chain
🏁 Script executed:

```shell
#!/bin/bash
# Check if there's validation for NovaExternalScheduler endpoint in the codebase
rg -n "NovaExternalScheduler" --type=go -C 3
```

Repository: cobaltcore-dev/cortex
Length of output: 10224

🏁 Script executed:

```shell
#!/bin/bash
# Check config validation logic and where config is loaded
rg -n "EndpointsConfig|NovaExternalScheduler" --type=go -B 5 -A 5 internal/scheduling/reservations/commitments/config.go
```

Repository: cobaltcore-dev/cortex
Length of output: 1108

🏁 Script executed:

```shell
#!/bin/bash
# Find and inspect NewSchedulerClient function
rg -n "NewSchedulerClient" --type=go -A 10
```

Repository: cobaltcore-dev/cortex
Length of output: 4200

🏁 Script executed:

```shell
#!/bin/bash
# Check if there's any validation in the config package
rg -n "validate|Validate|required" --type=go internal/scheduling/reservations/commitments/config.go
```

Repository: cobaltcore-dev/cortex
Length of output: 47

🏁 Script executed:

```shell
#!/bin/bash
# Check where config is loaded/parsed and if there's validation there
rg -n "Load|Parse|Unmarshal" --type=go internal/scheduling/reservations/commitments/ | head -20
```

Repository: cobaltcore-dev/cortex
Length of output: 2552

🏁 Script executed:

```shell
#!/bin/bash
# Check the full Setup function where the endpoint is initialized to see if there's validation
cat -n internal/scheduling/reservations/commitments/controller.go | sed -n '400,436p'
```

Repository: cobaltcore-dev/cortex
Length of output: 1856

🏁 Script executed:

```shell
#!/bin/bash
# Check if SchedulerClient validates URL when making requests
rg -n "func.*SchedulerClient.*Request\|func.*Schedule" --type=go -A 10 internal/scheduling/reservations/scheduler_client.go | head -40
```

Repository: cobaltcore-dev/cortex
Length of output: 47

🏁 Script executed:

```shell
#!/bin/bash
# Check the full SchedulerClient implementation to see if URL is validated when making requests
cat -n internal/scheduling/reservations/scheduler_client.go
```

Repository: cobaltcore-dev/cortex
Length of output: 8062

🏁 Script executed:

```shell
#!/bin/bash
# Check how the config is loaded in main.go or tests to see if there's validation
rg -n "Config\{|EndpointsConfig\{|NovaExternalScheduler:" --type=go -A 3 cmd/main.go | head -30
```

Repository: cobaltcore-dev/cortex
Length of output: 47

🏁 Script executed:

```shell
#!/bin/bash
# Check if there's any validation during SetupWithManager or controller initialization
rg -n "SetupWithManager|ValidateConfig|validate" --type=go internal/scheduling/reservations/commitments/controller.go
```

Repository: cobaltcore-dev/cortex
Length of output: 244

🏁 Script executed:

```shell
#!/bin/bash
# Check how config is loaded in cmd/main.go to see if there's validation there
rg -n "commitments.Config\|commitments.NewController\|endpoints.*NovaExternalScheduler" --type=go cmd/main.go -B 2 -A 2
```

Repository: cobaltcore-dev/cortex
Length of output: 47

🏁 Script executed:

```shell
#!/bin/bash
# Test if http.NewRequestWithContext accepts empty URL to understand runtime behavior
python3 << 'EOF'
# Check what happens with empty URL in Go's net/http
import subprocess
result = subprocess.run(['go', 'doc', 'net/http.NewRequest'],
                        capture_output=True, text=True, cwd='/tmp')
# If we can't check docs, we know from Go standard library that empty URL will cause an error
print("Empty URL will cause 'net.URL.Parse: invalid URL' error when http.NewRequestWithContext tries to parse it")
EOF
```

Repository: cobaltcore-dev/cortex
Length of output: 170

🏁 Script executed:

```shell
#!/bin/bash
# Check the config struct definition more completely to see if there's any validation tags
rg -n "type.*Config\|type EndpointsConfig" --type=go -A 20 internal/scheduling/reservations/commitments/config.go
```

Repository: cobaltcore-dev/cortex
Length of output: 47
Add validation for NovaExternalScheduler endpoint.
The SchedulerClient is initialized with conf.Endpoints.NovaExternalScheduler, which has no default value and is not validated during controller initialization. If this endpoint is empty or misconfigured, the scheduler calls will fail at runtime with a low-level HTTP parsing error. Add validation in the Init() function to check that the endpoint is configured and provide a clear error message:
Suggested validation

```go
if conf.Endpoints.NovaExternalScheduler == "" {
	return errors.New("nova external scheduler endpoint must be configured")
}
r.SchedulerClient = reservations.NewSchedulerClient(conf.Endpoints.NovaExternalScheduler)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/scheduling/reservations/commitments/controller.go` around lines 430
- 434, Add validation in the Init() of the commitment reservation controller to
ensure conf.Endpoints.NovaExternalScheduler is non-empty before creating
r.SchedulerClient: check conf.Endpoints.NovaExternalScheduler == "" and return
an error like "nova external scheduler endpoint must be configured" if empty,
otherwise proceed to call
reservations.NewSchedulerClient(conf.Endpoints.NovaExternalScheduler) and log as
before; reference the Init() method and the r.SchedulerClient assignment to
locate where to insert this check.
```go
// prefer ones that are not scheduled, or alternatively, unused reservation slot, or simply remove last one
var reservationToDelete *v1alpha1.Reservation
for i, res := range existing {
	if res.Spec.TargetHost == "" {
		reservationToDelete = &res
		existing = append(existing[:i], existing[i+1:]...) // remove from existing list
		break
	}
}
if reservationToDelete == nil {
	for i, res := range existing {
		if len(res.Spec.CommittedResourceReservation.Allocations) == 0 {
			reservationToDelete = &res
			existing = append(existing[:i], existing[i+1:]...) // remove from existing list
			break
		}
	}
}
```
Require zero allocations before a slot becomes deletable.
The new first pass removes any reservation with TargetHost == "". If a live slot still has allocations but no target host, this branch deletes it before the Allocations == 0 fallback ever runs. Make len(...Allocations) == 0 the hard gate, and use TargetHost == "" only as the tie-breaker among unused slots.
Proposed fix

```diff
-for i, res := range existing {
-	if res.Spec.TargetHost == "" {
+for i, res := range existing {
+	if len(res.Spec.CommittedResourceReservation.Allocations) == 0 &&
+		res.Spec.TargetHost == "" {
 		reservationToDelete = &res
 		existing = append(existing[:i], existing[i+1:]...) // remove from existing list
 		break
 	}
 }
 if reservationToDelete == nil {
 	for i, res := range existing {
 		if len(res.Spec.CommittedResourceReservation.Allocations) == 0 {
```

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
 // prefer ones that are not scheduled, or alternatively, unused reservation slot, or simply remove last one
 var reservationToDelete *v1alpha1.Reservation
-for i, res := range existing {
-	if res.Spec.TargetHost == "" {
+for i, res := range existing {
+	if len(res.Spec.CommittedResourceReservation.Allocations) == 0 &&
+		res.Spec.TargetHost == "" {
 		reservationToDelete = &res
 		existing = append(existing[:i], existing[i+1:]...) // remove from existing list
 		break
 	}
 }
 if reservationToDelete == nil {
 	for i, res := range existing {
 		if len(res.Spec.CommittedResourceReservation.Allocations) == 0 {
 			reservationToDelete = &res
 			existing = append(existing[:i], existing[i+1:]...) // remove from existing list
 			break
 		}
 	}
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/scheduling/reservations/commitments/reservation_manager.go` around
lines 122 - 139, The current deletion logic picks a reservation with
Spec.TargetHost == "" even if it still has allocations; change it so the hard
requirement is len(res.Spec.CommittedResourceReservation.Allocations) == 0
before a slot is considered deletable, and only use Spec.TargetHost == "" as a
tie-breaker among those zero-allocation slots. Concretely, in
reservation_manager.go adjust the selection loops for reservationToDelete: first
collect or scan for reservations where
len(res.Spec.CommittedResourceReservation.Allocations) == 0 and pick one (prefer
one with res.Spec.TargetHost == "" if multiple), and only if none have zero
allocations consider other fallback behavior; update any modifications to
existing slice accordingly and remove the prior early branch that deleted based
solely on TargetHost == "".
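The selection order the review asks for — zero allocations as the hard gate, empty TargetHost only as a tie-breaker — can be demonstrated with simplified stand-in types (the real v1alpha1 structs are richer; these names are for illustration only):

```go
package main

import "fmt"

// Reservation is a hypothetical, simplified stand-in for the v1alpha1 type.
type Reservation struct {
	Name        string
	TargetHost  string
	Allocations []string
}

// pickDeletable implements the suggested two-pass selection:
// pass 1: unused (zero allocations) AND unscheduled (empty TargetHost);
// pass 2: any zero-allocation slot, even if already scheduled.
func pickDeletable(existing []Reservation) *Reservation {
	for i := range existing {
		if len(existing[i].Allocations) == 0 && existing[i].TargetHost == "" {
			return &existing[i]
		}
	}
	for i := range existing {
		if len(existing[i].Allocations) == 0 {
			return &existing[i]
		}
	}
	return nil // every slot still has allocations: nothing is safely deletable
}

func main() {
	slots := []Reservation{
		{Name: "r1", TargetHost: "", Allocations: []string{"vm-1"}}, // unscheduled but in use
		{Name: "r2", TargetHost: "host-a"},                          // scheduled, unused
	}
	// The allocations gate wins over the empty TargetHost: r2 is picked,
	// and the in-use r1 is never deleted.
	fmt.Println(pickDeletable(slots).Name) // r2
}
```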
```go
// commitmentUUIDPattern validates commitment UUID format
var commitmentUUIDPattern = regexp.MustCompile(`^[a-zA-Z0-9-]{6,64}$`)
```
Tighten this validator before the UUID is used in reservation names.
This pattern still accepts uppercase and arbitrary hyphenated slugs. internal/scheduling/reservations/commitments/reservation_manager.go, Line 266 embeds CommitmentUUID into metadata.name, so values like ABCDEF pass here and later generate an invalid Kubernetes object name. It also groups reservations by the prefix commitment-%s- at Line 65, so values like abc and abc-1 can overlap during reconciliation. Please normalize/reject anything that is not a fixed lowercase UUID shape before it enters CommitmentState.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/scheduling/reservations/commitments/state.go` around lines 18 - 19,
The current commitmentUUIDPattern is too permissive and allows uppercase and
variable hyphenated slugs that later become invalid Kubernetes metadata.name and
cause reconciliation collisions; replace the pattern with a strict lowercase
UUID matcher (e.g.
/^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/) and enforce it
where CommitmentUUID is accepted so only canonical lowercase UUIDs enter
CommitmentState; update commitmentUUIDPattern accordingly and add/ensure a
validation check in the CommitmentState constructor or the code path that sets
CommitmentUUID (the same place that feeds reservation_manager.go's
metadata.name) to return an error/reject values that do not match the new
pattern rather than attempting to normalize them.
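A minimal sketch of the stricter validator the review proposes, with a validation function that rejects rather than normalizes (the helper name is an assumption for illustration):

```go
package main

import (
	"fmt"
	"regexp"
)

// commitmentUUIDPattern accepts only canonical lowercase UUIDs (8-4-4-4-12
// hex groups), so slugs like "ABCDEF" or "abc-1" can't leak into
// metadata.name or collide under the commitment-%s- prefix grouping.
var commitmentUUIDPattern = regexp.MustCompile(
	`^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$`)

// validateCommitmentUUID returns an error for anything that is not a
// canonical lowercase UUID, instead of attempting to normalize it.
func validateCommitmentUUID(id string) error {
	if !commitmentUUIDPattern.MatchString(id) {
		return fmt.Errorf("commitment UUID %q is not a canonical lowercase UUID", id)
	}
	return nil
}

func main() {
	fmt.Println(validateCommitmentUUID("a83cc15a-67bc-45c9-8b4a-2594ef914300")) // <nil>
	fmt.Println(validateCommitmentUUID("ABCDEF") != nil)                        // true: rejected
	fmt.Println(validateCommitmentUUID("abc-1") != nil)                         // true: rejected
}
```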
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
internal/scheduling/reservations/commitments/controller.go (2)
110-129: ⚠️ Potential issue | 🟠 Major

Don't send already-placed reservations back through scheduling.

Once res.Spec.TargetHost is set, this reconcile has enough state to recover from a partial previous run. Falling through to ScheduleReservation() can pick a different host and overwrite the original placement if an earlier reconcile already patched Spec.TargetHost but failed later while updating status. Recover status/conditions from spec and return here instead of scheduling again.

Possible direction

```diff
-	// Sync Spec values to Status fields for non-pre-allocated reservations
-	// This ensures the observed state reflects the desired state from Spec
-	needsStatusUpdate := false
-	if res.Spec.TargetHost != "" && res.Status.Host != res.Spec.TargetHost {
-		res.Status.Host = res.Spec.TargetHost
-		needsStatusUpdate = true
-	}
-	if needsStatusUpdate {
-		old := res.DeepCopy()
+	// A reservation with an existing TargetHost is already placed; recover the
+	// observed status from spec instead of asking the scheduler again.
+	if res.Spec.TargetHost != "" {
+		old := res.DeepCopy()
+		res.Status.Host = res.Spec.TargetHost
 		patch := client.MergeFrom(old)
 		if err := r.Status().Patch(ctx, &res, patch); err != nil {
 			// Ignore not-found errors during background deletion
 			if client.IgnoreNotFound(err) != nil {
 				logger.Error(err, "failed to sync spec to status")
@@
 			return ctrl.Result{}, nil
 		}
 		logger.Info("synced spec to status", "host", res.Status.Host)
+		return ctrl.Result{}, nil
 	}
```

Also applies to: 243-299
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/controller.go` around lines 110 - 129, The reconcile currently syncs res.Spec.TargetHost to res.Status.Host but then continues and may call ScheduleReservation(), risking re-placement; after you successfully patch status when res.Spec.TargetHost is non-empty, short-circuit return (e.g., return ctrl.Result{}, nil) so we don't fall through to ScheduleReservation(). Update the block that checks res.Spec.TargetHost (in the reconcile logic around the MergeFrom patch) to return immediately after the successful Status().Patch and successful logger.Info, and apply same guard where similar spec->status recovery occurs (the other block around lines 243-299) to avoid re-scheduling already-placed reservations.
262-281: ⚠️ Potential issue | 🟠 Major

Requeue the NoHostsFound state.

This branch marks the reservation Ready=False and then goes idle. Since SetupWithManager() only watches Reservation objects, later capacity changes on hypervisors/knowledge won't trigger another reconcile, so this reservation can stay unscheduled indefinitely. Requeue with r.Conf.RequeueIntervalRetry here, like the NoHostsAvailable path.

Suggested fix

```diff
-return ctrl.Result{}, nil // No need to requeue, we didn't find a host.
+return ctrl.Result{RequeueAfter: r.Conf.RequeueIntervalRetry}, nil
```

Also applies to: 511-514
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/scheduling/reservations/commitments/controller.go` around lines 262 - 281, The NoHostsFound branch sets Reservation Ready=False but returns no requeue; update the end of that branch to requeue like the NoHostsAvailable path by returning ctrl.Result{RequeueAfter: r.Conf.RequeueIntervalRetry}, ensuring you still perform the status patch via r.Status().Patch(ctx, &res, patch) and handle client.IgnoreNotFound(err) as before; modify the return that currently does "return ctrl.Result{}, nil" to "return ctrl.Result{RequeueAfter: r.Conf.RequeueIntervalRetry}, nil" (apply the same change to the other identical block around lines 511-514).
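The distinction the review draws — a terminal result versus a requeue after a configured interval — can be sketched with a minimal stand-in for controller-runtime's ctrl.Result (the handler name and Result shape here are simplified assumptions, not the cortex code):

```go
package main

import (
	"fmt"
	"time"
)

// Result mimics the relevant field of controller-runtime's ctrl.Result:
// a zero RequeueAfter means "done, wait for a watch event"; a non-zero
// value asks the manager to reconcile again after that interval.
type Result struct{ RequeueAfter time.Duration }

// onNoHostsFound illustrates the suggested fix: since only Reservation
// objects are watched, capacity changes won't re-trigger reconcile, so
// the NoHostsFound branch must requeue with the configured retry interval.
func onNoHostsFound(retry time.Duration) Result {
	return Result{RequeueAfter: retry}
}

func main() {
	fmt.Println(onNoHostsFound(2 * time.Minute).RequeueAfter) // 2m0s
}
```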
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@internal/scheduling/reservations/commitments/controller.go`:
- Around line 110-129: The reconcile currently syncs res.Spec.TargetHost to
res.Status.Host but then continues and may call ScheduleReservation(), risking
re-placement; after you successfully patch status when res.Spec.TargetHost is
non-empty, short-circuit return (e.g., return ctrl.Result{}, nil) so we don't
fall through to ScheduleReservation(). Update the block that checks
res.Spec.TargetHost (in the reconcile logic around the MergeFrom patch) to
return immediately after the successful Status().Patch and successful
logger.Info, and apply same guard where similar spec->status recovery occurs
(the other block around lines 243-299) to avoid re-scheduling already-placed
reservations.
- Around line 262-281: The NoHostsFound branch sets Reservation Ready=False but
returns no requeue; update the end of that branch to requeue like the
NoHostsAvailable path by returning ctrl.Result{RequeueAfter:
r.Conf.RequeueIntervalRetry}, ensuring you still perform the status patch via
r.Status().Patch(ctx, &res, patch) and handle client.IgnoreNotFound(err) as
before; modify the return that currently does "return ctrl.Result{}, nil" to
"return ctrl.Result{RequeueAfter: r.Conf.RequeueIntervalRetry}, nil" (apply the
same change to the other identical block around lines 511-514).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 197c9251-ddd6-427b-853c-7edcfd5ac252
📒 Files selected for processing (5)
- helm/bundles/cortex-nova/values.yaml
- internal/scheduling/reservations/commitments/config.go
- internal/scheduling/reservations/commitments/controller.go
- internal/scheduling/reservations/commitments/controller_test.go
- internal/scheduling/reservations/commitments/state.go
🚧 Files skipped from review as they are similar to previous changes (2)
- helm/bundles/cortex-nova/values.yaml
- internal/scheduling/reservations/commitments/state.go
Test Coverage Report
Test Coverage 📊: 68.2%