controller: implement cordon and drain logic #53
Merged
The drain.Helper from k8s.io/kubectl/pkg/drain requires a kubernetes.Interface clientset rather than controller-runtime's client.Client. Add a KubeClient field to the reconciler and wire it up in cmd/controller/main.go and suite_test.go. Assisted-by: Pi (Claude Opus 4.6)
The drain.Helper needs to list pods on a node, evict them, and look up DaemonSets (for IgnoreAllDaemonSets). Add RBAC markers for pods (get/list), pods/eviction (create), and apps/daemonsets (get). Regenerate the ClusterRole manifest. Assisted-by: Pi (Claude Opus 4.6)
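As a rough illustration, the three permissions listed above could be expressed with kubebuilder RBAC markers along these lines. Where the markers live in the source tree is an assumption; only the resources and verbs come from the commit message.

```go
// Sketch only: markers like these usually sit above the Reconcile method,
// but the exact placement in this repository is an assumption.
// +kubebuilder:rbac:groups="",resources=pods,verbs=get;list
// +kubebuilder:rbac:groups="",resources=pods/eviction,verbs=create
// +kubebuilder:rbac:groups=apps,resources=daemonsets,verbs=get
```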
Still want to test this manually as well locally, but I think it's ready for review.

Tested this more now and yeah I think it's ready to go!
62a1d04 to d787c9c
For each candidate selected by selectCandidates, fetch the K8s Node and call assignRebootSlot. This function sets the in-reboot-slot and was-cordoned annotations on the BootcNode and cordons the K8s Node. All operations are idempotent so controller restarts re-enter the flow cleanly. Drains are not started yet; that is the next commit. While we're here, also add the matching `freeRebootSlot` function, even though it's not used yet. Assisted-by: Pi (Claude Opus 4.6)
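To make the idempotency claim concrete, here is a minimal sketch using plain structs in place of the real BootcNode and Node types. The annotation keys, the struct shapes, and the reading of was-cordoned as "the node was already cordoned before we touched it" are all assumptions for illustration, not the controller's actual code.

```go
package main

import "fmt"

// Hypothetical annotation keys; the commit message names the annotations
// but not their full keys.
const (
	annInRebootSlot = "example.invalid/in-reboot-slot"
	annWasCordoned  = "example.invalid/was-cordoned"
)

// node stands in for the Kubernetes Node; only the cordon bit matters here.
type node struct{ Unschedulable bool }

// bootcNode stands in for the BootcNode custom resource.
type bootcNode struct{ Annotations map[string]string }

// assignRebootSlot marks bn as holding a reboot slot and cordons n.
// It reports whether anything changed, so a repeated call is a no-op.
func assignRebootSlot(bn *bootcNode, n *node) bool {
	changed := false
	if bn.Annotations == nil {
		bn.Annotations = map[string]string{}
	}
	if _, ok := bn.Annotations[annInRebootSlot]; !ok {
		// Record the pre-existing cordon state only on first entry, so a
		// restart cannot overwrite it with our own cordon.
		bn.Annotations[annWasCordoned] = fmt.Sprint(n.Unschedulable)
		bn.Annotations[annInRebootSlot] = "true"
		changed = true
	}
	if !n.Unschedulable {
		n.Unschedulable = true
		changed = true
	}
	return changed
}

// freeRebootSlot undoes assignRebootSlot, uncordoning only if the node was
// not already cordoned before the slot was assigned.
func freeRebootSlot(bn *bootcNode, n *node) bool {
	if _, ok := bn.Annotations[annInRebootSlot]; !ok {
		return false
	}
	if bn.Annotations[annWasCordoned] != "true" {
		n.Unschedulable = false
	}
	delete(bn.Annotations, annInRebootSlot)
	delete(bn.Annotations, annWasCordoned)
	return true
}

func main() {
	bn, n := &bootcNode{}, &node{}
	fmt.Println(assignRebootSlot(bn, n)) // first call changes state
	fmt.Println(assignRebootSlot(bn, n)) // repeat call finds nothing to do
	fmt.Println(freeRebootSlot(bn, n), n.Unschedulable)
}
```

Because the slot annotation is checked before anything is written, a controller that restarts between the annotation update and the cordon simply finishes the cordon on the next pass.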
drain.Helper uses io.Writer (Out and ErrOut) for its output rather than a structured logger. This adapter bridges the two so that drain progress and errors flow through our logr pipeline with appropriate levels instead of being lost or going to stdout/stderr. Assisted-by: Pi (Claude Opus 4.6)
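A minimal sketch of such an adapter, using only the standard library. The `logFunc` type is a hypothetical stand-in for a logr call (for example a closure around `logger.Info`), and the line-splitting behaviour is an assumption about what a reasonable adapter would do, not a description of the one in this PR.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"sync"
)

// logFunc is a hypothetical stand-in for a structured logger call.
type logFunc func(msg string)

// logWriter adapts a logFunc to io.Writer, emitting one log entry per line.
type logWriter struct {
	mu  sync.Mutex
	log logFunc
	buf []byte // trailing partial line carried over between writes
}

// Write buffers p and forwards every complete, non-empty line to the logger.
func (w *logWriter) Write(p []byte) (int, error) {
	w.mu.Lock()
	defer w.mu.Unlock()
	w.buf = append(w.buf, p...)
	for {
		i := bytes.IndexByte(w.buf, '\n')
		if i < 0 {
			break
		}
		if line := bytes.TrimRight(w.buf[:i], "\r"); len(line) > 0 {
			w.log(string(line))
		}
		w.buf = w.buf[i+1:]
	}
	return len(p), nil
}

// Compile-time check that logWriter satisfies io.Writer.
var _ io.Writer = (*logWriter)(nil)

func main() {
	out := &logWriter{log: func(m string) { fmt.Println("INFO ", m) }}
	errOut := &logWriter{log: func(m string) { fmt.Println("ERROR", m) }}
	fmt.Fprintf(out, "evicting pod %s\n", "default/web-0")
	// A message split across two writes is still logged as one line.
	fmt.Fprint(errOut, "error when evicting ")
	fmt.Fprint(errOut, "pod\n")
}
```

Using one writer per level (one for `Out`, one for `ErrOut`) keeps the level decision out of the adapter itself.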
After assigning reboot slots to candidates, iterate all Staged nodes with the in-reboot-slot annotation and start drain goroutines for them via ensureDrain. Drain completion is not handled yet; that is the next commit. Assisted-by: Pi (Claude Opus 4.6)
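The "start a drain at most once per node" idea can be sketched as follows. The `drainTracker` type, the `drainFunc` signature, and the use of a buffered result channel are assumptions made for this example; the real `ensureDrain` may differ, and context cancellation is left out for brevity.

```go
package main

import (
	"fmt"
	"sync"
)

// drainFunc is a hypothetical stand-in for the code that runs the drain
// helper against one node.
type drainFunc func(nodeName string) error

// drainTracker remembers in-flight drains so each node gets at most one.
type drainTracker struct {
	mu      sync.Mutex
	results map[string]chan error
}

// ensureDrain starts a drain goroutine for nodeName unless one is already
// tracked. It reports whether a new drain was started.
func (t *drainTracker) ensureDrain(nodeName string, drain drainFunc) bool {
	t.mu.Lock()
	defer t.mu.Unlock()
	if t.results == nil {
		t.results = map[string]chan error{}
	}
	if _, ok := t.results[nodeName]; ok {
		return false
	}
	// Buffered so the goroutine can finish even if nobody is receiving yet.
	ch := make(chan error, 1)
	t.results[nodeName] = ch
	go func() { ch <- drain(nodeName) }()
	return true
}

func main() {
	tracker := &drainTracker{}
	noop := func(string) error { return nil }
	fmt.Println(tracker.ensureDrain("node-a", noop)) // starts a drain
	fmt.Println(tracker.ensureDrain("node-a", noop)) // already tracked
}
```

Calling this from every reconcile is safe because a second call for the same node returns without starting another goroutine.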
The Eviction API has been stable for some time now. That wasn't the case when the MCO was started, for example. I think we should strongly consider using it directly instead of pulling in kubectl, which is a huge dependency. There are some gotchas there that we could lift from the kubectl code, but nothing insurmountable. Anyway, just writing this down as a future enhancement for now.
Add collectDrainResults, called at the start of driveRollout before assigning new reboot slots. It does a non-blocking check on each drain's result channel. On success, it sets desiredImageState to Booted on the BootcNode so the daemon knows to apply the staged image and reboot. Drain errors and cancellations are logged but otherwise deferred. Assisted-by: Pi (Claude Opus 4.6)
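The non-blocking check described above is the usual `select` with a `default` branch. The sketch below assumes drain results live in a map of buffered channels keyed by node name and that finished entries are removed; both are illustrative choices, and the step of setting desiredImageState to Booted is left to the caller.

```go
package main

import (
	"errors"
	"fmt"
)

// errDrainFailed is a placeholder error used by the example in main.
var errDrainFailed = errors.New("drain failed")

// collectDrainResults polls each result channel without blocking. Nodes whose
// drain succeeded are returned in done, failures in failed. Finished drains
// are removed from results; drains still running are left in place.
func collectDrainResults(results map[string]chan error) (done []string, failed map[string]error) {
	failed = map[string]error{}
	for name, ch := range results {
		select {
		case err := <-ch:
			delete(results, name) // deleting during range is safe in Go
			if err != nil {
				failed[name] = err
				continue
			}
			done = append(done, name)
		default:
			// Drain still running; check again on the next reconcile.
		}
	}
	return done, failed
}

func main() {
	ok, bad, pending := make(chan error, 1), make(chan error, 1), make(chan error, 1)
	ok <- nil
	bad <- errDrainFailed
	results := map[string]chan error{"node-a": ok, "node-b": bad, "node-c": pending}

	done, failed := collectDrainResults(results)
	fmt.Println("drained:", done)
	fmt.Println("failed:", failed)
	fmt.Println("still pending:", len(results))
}
```

The `default` branch is what keeps the reconcile loop from stalling on a drain that is still evicting pods.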
Simulates a 3-node rollout where all nodes have staged the target image. With maxUnavailable: 1, verifies that only one node gets a reboot slot (cordoned, annotated) at a time. This is pretty simple for now because the rollout logic is only partially implemented. As we add more of the missing pieces, we can keep extending this test to be more complete. Assisted-by: Pi (Claude Opus 4.6)
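The property this test checks can be illustrated with a small budget calculation. The `nodeState` struct, the `pickForRebootSlots` name, and the rule that only nodes already holding a slot count against maxUnavailable are assumptions for this sketch; the controller's real selectCandidates logic may count unavailability differently.

```go
package main

import (
	"fmt"
	"sort"
)

// nodeState is a hypothetical, flattened view of one node in the rollout.
type nodeState struct {
	Name         string
	Staged       bool // target image is staged
	InRebootSlot bool // node already holds a reboot slot
}

// pickForRebootSlots returns the staged nodes that may receive a slot now
// without exceeding maxUnavailable.
func pickForRebootSlots(nodes []nodeState, maxUnavailable int) []string {
	budget := maxUnavailable
	var waiting []string
	for _, n := range nodes {
		switch {
		case n.InRebootSlot:
			budget-- // an occupied slot uses up budget
		case n.Staged:
			waiting = append(waiting, n.Name)
		}
	}
	if budget <= 0 {
		return nil
	}
	sort.Strings(waiting) // deterministic order across reconciles
	if len(waiting) > budget {
		waiting = waiting[:budget]
	}
	return waiting
}

func main() {
	nodes := []nodeState{
		{Name: "node-c", Staged: true},
		{Name: "node-a", Staged: true},
		{Name: "node-b", Staged: true},
	}
	fmt.Println(pickForRebootSlots(nodes, 1))

	nodes[1].InRebootSlot = true // node-a now holds the only slot
	fmt.Println(pickForRebootSlots(nodes, 1))
}
```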
This is subtle, but with the tag set to `:latest`, K8s defaults the image pull policy to `Always`: https://kubernetes.io/docs/concepts/containers/images/#imagepullpolicy-defaulting (We don't specify an explicit `imagePullPolicy` in our configs. We may do that in the future, though there are trade-offs to think about. I think it depends a lot on the packaging/productization layers once we get there. But I think the K8s defaults mostly make sense.)
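For reference, the defaulting rules on the linked page can be restated as a small function. This is a simplified restatement for illustration, not the API server's code, and the image names in `main` are made up.

```go
package main

import (
	"fmt"
	"strings"
)

// defaultPullPolicy returns the policy Kubernetes documents for a container
// whose imagePullPolicy field is omitted.
func defaultPullPolicy(image string) string {
	// A digest reference (name@sha256:...) defaults to IfNotPresent.
	if strings.Contains(image, "@") {
		return "IfNotPresent"
	}
	// Only look for a tag in the last path segment, so a registry port such
	// as localhost:5000/app is not mistaken for a tag.
	last := image[strings.LastIndex(image, "/")+1:]
	_, tag, hasTag := strings.Cut(last, ":")
	if !hasTag || tag == "latest" {
		return "Always"
	}
	return "IfNotPresent"
}

func main() {
	for _, img := range []string{
		"quay.io/example/controller:latest", // hypothetical image names
		"quay.io/example/controller",
		"quay.io/example/controller:v0.1.0",
		"localhost:5000/controller",
	} {
		fmt.Printf("%-36s %s\n", img, defaultPullPolicy(img))
	}
}
```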
d787c9c to 1dc31e8
alicefr approved these changes on Jun 8, 2026
More milestone 3b bits. See individual commit messages.