OCPBUGS-84114: Add hypershift-openstack-gather step#80419

Open
stephenfin wants to merge 2 commits into
openshift:mainfrom
shiftstack:fix-hypershift-gate

@stephenfin stephenfin commented Jun 11, 2026

Add and enable a new hypershift-openstack-gather step to the OpenStack HCP jobs. This allows us to gather information about cluster post-delete, which can help us prove some theories regarding

While here, document the RHCOS_IMAGE_NAME option and clear its default. This confused me. We might want to drop testing of 4.18 (and subsequently this option) since it's dev preview but that's left to another PR.

Summary by CodeRabbit

This PR updates OpenShift CI configuration for the hypershift OpenStack Hosted Control Plane (HCP) workflows to add a short, best-effort diagnostic gather step and to clarify how an OpenStack RHCOS image name is passed to E2E tests.

Practical impact

  • A new optional post-test step, hypershift-openstack-gather, is added to the hypershift OpenStack e2e job workflow. It runs after tests (best-effort, optional on success, 5m timeout) and snapshots CAPO (OpenStackCluster/OpenStackMachine/OpenStackServer), CAPI (clusters/machines), ORC (images.openstack.k-orc.cloud), hostedclusters, nodepools and namespaces stuck in Terminating to ARTIFACT_DIR/capo-gather for debugging deletions and finalizer-related failures.
  • The e2e execution command now conditionally passes --e2e.openstack-node-image-name only when RHCOS_IMAGE_NAME is explicitly set, avoiding an empty/incorrect default being passed to tests.
  • RHCOS_IMAGE_NAME’s default was cleared (now empty) and documentation was added describing when the variable is used (notably differences between 4.18 and 4.19+ behavior).
  • Governance metadata and OWNERS were added for the new step to authorize openstack-approvers/reviewers.
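The gather behaviour described in the first bullet could be sketched roughly as follows. This is not the script from the PR: the function names, the overridable `OC` variable, the `/tmp` fallback, and the plural resource names are assumptions made for illustration, and the sketch leaves out the Terminating-namespace snapshot. Only the resource kinds and the `ARTIFACT_DIR/capo-gather` destination come from the summary above.

```shell
#!/usr/bin/env bash
set -euo pipefail

# OC is overridable so the sketch can be exercised without a cluster
# (an assumption for illustration; the real step presumably calls oc directly).
OC="${OC:-oc}"

# gather_one RESOURCE OUT_DIR
# Dump every instance of RESOURCE across all namespaces to OUT_DIR/RESOURCE.yaml.
# A failure is reported but does not abort the run (best-effort).
gather_one() {
  local resource="$1" out_dir="$2"
  if ! "${OC}" get "${resource}" --all-namespaces -o yaml \
      > "${out_dir}/${resource}.yaml" 2>&1; then
    echo "warning: could not gather ${resource}" >&2
  fi
}

# gather_all
# Snapshot the HyperShift, CAPI, CAPO and ORC resources named in the summary.
# The plural resource names below are assumed spellings of those kinds.
gather_all() {
  local out_dir="${ARTIFACT_DIR:-/tmp}/capo-gather"
  local resource
  mkdir -p "${out_dir}"
  for resource in \
      hostedclusters nodepools \
      clusters.cluster.x-k8s.io machines.cluster.x-k8s.io \
      openstackclusters openstackmachines openstackservers \
      images.openstack.k-orc.cloud; do
    gather_one "${resource}" "${out_dir}"
  done
}
```

Calling `gather_all` with `KUBECONFIG` pointing at the management cluster would write one YAML file per resource type. Tolerating individual failures mirrors the best-effort intent of the step, so a missing CRD does not prevent the other snapshots.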

Scope and affected CI config

  • Changes are confined to hypershift OpenStack CI config and step-registry:
    • ci-operator/config/openshift/hypershift/openshift-hypershift-main.yaml — integrates the gather step into the e2e workflow post section.
    • ci-operator/step-registry/hypershift/openstack/gather/* — new gather command script, step ref YAML, metadata JSON, and OWNERS.
    • ci-operator/step-registry/hypershift/openstack/e2e/execute/hypershift-openstack-e2e-execute-commands.sh — conditional CLI flag passing.
    • ci-operator/step-registry/hypershift/openstack/e2e/execute/hypershift-openstack-e2e-execute-ref.yaml — RHCOS_IMAGE_NAME default cleared and documented.

Why this matters

  • The gather step collects targeted artifacts after test failures to aid triage of deletion/finalizer problems in OpenStack HCP jobs.
  • Removing the misleading default and passing the RHCOS image flag only when set reduces accidental misconfiguration and keeps behavior consistent across release versions (4.18 versus 4.19+).

Notes

  • The PR leaves a comment that testing of 4.18 and the RHCOS_IMAGE_NAME option may be removed in a future change; that removal is not part of this PR.

RHCOS_IMAGE_NAME fed --e2e.openstack-node-image-name, which was wired
into DefaultOpenStackOptions() on 4.18 and earlier. Since e12e00159
(4.19+) ORC manages the image lifecycle directly for TestCreateCluster
and TestNodePool, so the flag is no longer consumed by those tests.
TestOpenStackAdvancedTest still honours it but is not in the default
E2E_TESTS_REGEX.

Clear the misleading "rhcos-latest-hcp-nodepool" default to "" and only
pass the flag when the variable is non-empty. The 4.18 job configs that
explicitly set RHCOS_IMAGE_NAME to a versioned image name are unaffected.
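
A minimal sketch of the "only pass the flag when the variable is non-empty" pattern, assuming a bash step script. The helper name `image_name_args` and the `--flag=value` spelling are illustrative assumptions; only `RHCOS_IMAGE_NAME` and `--e2e.openstack-node-image-name` come from this commit message.

```shell
#!/usr/bin/env bash
set -euo pipefail

# image_name_args (hypothetical helper)
# Prints the node image flag only when RHCOS_IMAGE_NAME is set and non-empty.
# "${RHCOS_IMAGE_NAME:-}" keeps the expansion safe under `set -u`.
image_name_args() {
  local image_name="${RHCOS_IMAGE_NAME:-}"
  if [[ -n "${image_name}" ]]; then
    printf '%s\n' "--e2e.openstack-node-image-name=${image_name}"
  fi
}

# Collect the output into an array so that an empty result expands to
# zero arguments rather than one empty string.
mapfile -t extra_args < <(image_name_args)
echo "passing ${#extra_args[@]} extra argument(s) to the e2e binary"
```

Building the optional flag in an array avoids passing an empty positional argument to the test binary when the variable is unset, which is the failure mode an empty default would otherwise invite.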

Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
@openshift-ci-robot added the jira/severity-moderate, jira/valid-reference, and jira/valid-bug labels Jun 11, 2026
@openshift-ci-robot

@stephenfin: This pull request references Jira Issue OCPBUGS-84114, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state ASSIGNED, which is one of the valid states (NEW, ASSIGNED, POST)

The bug has been updated to refer to the pull request using the external bug tracker.


@openshift-ci openshift-ci Bot requested review from enxebre and mandre June 11, 2026 14:19
@openshift-ci

openshift-ci Bot commented Jun 11, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: stephenfin
Once this PR has been reviewed and has the lgtm label, please assign enxebre for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented Jun 11, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: be82ebd0-ed60-4a21-be7d-edc0f8385f9f

📥 Commits

Reviewing files that changed from the base of the PR and between 859a7b1 and d7ca9ea.

📒 Files selected for processing (5)
  • ci-operator/config/openshift/hypershift/openshift-hypershift-main.yaml
  • ci-operator/step-registry/hypershift/openstack/gather/OWNERS
  • ci-operator/step-registry/hypershift/openstack/gather/hypershift-openstack-gather-commands.sh
  • ci-operator/step-registry/hypershift/openstack/gather/hypershift-openstack-gather-ref.metadata.json
  • ci-operator/step-registry/hypershift/openstack/gather/hypershift-openstack-gather-ref.yaml
🚧 Files skipped from review as they are similar to previous changes (5)
  • ci-operator/step-registry/hypershift/openstack/gather/OWNERS
  • ci-operator/config/openshift/hypershift/openshift-hypershift-main.yaml
  • ci-operator/step-registry/hypershift/openstack/gather/hypershift-openstack-gather-ref.metadata.json
  • ci-operator/step-registry/hypershift/openstack/gather/hypershift-openstack-gather-ref.yaml
  • ci-operator/step-registry/hypershift/openstack/gather/hypershift-openstack-gather-commands.sh

Walkthrough

PR adds a new OpenStack forensic gather step to HyperShift CI, makes RHCOS_IMAGE_NAME optional for E2E runs, and wires the gather step into the e2e-openstack-aws job's post-test workflow before destroy.

Changes

OpenStack E2E Testing Enhancement

| Layer | File(s) | Summary |
| --- | --- | --- |
| E2E test parameter flexibility | ci-operator/step-registry/hypershift/openstack/e2e/execute/hypershift-openstack-e2e-execute-ref.yaml, ci-operator/step-registry/hypershift/openstack/e2e/execute/hypershift-openstack-e2e-execute-commands.sh | RHCOS_IMAGE_NAME default changed to "" with documentation; E2E command now includes --e2e.openstack-node-image-name only when RHCOS_IMAGE_NAME is non-empty. |
| New OpenStack gather step (implementation) | ci-operator/step-registry/hypershift/openstack/gather/hypershift-openstack-gather-commands.sh | New script validates KUBECONFIG, creates ${ARTIFACT_DIR}/capo-gather, and gathers Terminating namespaces, hostedclusters/nodepools, CAPI clusters/machines, CAPO openstack resources, and ORC images into deterministic YAML files. |
| New OpenStack gather step (registry & metadata) | ci-operator/step-registry/hypershift/openstack/gather/hypershift-openstack-gather-ref.yaml, ci-operator/step-registry/hypershift/openstack/gather/hypershift-openstack-gather-ref.metadata.json, ci-operator/step-registry/hypershift/openstack/gather/OWNERS | Adds a hypershift-openstack-gather step-ref (best_effort, optional_on_success, resource requests, timeout) and metadata linking openstack-approvers/openstack-reviewers, plus OWNERS. |
| Job workflow integration | ci-operator/config/openshift/hypershift/openshift-hypershift-main.yaml | Adds a post block to e2e-openstack-aws to run hypershift-openstack-gather (ref) and then chain hypershift-destroy-nested-management-cluster. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

Suggested labels

lgtm, approved, rehearsals-ack

🚥 Pre-merge checks | ✅ 15
✅ Passed checks (15 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'OCPBUGS-84114: Add hypershift-openstack-gather step' directly and clearly summarizes the main change—adding a new hypershift-openstack-gather step to the CI configuration.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed This PR contains no Go test files or Ginkgo test declarations. It modifies CI/CD configuration (YAML), Bash scripts, and metadata files only. The check for stable test names is not applicable.
Test Structure And Quality ✅ Passed PR contains only CI/CD configuration, shell scripts, and metadata files—no Ginkgo test code to review. Custom check is not applicable.
Microshift Test Compatibility ✅ Passed No Ginkgo e2e tests are added in this PR. The PR modifies only CI/CD configuration files, Bash scripts, and metadata—not Go test code. The check is inapplicable.
Single Node Openshift (Sno) Test Compatibility ✅ Passed This PR does not add new Ginkgo e2e tests. All changes are CI configuration, shell scripts for test execution/gathering, and governance files. The SNO compatibility check only applies to new Ginkgo...
Topology-Aware Scheduling Compatibility ✅ Passed PR modifies only CI/test infrastructure (job configs, step registry definitions, test scripts, governance metadata), not deployment manifests, operator code, or controllers. No scheduling constrain...
Ote Binary Stdout Contract ✅ Passed PR contains only YAML config and bash scripts; no OTE Go binaries or Go code changes present, making this check not applicable.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed No new Ginkgo e2e tests are added in this PR. All changes are CI configuration (YAML), shell scripts, and metadata files. The check is not applicable.
No-Weak-Crypto ✅ Passed No weak cryptographic algorithms, custom crypto implementations, or non-constant-time secret comparisons detected. PR modifies only CI infrastructure configuration and test gathering scripts.
Container-Privileges ✅ Passed No container privilege escalation configurations (privileged, hostPID, hostNetwork, hostIPC, SYS_ADMIN, allowPrivilegeEscalation, runAsRoot) found in any modified YAML or script files.
No-Sensitive-Data-In-Logs ✅ Passed The gather script logs Kubernetes resource metadata (OpenStackCluster, OpenStackMachine, etc.) via oc get -o yaml, which references secrets by name but does not expose actual credential values; thi...


@openshift-ci-robot


@stephenfin: This pull request references Jira Issue OCPBUGS-84114, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (5.0.0) matches configured target version for branch (5.0.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Summary by CodeRabbit

This PR enhances the OpenStack HCP (Hosted Control Plane) CI infrastructure by adding a post-test diagnostic step and improving configuration clarity.

Key Changes:

  1. New hypershift-openstack-gather step: Adds a new diagnostic step to OpenStack HCP e2e jobs that runs after test completion to collect and export debugging information from the management cluster. This step gathers CAPO (Cluster API Provider OpenStack), ORC (OpenStack Resource Controller), and related Kubernetes resources to help investigate cluster state and identify objects blocked by finalizers. The step is marked as best-effort and optional on success, with a 5-minute timeout.

  2. RHCOS_IMAGE_NAME configuration improvements:

  • Clears the misleading default value of RHCOS_IMAGE_NAME from "rhcos-latest-hcp-nodepool" to an empty string
  • Makes the CLI argument conditionally passed to the e2e test only when this variable is explicitly set
  • Adds documentation explaining when this variable applies (which release versions and which e2e tests consume it)
  • This resolves confusion caused by the default value being inappropriate for newer releases (4.19+), where the flag behavior changed

  3. Job workflow updates: Integrates the new gather step into the e2e-openstack-aws test pipeline, ensuring diagnostic collection happens automatically after test execution.

The changes are scoped to OpenStack HCP job configuration and include proper ownership/reviewer metadata for governance.


@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@ci-operator/step-registry/hypershift/openstack/gather/hypershift-openstack-gather-commands.sh`:
- Around line 5-8: The file check using KUBECONFIG can fail under set -u when
the variable is unset; update the guard to use safe parameter expansion and
check for emptiness first, e.g. replace the if [[ ! -f "${KUBECONFIG}" ]] test
with a combined safe check using "${KUBECONFIG:-}" (for example: if [[ -z
"${KUBECONFIG:-}" || ! -f "${KUBECONFIG}" ]]; then ... ) so the script won’t
error on unbound KUBECONFIG and will still skip the gather path; adjust the
echo/exit behavior in the same block that references KUBECONFIG.
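
The guard suggested in this comment could look roughly like the sketch below. The function name `have_kubeconfig` and the skip message are illustrative assumptions; the `"${KUBECONFIG:-}"` expansion is the part the comment asks for.

```shell
#!/usr/bin/env bash
set -euo pipefail

# have_kubeconfig (hypothetical helper)
# Succeeds only when KUBECONFIG is set, non-empty, and names an existing file.
# "${KUBECONFIG:-}" expands to an empty string when the variable is unset,
# so the check does not trip `set -u`.
have_kubeconfig() {
  local cfg="${KUBECONFIG:-}"
  [[ -n "${cfg}" && -f "${cfg}" ]]
}

if ! have_kubeconfig; then
  # Only a message is printed here; whether the step then exits
  # successfully or continues is left to the surrounding script.
  echo "KUBECONFIG is unset or does not point to a file; skipping gather"
fi
```

Copying the value into a local variable first means the file test never expands an unbound name, whichever branch of the condition is evaluated.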

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 7cd6d984-fbcc-4896-bb08-d6d5aff37f69

📥 Commits

Reviewing files that changed from the base of the PR and between 51b034f and 859a7b1.

📒 Files selected for processing (7)
  • ci-operator/config/openshift/hypershift/openshift-hypershift-main.yaml
  • ci-operator/step-registry/hypershift/openstack/e2e/execute/hypershift-openstack-e2e-execute-commands.sh
  • ci-operator/step-registry/hypershift/openstack/e2e/execute/hypershift-openstack-e2e-execute-ref.yaml
  • ci-operator/step-registry/hypershift/openstack/gather/OWNERS
  • ci-operator/step-registry/hypershift/openstack/gather/hypershift-openstack-gather-commands.sh
  • ci-operator/step-registry/hypershift/openstack/gather/hypershift-openstack-gather-ref.metadata.json
  • ci-operator/step-registry/hypershift/openstack/gather/hypershift-openstack-gather-ref.yaml

This is vaguely based on the hypershift-analyze-e2e-failure step but
much simpler (no AI integration).

Signed-off-by: Stephen Finucane <stephenfin@redhat.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
@stephenfin stephenfin force-pushed the fix-hypershift-gate branch from 859a7b1 to d7ca9ea Compare June 11, 2026 14:34
@openshift-merge-bot


[REHEARSALNOTIFIER]
@stephenfin: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

Test name Repo Type Reason
pull-ci-openshift-openstack-resource-controller-main-e2e-hypershift openshift/openstack-resource-controller presubmit Registry content changed
pull-ci-openshift-openstack-resource-controller-release-5.1-e2e-hypershift openshift/openstack-resource-controller presubmit Registry content changed
pull-ci-openshift-openstack-resource-controller-release-5.0-e2e-hypershift openshift/openstack-resource-controller presubmit Registry content changed
pull-ci-openshift-openstack-resource-controller-release-4.23-e2e-hypershift openshift/openstack-resource-controller presubmit Registry content changed
pull-ci-openshift-openstack-resource-controller-release-4.22-e2e-hypershift openshift/openstack-resource-controller presubmit Registry content changed
pull-ci-openshift-openstack-resource-controller-release-4.21-e2e-hypershift openshift/openstack-resource-controller presubmit Registry content changed
pull-ci-openshift-openstack-resource-controller-release-4.20-e2e-hypershift openshift/openstack-resource-controller presubmit Registry content changed
pull-ci-openshift-openstack-resource-controller-release-4.19-e2e-hypershift openshift/openstack-resource-controller presubmit Registry content changed
pull-ci-openshift-machine-config-operator-main-e2e-openstack-hypershift openshift/machine-config-operator presubmit Registry content changed
pull-ci-openshift-machine-config-operator-release-5.1-e2e-openstack-hypershift openshift/machine-config-operator presubmit Registry content changed
pull-ci-openshift-machine-config-operator-release-5.0-e2e-openstack-hypershift openshift/machine-config-operator presubmit Registry content changed
pull-ci-openshift-machine-config-operator-release-4.23-e2e-openstack-hypershift openshift/machine-config-operator presubmit Registry content changed
pull-ci-openshift-machine-config-operator-release-4.22-e2e-openstack-hypershift openshift/machine-config-operator presubmit Registry content changed
pull-ci-openshift-machine-config-operator-release-4.21-e2e-openstack-hypershift openshift/machine-config-operator presubmit Registry content changed
pull-ci-openshift-machine-config-operator-release-4.20-e2e-openstack-hypershift openshift/machine-config-operator presubmit Registry content changed
pull-ci-openshift-machine-config-operator-release-4.19-e2e-openstack-hypershift openshift/machine-config-operator presubmit Registry content changed
pull-ci-openshift-machine-config-operator-release-4.18-e2e-openstack-hypershift openshift/machine-config-operator presubmit Registry content changed
pull-ci-openshift-hypershift-main-e2e-openstack-aws openshift/hypershift presubmit Ci-operator config changed
pull-ci-openshift-hypershift-release-5.1-e2e-openstack-aws openshift/hypershift presubmit Registry content changed
pull-ci-openshift-hypershift-release-5.0-e2e-openstack-aws openshift/hypershift presubmit Registry content changed
pull-ci-openshift-hypershift-release-4.23-e2e-openstack-aws openshift/hypershift presubmit Registry content changed
pull-ci-openshift-hypershift-release-4.22-e2e-openstack-aws openshift/hypershift presubmit Registry content changed
pull-ci-openshift-hypershift-release-4.21-e2e-openstack-aws openshift/hypershift presubmit Registry content changed
pull-ci-openshift-hypershift-release-4.20-e2e-openstack-aws openshift/hypershift presubmit Registry content changed
pull-ci-openshift-hypershift-release-4.19-e2e-openstack-aws openshift/hypershift presubmit Registry content changed

A total of 42 jobs have been affected by this change. The above listing is non-exhaustive and limited to 25 jobs.

A full list of affected jobs can be found here

Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@openshift-ci

openshift-ci Bot commented Jun 11, 2026


@stephenfin: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/prow/step-registry-metadata d7ca9ea link true /test step-registry-metadata
