Skip to content

CNTRLPLANE-3205: Execute performance testing for self-managed Azure HCP#80449

Open
mehabhalodiya wants to merge 1 commit into
openshift:mainfrom
mehabhalodiya:azure_perf
Open

CNTRLPLANE-3205: Execute performance testing for self-managed Azure HCP#80449
mehabhalodiya wants to merge 1 commit into
openshift:mainfrom
mehabhalodiya:azure_perf

Conversation

@mehabhalodiya

@mehabhalodiya mehabhalodiya commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Implements https://redhat.atlassian.net/browse/CNTRLPLANE-3205

Changes:

  • Add performance test step registry components in hypershift/azure/performance-test/
  • Add periodic CI configuration (runs Mondays at 8am UTC)
  • Generate Prow job for periodic execution
  • Add comprehensive documentation with baselines and troubleshooting

Performance scenarios tested:

  • HostedCluster creation time (target <30min, baseline ~20min)
  • NodePool scale up 2→10 nodes (target <10min, baseline ~8min)
  • NodePool scale down 10→2 nodes (target <5min, baseline ~4min)
  • API server availability (target 100%)
  • HostedCluster deletion time (target <15min, baseline ~11min)

Metrics are collected in both human-readable (metrics.txt) and
machine-readable (metrics.json) formats for analysis and comparison
with other HyperShift platforms (AWS, KubeVirt, bare metal).

Test artifacts include:

  • Performance measurement script with cluster lifecycle operations
  • Step registry ref and workflow definitions
  • Periodic job configuration for weekly execution
  • README with usage, baselines, and troubleshooting guide

Related tickets:

Summary by CodeRabbit

This PR adds a new, weekly OpenShift CI performance pipeline in the openshift/release repository to run lifecycle performance tests for self-managed Azure HyperShift Control Plane (HCP).

What this changes in practice

  • CI scheduling: introduces a new periodic Prow job (azure-self-managed-performance) configured under ci-operator/config/openshift/hypershift as release-5.0 periodics; it runs weekly on Mondays at 08:00 UTC using the hypershift-azure cluster profile and environment flags to target self-managed Azure.
  • Prow/ci-operator wiring: adds step-registry entries and metadata under ci-operator/step-registry/hypershift/azure/performance-test that register a new workflow (hypershift-azure-performance-test) and a single test step (hypershift-azure-performance-test) with required mounts for Azure credentials and required env inputs (OCP_IMAGE_LATEST, HYPERSHIFT_BINARY). Owners/metadata JSON files are included for step and workflow ownership.
  • Test runner: provides a Bash test runner script that automates a full HostedCluster lifecycle on Azure:
    • creates a HostedCluster and waits for Available
    • samples guest API availability and records availability percentage
    • scales a NodePool up (2→10) and down (10→2), waiting for readiness
    • deletes the HostedCluster and waits for cleanup
    • measures and records operation durations
    • emits human-readable (metrics.txt) and machine-readable (metrics.json) artifacts for CI analytics and cross-platform comparison
  • Documentation: adds a README describing scenarios, targets, baselines, execution, required credentials, troubleshooting, and expected artifacts.

Key performance scenarios, targets and baselines

  • HostedCluster creation: target < 30 min (baseline ~20 min)
  • NodePool scale up 2→10: target < 10 min (baseline ~8 min)
  • NodePool scale down 10→2: target < 5 min (baseline ~4 min)
  • API server availability: target 100% during sampling window
  • HostedCluster deletion: target < 15 min (baseline ~11 min)
  • Outputs: metrics.txt and metrics.json saved as test artifacts

Impact summary

  • Enables automated, periodic performance regression detection for Azure self-managed HyperShift within the OpenShift CI ecosystem.
  • Reuses existing pre/post steps (nested management, private-link, hypershift install/destroy) to minimize duplication while wiring in Azure-specific credentials and env expectations.
  • Adds ownership metadata so the new step/workflow and the periodic job are discoverable and maintainable in CI.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jun 11, 2026
@openshift-ci-robot

openshift-ci-robot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

@mehabhalodiya: This pull request references CNTRLPLANE-3205 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "5.0.0" version, but no target version was set.

Details

In response to this:

Implements https://redhat.atlassian.net/browse/CNTRLPLANE-3205

Changes:

  • Add performance test step registry components in hypershift/azure/performance-test/
  • Add periodic CI configuration (runs Mondays at 8am UTC)
  • Generate Prow job for periodic execution
  • Add comprehensive documentation with baselines and troubleshooting

Performance scenarios tested:

  • HostedCluster creation time (target <30min, baseline ~20min)
  • NodePool scale up 2→10 nodes (target <10min, baseline ~8min)
  • NodePool scale down 10→2 nodes (target <5min, baseline ~4min)
  • API server availability (target 100%)
  • HostedCluster deletion time (target <15min, baseline ~11min)

Metrics are collected in both human-readable (metrics.txt) and
machine-readable (metrics.json) formats for analysis and comparison
with other HyperShift platforms (AWS, KubeVirt, bare metal).

Test artifacts include:

  • Performance measurement script with cluster lifecycle operations
  • Step registry ref and workflow definitions
  • Periodic job configuration for weekly execution
  • README with usage, baselines, and troubleshooting guide

Related tickets:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Walkthrough

Adds a complete Azure self‑managed HyperShift performance testing stack: a Bash runner that collects timing/availability metrics, a step registry/ref, a pre/test/post workflow, weekly release periodic, README, and OWNERS metadata.

Changes

Azure HyperShift Performance Testing Infrastructure

Layer / File(s) Summary
Step implementation and step-ref
ci-operator/step-registry/hypershift/azure/performance-test/hypershift-azure-performance-test-commands.sh, ci-operator/step-registry/hypershift/azure/performance-test/hypershift-azure-performance-test-ref.yaml, ci-operator/step-registry/hypershift/azure/performance-test/hypershift-azure-performance-test-ref.metadata.json
Bash script executes five ordered performance scenarios (create hosted cluster, check guest API availability, scale nodepool up/down, delete cluster) while collecting timing and availability metrics in metrics.txt and metrics.json. Step registry YAML configures a 90‑minute execution with three Azure credential mounts, 100m CPU/300Mi memory, env vars (region, base domain, node counts, release image behavior, OIDC URL), and required inputs OCP_IMAGE_LATEST and HYPERSHIFT_BINARY. Metadata JSON lists approvers/reviewers.
Workflow orchestration
ci-operator/step-registry/hypershift/azure/performance-test/hypershift-azure-performance-test-workflow.yaml, ci-operator/step-registry/hypershift/azure/performance-test/hypershift-azure-performance-test-workflow.metadata.json
Workflow defines hypershift-azure-performance-test with pre, test, and post phases referencing nested-management, private-link, HyperShift install, the performance test step, and teardown. Metadata JSON specifies ownership.
Periodic test schedule
ci-operator/config/openshift/hypershift/openshift-hypershift-release-5.0__periodics-azure-perf.yaml
Release config for HyperShift 5.0 adds azure-self-managed-performance periodic (weekly Mon 08:00 UTC), base image refs, OCP 5.0 candidate tracking, wildcard default resource requests (100m CPU, 200Mi memory), and generated metadata.
Documentation and ownership
ci-operator/step-registry/hypershift/azure/performance-test/README.md, ci-operator/step-registry/hypershift/azure/performance-test/OWNERS
README documents test goals, scenarios, metrics/targets, CI workflow and manual steps, required infra/credentials, artifacts (metrics.txt, metrics.json), baselines/comparisons, and troubleshooting. OWNERS is a symlink to parent directory.

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Suggested labels

lgtm, rehearsals-ack

Suggested reviewers

  • bryan-cox
  • sjenning
🚥 Pre-merge checks | ✅ 15
✅ Passed checks (15 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main objective of the PR: adding performance testing infrastructure for self-managed Azure HyperShift Control Plane.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Stable And Deterministic Test Names ✅ Passed PR adds only YAML/SH/MD under ci-operator/step-registry/.../performance-test (no .go), and repo contains no Go files importing ginkgo, so no Ginkgo test titles to flag.
Test Structure And Quality ✅ Passed PR #80449’s changed files are only YAML/JSON/MD/SH (no *_test.go); searches find no Ginkgo (Describe/It/BeforeEach/AfterEach) code, so the test-structure checklist isn’t applicable.
Microshift Test Compatibility ✅ Passed PR #80449 only adds YAML/JSON/MD/Shell performance-test assets; no new Ginkgo (Go) e2e tests were added, so MicroShift API-compatibility check isn’t applicable.
Single Node Openshift (Sno) Test Compatibility ✅ Passed Checked modified files in this PR’s HEAD commit: no Go/_test.go additions contain ginkgo or ginkgo-style Describe/It/Context/When patterns, so no new Ginkgo e2e tests to evaluate for SNO.
Topology-Aware Scheduling Compatibility ✅ Passed Reviewed added periodic YAML + azure performance-test step/workflow/scripts; no affinity/anti-affinity, topologySpreadConstraints, nodeSelector/PodDisruptionBudget found.
Ote Binary Stdout Contract ✅ Passed PR #80449 changes only ci-operator YAML/JSON/MD/SH/OWNERS files (no Go/OTE binary code), so no OTE stdout contract violations can be introduced here.
Ipv6 And Disconnected Network Test Compatibility ✅ Passed Searched the PR’s listed new/changed files and the repo’s Go sources for Ginkgo markers (Describe/It/Context/When/ginkgo); none found, so no IPv6/disconnected assumptions to flag.
No-Weak-Crypto ✅ Passed PASS: Added files contain no MD5/SHA1/DES/RC4/3DES/Blowfish/ECB usage; only sha256sum hashes PROW_JOB_ID in the script, with no custom crypto or token/secret comparisons.
Container-Privileges ✅ Passed Checked the PR’s added periodic YAML and step-registry ref/workflow plus commands.sh for privileged/hostPID/hostNetwork/hostIPC/SYS_ADMIN/allowPrivilegeEscalation/securityContext/runAsUser; none pr...
No-Sensitive-Data-In-Logs ✅ Passed Checked the new Azure performance test script/config: no set -x/env/printenv, kubeconfig/credential/token values aren’t echoed; oc outputs are redirected to /dev/null. Only cluster name/region/dura...

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

@openshift-ci

openshift-ci Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: mehabhalodiya
Once this PR has been reviewed and has the lgtm label, please assign sjenning for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot requested review from bryan-cox and mgencur June 11, 2026 22:57

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
ci-operator/step-registry/hypershift/azure/performance-test/hypershift-azure-performance-test-commands.sh (1)

34-34: ⚡ Quick win

Declare and assign separately to avoid masking return values.

Lines 34 and 39 combine local declaration with command substitution. If date fails, the error is masked by local's return code. While date rarely fails, separating declaration and assignment improves error visibility.

♻️ Proposed fix
 time_operation() {
   local operation_name=$1
-  local start_time=$(date +%s)
+  local start_time
+  start_time=$(date +%s)

   shift
   "$@"

-  local end_time=$(date +%s)
+  local end_time
+  end_time=$(date +%s)
   local duration=$((end_time - start_time))

Also applies to: 39-39

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/hypershift/azure/performance-test/hypershift-azure-performance-test-commands.sh`
at line 34, Separate declaration from assignment for the timestamp variables to
avoid masking command failures: replace the combined statements that use command
substitution with two steps — first declare the local variables (e.g., local
start_time and local end_time) and then assign them using date (e.g.,
start_time=$(date +%s) and end_time=$(date +%s)). Update the occurrences of
start_time and end_time in hypershift-azure-performance-test-commands.sh so each
uses a standalone local declaration followed by the command substitution
assignment.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@ci-operator/step-registry/hypershift/azure/performance-test/hypershift-azure-performance-test-commands.sh`:
- Line 51: Replace hardcoded /hypershift/bin/hypershift invocations with the
HYPERSHIFT_BINARY variable: add an export fallback like export
HYPERSHIFT_BINARY="${HYPERSHIFT_BINARY:-/hypershift/bin/hypershift}" near the
top of the script and update every invocation (the create cluster azure command
and the destroy and kubeconfig calls that currently use
/hypershift/bin/hypershift) to call "${HYPERSHIFT_BINARY}" instead so the step
ref dependency is respected.
- Line 144: The metrics.json file is initialized as a JSON array ("[]") but
time_operation and check_api_availability append individual JSON objects,
producing invalid JSON; to fix, remove the echo "[]" >
"${PERF_RESULTS_DIR}/metrics.json" initialization and treat metrics.json as
NDJSON (one JSON object per line) by ensuring functions time_operation and
check_api_availability append newline-delimited JSON to
"${PERF_RESULTS_DIR}/metrics.json" and update any script comments/docs to state
metrics.json is NDJSON; alternatively (if you prefer a true JSON array) refactor
time_operation and check_api_availability to collect objects into an in-memory
array and write the complete array to "${PERF_RESULTS_DIR}/metrics.json" at the
end, but the preferred change is to remove the array init and document NDJSON
usage.
- Around line 123-125: The guest kubeconfig is being written to
"${PERF_RESULTS_DIR}/guest-kubeconfig" (under ARTIFACT_DIR) which exposes
cluster-admin credentials; change the redirection target in the `hypershift
create kubeconfig` invocation so the file is written to a non-artifact location
such as "${SHARED_DIR}/guest-kubeconfig" (or a secure temporary file created
with mktemp) instead of PERF_RESULTS_DIR, and ensure any downstream references
(variables or consumers) use the new SHARED_DIR path.

---

Nitpick comments:
In
`@ci-operator/step-registry/hypershift/azure/performance-test/hypershift-azure-performance-test-commands.sh`:
- Line 34: Separate declaration from assignment for the timestamp variables to
avoid masking command failures: replace the combined statements that use command
substitution with two steps — first declare the local variables (e.g., local
start_time and local end_time) and then assign them using date (e.g.,
start_time=$(date +%s) and end_time=$(date +%s)). Update the occurrences of
start_time and end_time in hypershift-azure-performance-test-commands.sh so each
uses a standalone local declaration followed by the command substitution
assignment.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 8bc58dfa-51d5-4e30-86c4-ab0e87b01f32

📥 Commits

Reviewing files that changed from the base of the PR and between d6dc7a7 and 1b4cdcb.

⛔ Files ignored due to path filters (1)
  • ci-operator/jobs/openshift/hypershift/openshift-hypershift-release-5.0-periodics.yaml is excluded by !ci-operator/jobs/**
📒 Files selected for processing (8)
  • ci-operator/config/openshift/hypershift/openshift-hypershift-release-5.0__periodics-azure-perf.yaml
  • ci-operator/step-registry/hypershift/azure/performance-test/OWNERS
  • ci-operator/step-registry/hypershift/azure/performance-test/README.md
  • ci-operator/step-registry/hypershift/azure/performance-test/hypershift-azure-performance-test-commands.sh
  • ci-operator/step-registry/hypershift/azure/performance-test/hypershift-azure-performance-test-ref.metadata.json
  • ci-operator/step-registry/hypershift/azure/performance-test/hypershift-azure-performance-test-ref.yaml
  • ci-operator/step-registry/hypershift/azure/performance-test/hypershift-azure-performance-test-workflow.metadata.json
  • ci-operator/step-registry/hypershift/azure/performance-test/hypershift-azure-performance-test-workflow.yaml

create_hosted_cluster() {
echo "Creating HostedCluster: ${PERF_CLUSTER_NAME}"

/hypershift/bin/hypershift create cluster azure \

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Check hypershift binary usage patterns in similar Azure test steps

# Search for hypershift binary invocations in Azure hypershift step scripts
rg -n '/hypershift/bin/hypershift|HYPERSHIFT_BINARY' \
  -g 'ci-operator/step-registry/hypershift/azure/**/*-commands.sh' \
  -C 2

Repository: openshift/release

Length of output: 3906


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) Locate and print the relevant ref.yaml section for the specific step
fd -t f 'hypershift-azure-performance-test-ref.yaml' ci-operator/step-registry/hypershift/azure/performance-test -a || true
REF_FILE="$(fd -t f 'hypershift-azure-performance-test-ref.yaml' ci-operator/step-registry/hypershift/azure/performance-test -a | head -n1 || true)"
if [[ -n "${REF_FILE}" ]]; then
  echo "== $REF_FILE =="
  # Print around any HYPERSHIFT_BINARY mentions
  rg -n 'HYPERSHIFT_BINARY' "$REF_FILE" -n || true
  rg -n 'dependencies|dependency|env|HYPERSHIFT_BINARY' "$REF_FILE" -C 2 || true
  # Also show a small window around the first match (if any)
  match_line="$(rg -n 'HYPERSHIFT_BINARY' "$REF_FILE" | head -n1 | cut -d: -f1 || true)"
  if [[ -n "${match_line}" ]]; then
    start=$((match_line-5)); end=$((match_line+5))
    sed -n "${start},${end}p" "$REF_FILE"
  fi
fi

# 2) See how HYPERSHIFT_BINARY is used across hypershift azure scripts
echo "== Azure hypershift scripts: HYPERSHIFT_BINARY occurrences =="
rg -n 'HYPERSHIFT_BINARY' ci-operator/step-registry/hypershift/azure -S -g '*-commands.sh' -C 2 || true

# 3) Confirm whether performance-test script ever uses $HYPERSHIFT_BINARY
echo "== performance-test commands: $HYPERSHIFT_BINARY usage =="
rg -n '\$?\{?HYPERSHIFT_BINARY\}?' ci-operator/step-registry/hypershift/azure/performance-test/hypershift-azure-performance-test-commands.sh -S || true

Repository: openshift/release

Length of output: 2233


Use HYPERSHIFT_BINARY in hypershift-azure-performance-test (avoid hardcoded /hypershift/bin/hypershift).

The step ref declares HYPERSHIFT_BINARY as a dependency (ci-operator/step-registry/hypershift/azure/performance-test/hypershift-azure-performance-test-ref.yaml, lines 49-50), but ci-operator/step-registry/hypershift/azure/performance-test/hypershift-azure-performance-test-commands.sh hardcodes /hypershift/bin/hypershift for create/destroy/kubeconfig (lines 51, 102, 123). Align this script to invoke ${HYPERSHIFT_BINARY} (other Azure hypershift steps wire the variable via export HYPERSHIFT_BINARY="${HYPERSHIFT_BINARY:-/hypershift/bin/hypershift}").

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/hypershift/azure/performance-test/hypershift-azure-performance-test-commands.sh`
at line 51, Replace hardcoded /hypershift/bin/hypershift invocations with the
HYPERSHIFT_BINARY variable: add an export fallback like export
HYPERSHIFT_BINARY="${HYPERSHIFT_BINARY:-/hypershift/bin/hypershift}" near the
top of the script and update every invocation (the create cluster azure command
and the destroy and kubeconfig calls that currently use
/hypershift/bin/hypershift) to call "${HYPERSHIFT_BINARY}" instead so the step
ref dependency is respected.

Comment on lines +123 to +125
/hypershift/bin/hypershift create kubeconfig \
--name="${PERF_CLUSTER_NAME}" \
--namespace="${PERF_NAMESPACE}" > "${PERF_RESULTS_DIR}/guest-kubeconfig"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical | ⚡ Quick win

Critical: Guest kubeconfig saved to public artifacts directory.

The guest kubeconfig contains cluster-admin credentials and is written to ${PERF_RESULTS_DIR}/guest-kubeconfig, which is under ${ARTIFACT_DIR}. Artifact directories are uploaded to publicly accessible Prow job artifacts, exposing sensitive cluster credentials.

Save the kubeconfig to ${SHARED_DIR} or a temporary directory outside ${ARTIFACT_DIR} to prevent credential leakage.

🔒 Proposed fix to prevent credential exposure
  # Get kubeconfig for the hosted cluster
  /hypershift/bin/hypershift create kubeconfig \
    --name="${PERF_CLUSTER_NAME}" \
-    --namespace="${PERF_NAMESPACE}" > "${PERF_RESULTS_DIR}/guest-kubeconfig"
+    --namespace="${PERF_NAMESPACE}" > "${SHARED_DIR}/guest-kubeconfig"

  local api_checks=0
  local api_successes=0

  for i in {1..10}; do
    api_checks=$((api_checks + 1))
-    if KUBECONFIG="${PERF_RESULTS_DIR}/guest-kubeconfig" oc get nodes &>/dev/null; then
+    if KUBECONFIG="${SHARED_DIR}/guest-kubeconfig" oc get nodes &>/dev/null; then
      api_successes=$((api_successes + 1))
    fi
    sleep 2
  done
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
/hypershift/bin/hypershift create kubeconfig \
--name="${PERF_CLUSTER_NAME}" \
--namespace="${PERF_NAMESPACE}" > "${PERF_RESULTS_DIR}/guest-kubeconfig"
/hypershift/bin/hypershift create kubeconfig \
--name="${PERF_CLUSTER_NAME}" \
--namespace="${PERF_NAMESPACE}" > "${SHARED_DIR}/guest-kubeconfig"
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/hypershift/azure/performance-test/hypershift-azure-performance-test-commands.sh`
around lines 123 - 125, The guest kubeconfig is being written to
"${PERF_RESULTS_DIR}/guest-kubeconfig" (under ARTIFACT_DIR) which exposes
cluster-admin credentials; change the redirection target in the `hypershift
create kubeconfig` invocation so the file is written to a non-artifact location
such as "${SHARED_DIR}/guest-kubeconfig" (or a secure temporary file created
with mktemp) instead of PERF_RESULTS_DIR, and ensure any downstream references
(variables or consumers) use the new SHARED_DIR path.

Source: Coding guidelines

}

# Initialize JSON metrics file
echo "[]" > "${PERF_RESULTS_DIR}/metrics.json"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Invalid JSON format: metrics.json initialized as array but appended as objects.

Line 144 initializes metrics.json with "[]", but the time_operation function (line 42) and check_api_availability (line 140) append individual JSON objects. This creates invalid JSON:

[]
{"metric":"...","value":...,"timestamp":...}
{"metric":"...","value":...,"timestamp":...}

Choose one format:

  • Option 1 (recommended): Remove the [] initialization for newline-delimited JSON (NDJSON).
  • Option 2: Build a proper JSON array by collecting metrics in memory and writing once at the end.
📝 Proposed fix for Option 1 (NDJSON format)
-# Initialize JSON metrics file
-echo "[]" > "${PERF_RESULTS_DIR}/metrics.json"
+# Initialize metrics files
+: > "${PERF_RESULTS_DIR}/metrics.json"  # Empty file for NDJSON format
 echo "# Azure Self-Managed HCP Performance Metrics" > "${PERF_RESULTS_DIR}/metrics.txt"

Then update the script summary documentation to note the file is NDJSON (one JSON object per line).

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/hypershift/azure/performance-test/hypershift-azure-performance-test-commands.sh`
at line 144, The metrics.json file is initialized as a JSON array ("[]") but
time_operation and check_api_availability append individual JSON objects,
producing invalid JSON; to fix, remove the echo "[]" >
"${PERF_RESULTS_DIR}/metrics.json" initialization and treat metrics.json as
NDJSON (one JSON object per line) by ensuring functions time_operation and
check_api_availability append newline-delimited JSON to
"${PERF_RESULTS_DIR}/metrics.json" and update any script comments/docs to state
metrics.json is NDJSON; alternatively (if you prefer a true JSON array) refactor
time_operation and check_api_availability to collect objects into an in-memory
array and write the complete array to "${PERF_RESULTS_DIR}/metrics.json" at the
end, but the preferred change is to remove the array init and document NDJSON
usage.

@mehabhalodiya

Copy link
Copy Markdown
Contributor Author

/pj-rehearse periodic-ci-openshift-hypershift-release-5.0-periodics-azure-perf-azure-self-managed-performance

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

@mehabhalodiya: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

Add performance test step registry components in hypershift/azure/performance-test/

Signed-off-by: mehabhalodiya <mehabhalodiya@gmail.com>
@openshift-merge-bot

Copy link
Copy Markdown
Contributor

[REHEARSALNOTIFIER]
@mehabhalodiya: the pj-rehearse plugin accommodates running rehearsal tests for the changes in this PR. Expand 'Interacting with pj-rehearse' for usage details. The following rehearsable tests have been affected by this change:

Test name Repo Type Reason
periodic-ci-openshift-hypershift-release-5.0-periodics-azure-perf-azure-self-managed-performance N/A periodic Periodic changed
Interacting with pj-rehearse

Comment: /pj-rehearse to run up to 5 rehearsals
Comment: /pj-rehearse skip to opt-out of rehearsals
Comment: /pj-rehearse {test-name}, with each test separated by a space, to run one or more specific rehearsals
Comment: /pj-rehearse more to run up to 10 rehearsals
Comment: /pj-rehearse max to run up to 25 rehearsals
Comment: /pj-rehearse auto-ack to run up to 5 rehearsals, and add the rehearsals-ack label on success
Comment: /pj-rehearse list to get an up-to-date list of affected jobs
Comment: /pj-rehearse abort to abort all active rehearsals
Comment: /pj-rehearse network-access-allowed to allow rehearsals of tests that have the restrict_network_access field set to false. This must be executed by an openshift org member who is not the PR author

Once you are satisfied with the results of the rehearsals, comment: /pj-rehearse ack to unblock merge. When the rehearsals-ack label is present on your PR, merge will no longer be blocked by rehearsals.
If you would like the rehearsals-ack label removed, comment: /pj-rehearse reject to re-block merging.

@mehabhalodiya

Copy link
Copy Markdown
Contributor Author

/pj-rehearse periodic-ci-openshift-hypershift-release-5.0-periodics-azure-perf-azure-self-managed-performance

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

@mehabhalodiya: now processing your pj-rehearse request. Please allow up to 10 minutes for jobs to trigger or cancel.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In
`@ci-operator/step-registry/hypershift/azure/performance-test/hypershift-azure-performance-test-commands.sh`:
- Around line 77-88: The function wait_nodepool_ready currently checks
nodepool.status.replicas; change it to check the NodePool's ready replicas
instead by using the jsonpath '{.status.readyReplicas}' in the oc get call
inside wait_nodepool_ready (refer to function name wait_nodepool_ready and
variables PERF_CLUSTER_NAME and PERF_NAMESPACE). Update the until condition and
the echoed "Current replicas" line to use .status.readyReplicas and compare it
to ${expected_replicas} so the loop only exits when the NodePool actually
reports the expected number of ready replicas.

In `@ci-operator/step-registry/hypershift/azure/performance-test/README.md`:
- Around line 95-109: The README's manual steps conflict with the script's
kubeconfig handling: the script expects
"${SHARED_DIR}/management_cluster_kubeconfig" but the docs only export
KUBECONFIG=/path/to/management-cluster-kubeconfig; update the docs to either (a)
copy the provided kubeconfig into SHARED_DIR as management_cluster_kubeconfig
before running the script, or (b) change the example to export
KUBECONFIG="${SHARED_DIR}/management_cluster_kubeconfig" after setting
SHARED_DIR; reference the script name
hypershift-azure-performance-test-commands.sh and the variables SHARED_DIR,
KUBECONFIG, and management_cluster_kubeconfig when making the change.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 798cfb96-a10f-4cca-a060-c1552f3d9fdf

📥 Commits

Reviewing files that changed from the base of the PR and between aa5a563 and d5cce22.

⛔ Files ignored due to path filters (1)
  • ci-operator/jobs/openshift/hypershift/openshift-hypershift-release-5.0-periodics.yaml is excluded by !ci-operator/jobs/**
📒 Files selected for processing (8)
  • ci-operator/config/openshift/hypershift/openshift-hypershift-release-5.0__periodics-azure-perf.yaml
  • ci-operator/step-registry/hypershift/azure/performance-test/OWNERS
  • ci-operator/step-registry/hypershift/azure/performance-test/README.md
  • ci-operator/step-registry/hypershift/azure/performance-test/hypershift-azure-performance-test-commands.sh
  • ci-operator/step-registry/hypershift/azure/performance-test/hypershift-azure-performance-test-ref.metadata.json
  • ci-operator/step-registry/hypershift/azure/performance-test/hypershift-azure-performance-test-ref.yaml
  • ci-operator/step-registry/hypershift/azure/performance-test/hypershift-azure-performance-test-workflow.metadata.json
  • ci-operator/step-registry/hypershift/azure/performance-test/hypershift-azure-performance-test-workflow.yaml
✅ Files skipped from review due to trivial changes (3)
  • ci-operator/step-registry/hypershift/azure/performance-test/hypershift-azure-performance-test-ref.metadata.json
  • ci-operator/step-registry/hypershift/azure/performance-test/hypershift-azure-performance-test-workflow.metadata.json
  • ci-operator/step-registry/hypershift/azure/performance-test/OWNERS
🚧 Files skipped from review as they are similar to previous changes (3)
  • ci-operator/step-registry/hypershift/azure/performance-test/hypershift-azure-performance-test-workflow.yaml
  • ci-operator/config/openshift/hypershift/openshift-hypershift-release-5.0__periodics-azure-perf.yaml
  • ci-operator/step-registry/hypershift/azure/performance-test/hypershift-azure-performance-test-ref.yaml

Comment on lines +77 to +88
wait_nodepool_ready() {
local expected_replicas=$1
echo "Waiting for NodePool to have ${expected_replicas} ready replicas..."

timeout 20m bash -c "
until [[ \$(oc get nodepool ${PERF_CLUSTER_NAME} -n ${PERF_NAMESPACE} -o jsonpath='{.status.replicas}') == '${expected_replicas}' ]]; do
echo 'Current replicas: '\$(oc get nodepool ${PERF_CLUSTER_NAME} -n ${PERF_NAMESPACE} -o jsonpath='{.status.replicas}')
sleep 10
done
"

echo "NodePool has ${expected_replicas} ready replicas"

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Wait for ready replicas, not just .status.replicas.

Line 82 currently gates on nodepool.status.replicas, but the benchmark text and log message say this step measures when the NodePool is ready. status.replicas can reach the target before all nodes are actually ready, so the scale-up/scale-down durations can be underreported.

🔧 Proposed fix
 wait_nodepool_ready() {
   local expected_replicas=$1
   echo "Waiting for NodePool to have ${expected_replicas} ready replicas..."
 
   timeout 20m bash -c "
-    until [[ \$(oc get nodepool ${PERF_CLUSTER_NAME} -n ${PERF_NAMESPACE} -o jsonpath='{.status.replicas}') == '${expected_replicas}' ]]; do
-      echo 'Current replicas: '\$(oc get nodepool ${PERF_CLUSTER_NAME} -n ${PERF_NAMESPACE} -o jsonpath='{.status.replicas}')
+    until [[ \$(oc get nodepool ${PERF_CLUSTER_NAME} -n ${PERF_NAMESPACE} -o jsonpath='{.status.readyReplicas}') == '${expected_replicas}' ]]; do
+      echo 'Current ready replicas: '\$(oc get nodepool ${PERF_CLUSTER_NAME} -n ${PERF_NAMESPACE} -o jsonpath='{.status.readyReplicas}')
       sleep 10
     done
   "
 
   echo "NodePool has ${expected_replicas} ready replicas"
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@ci-operator/step-registry/hypershift/azure/performance-test/hypershift-azure-performance-test-commands.sh`
around lines 77 - 88, The function wait_nodepool_ready currently checks
nodepool.status.replicas; change it to check the NodePool's ready replicas
instead by using the jsonpath '{.status.readyReplicas}' in the oc get call
inside wait_nodepool_ready (refer to function name wait_nodepool_ready and
variables PERF_CLUSTER_NAME and PERF_NAMESPACE). Update the until condition and
the echoed "Current replicas" line to use .status.readyReplicas and compare it
to ${expected_replicas} so the loop only exits when the NodePool actually
reports the expected number of ready replicas.

Comment on lines +95 to +109
### Manual Execution
To run the performance test manually in a development environment:

1. Set up a management cluster with HyperShift operator installed
2. Ensure Azure credentials are configured
3. Export required environment variables
4. Run the test script:
```bash
export KUBECONFIG=/path/to/management-cluster-kubeconfig
export SHARED_DIR=/tmp/test-artifacts
export ARTIFACT_DIR=/tmp/test-results
export PROW_JOB_ID=test-$(date +%s)

# Run the performance test
./hypershift-azure-performance-test-commands.sh

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Manual execution steps don’t match the script’s kubeconfig handling.

The example tells readers to export KUBECONFIG=/path/to/management-cluster-kubeconfig, but the script immediately overwrites KUBECONFIG with ${SHARED_DIR}/management_cluster_kubeconfig. Following these docs as written will fail unless the kubeconfig is also copied into SHARED_DIR under that exact filename.

📝 Proposed doc adjustment
  1. Set up a management cluster with HyperShift operator installed
  2. Ensure Azure credentials are configured
- 3. Export required environment variables
+ 3. Copy the management cluster kubeconfig into `${SHARED_DIR}/management_cluster_kubeconfig`
+ 4. Export required environment variables
     4. Run the test script:
        ```bash
-       export KUBECONFIG=/path/to/management-cluster-kubeconfig
        export SHARED_DIR=/tmp/test-artifacts
        export ARTIFACT_DIR=/tmp/test-results
        export PROW_JOB_ID=test-$(date +%s)
+       cp /path/to/management-cluster-kubeconfig "${SHARED_DIR}/management_cluster_kubeconfig"
        
        # Run the performance test
        ./hypershift-azure-performance-test-commands.sh
        ```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
### Manual Execution
To run the performance test manually in a development environment:
1. Set up a management cluster with HyperShift operator installed
2. Ensure Azure credentials are configured
3. Export required environment variables
4. Run the test script:
```bash
export KUBECONFIG=/path/to/management-cluster-kubeconfig
export SHARED_DIR=/tmp/test-artifacts
export ARTIFACT_DIR=/tmp/test-results
export PROW_JOB_ID=test-$(date +%s)
# Run the performance test
./hypershift-azure-performance-test-commands.sh
### Manual Execution
To run the performance test manually in a development environment:
1. Set up a management cluster with HyperShift operator installed
2. Ensure Azure credentials are configured
3. Copy the management cluster kubeconfig into `${SHARED_DIR}/management_cluster_kubeconfig`
4. Export required environment variables
5. Run the test script:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@ci-operator/step-registry/hypershift/azure/performance-test/README.md` around
lines 95 - 109, The README's manual steps conflict with the script's kubeconfig
handling: the script expects "${SHARED_DIR}/management_cluster_kubeconfig" but
the docs only export KUBECONFIG=/path/to/management-cluster-kubeconfig; update
the docs to either (a) copy the provided kubeconfig into SHARED_DIR as
management_cluster_kubeconfig before running the script, or (b) change the
example to export KUBECONFIG="${SHARED_DIR}/management_cluster_kubeconfig" after
setting SHARED_DIR; reference the script name
hypershift-azure-performance-test-commands.sh and the variables SHARED_DIR,
KUBECONFIG, and management_cluster_kubeconfig when making the change.

@openshift-ci

openshift-ci Bot commented Jun 12, 2026

Copy link
Copy Markdown
Contributor

@mehabhalodiya: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name Commit Details Required Rerun command
ci/rehearse/periodic-ci-openshift-hypershift-release-5.0-periodics-azure-perf-azure-self-managed-performance d5cce22 link unknown /pj-rehearse periodic-ci-openshift-hypershift-release-5.0-periodics-azure-perf-azure-self-managed-performance

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants