Conversation

@bfournie
Contributor

Fixes multiple issues with the behaviour of the UI-driven installation for Agent OVE, i.e. the ISO_NO_REGISTRY test.

  • Added a delay before starting the installation. Without this delay the ignition_config_overrides are not set, since the IGNITION_CONFIG_OVERRIDE feature had not yet been read from the InfraEnv
  • Handle progression from the Cluster screen to allow cluster creation to complete and the Operators page to load
  • Fix Host Discovery page loading to proceed when the Next button is available

In addition, debug support has been added to save the cluster and host status retrieved from the API, along with additional screenshots.
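A minimal sketch of the delay described in the first bullet, as added before filling the cluster-details form (the one-minute sleep and logrus message match the diff hunks discussed below):

```go
// Wait before entering cluster details so the backend has time to read the
// IGNITION_CONFIG_OVERRIDE feature from the InfraEnv; without this the
// ignition_config_overrides are silently dropped.
logrus.Info("Waiting 1 minute to allow backend initialization...")
time.Sleep(1 * time.Minute)
```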

@openshift-ci

openshift-ci bot commented Jan 26, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign rwsu for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@bfournie
Contributor Author

/test e2e-agent-sno-ipv6

@openshift-ci

openshift-ci bot commented Jan 26, 2026

@bfournie: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
| --- | --- | --- | --- | --- |
| ci/prow/e2e-agent-sno-ipv6 | 8ca2b7a | link | false | /test e2e-agent-sno-ipv6 |
| ci/prow/e2e-agent-ha-dualstack | 8ca2b7a | link | false | /test e2e-agent-ha-dualstack |
| ci/prow/e2e-agent-compact-ipv4-iso-no-registry | 8ca2b7a | link | false | /test e2e-agent-compact-ipv4-iso-no-registry |

Full PR test history. Your PR dashboard.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


```go
// Wait 1 minute before entering cluster details to allow backend initialization
// This mimics manual usage where a user takes time to read and fill in the form
logrus.Info("Waiting 1 minute to allow backend initialization...")
```
Member

From a pure logging point of view, until a new LOGS_DIR env var is set (inherited from the dev-scripts calling environment) and configured for the current logger, I don't think the logs will be visible anywhere.
Note: the dev-scripts LOGS_DIR already points to a folder gathered at the end of the job and published as CI artifacts, so it will be picked up automatically (aka don't log any sensitive info)

Member

```go
logOutputDir     = os.Getenv("LOGDIR")
...
func initLogger() {
	logFileName := "ui_driven_cluster_installation.log"
	logBaseDir := "."
	if logOutputDir != "" {
		logBaseDir = logOutputDir
	}

	// Write logs both to stdout and to a file under the gathered log directory.
	file, err := os.OpenFile(filepath.Join(logBaseDir, logFileName), os.O_CREATE|os.O_WRONLY, 0644)
	if err != nil {
		panic(err)
	}
	logrus.SetOutput(io.MultiWriter(os.Stdout, file))
}
```

```go
	screenshotPath = filepath.Join(cwd, ocpDir)
}

// Wait 1 minute before entering cluster details to allow backend initialization
```
Member

What is the exact problem here? What is the exact backend initialization we're waiting for?
Is go-rod too fast? Btw note that SlowMotion could also be used to slow it down, i.e.:

```go
browser := rod.New().
	ControlURL(url).
	SlowMotion(500 * time.Millisecond).
	MustConnect()
```


```go
func main() {
	logrus.Info("Launching headless browser...")
	time.Sleep(1 * time.Minute)
```
Member

Why 1 minute? As soon as the UI is available the user should be able to jump on it and start filling the details
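A minimal sketch of that user-like wait with go-rod, assuming a hypothetical selector for the cluster-name input (the real form IDs would need to be checked against the UI):

```go
// Instead of a fixed one-minute sleep, wait (with a generous timeout) until
// the cluster-details form is actually interactable, as a user would.
// "#form-input-name-field" is a hypothetical selector.
page.Timeout(2 * time.Minute).MustElement("#form-input-name-field")
logrus.Info("Cluster details form is ready")
```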

```go
	// No error page, cluster loaded successfully
	logrus.Infof("Operators page loaded successfully (attempt %d)", attempt)
	break
}
```
Member

@bfournie I think at this point this project may deserve some relevant refactoring to make it maintainable and simpler (also easier to change in case of future enhancements of the UI). I like the original @pawanpinjarkar idea of having one method per page, and the main should really look like something extremely simple (meta-code); see the sketch after this comment:

```go
initLogger()
page := initBrowser()

fillAndSubmitClusterDetails(...)
fillAndSubmitVirtualizationBundle(...)
fillAndSubmitHostDiscovery(...)
fillAndSubmitVerifyStorage(...)
// (and so on)
```

Every page-submitting method should have its own logic to understand whether an error happened or not, and then fill the required fields.
And the logic should be based only on the UI elements (as the user would do)
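A minimal sketch of one such per-page method under this pattern (the signature and selectors are hypothetical, not the PR's actual code):

```go
// fillAndSubmitClusterDetails fills the cluster-details form and advances.
// All waiting is driven by UI elements, as a user would do.
func fillAndSubmitClusterDetails(page *rod.Page, name, baseDomain string) {
	page.Timeout(2 * time.Minute).
		MustElement("#form-input-name-field"). // hypothetical selector
		MustInput(name)
	page.MustElement("#form-input-baseDnsDomain-field"). // hypothetical selector
		MustInput(baseDomain)
	page.MustElementR("button", "Next").MustWaitEnabled().MustClick()
}
```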

logrus.Info("Select virtualization bundle")

// Save API state for debugging
saveClusterAPIState(screenshotPath, "02-operators")
Member

I don't think this is required, and it potentially overlaps with agent-gather. We must have a single way of debugging issues; I think we already agreed on adding the SSH key to the ISO to let agent-gather do its work


```go
// Wait for operators page to finish loading before clicking Next
logrus.Info("Waiting for operators page to finish loading...")
time.Sleep(5 * time.Second)
```
Member

Also, this wait here doesn't seem strictly required
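A minimal sketch of what could replace it, letting go-rod decide when the page has settled (MustWaitStable and MustWaitLoad are existing go-rod page methods):

```go
// Instead of a fixed 5-second sleep, wait until the DOM stops mutating
// (or use page.MustWaitLoad() to wait only for the load event).
page.MustWaitStable()
```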

```go
ticker := time.NewTicker(30 * time.Second)
defer ticker.Stop()

go func() {
```
Member

I don't think we should do this kind of action here at all (polling the assisted-service API to infer the status). As the user would do, the expected approach is to wait until:

  1. An error is shown in a popup
  2. The Next button becomes enabled
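A minimal sketch of that UI-driven wait using go-rod's Race helper (the error-popup selector is hypothetical; Race, WaitEnabled, and rod.Try are existing go-rod APIs):

```go
// Wait, as a user would, until either an error popup appears or the
// Next button becomes enabled; no API polling involved.
err := rod.Try(func() {
	page.Timeout(10 * time.Minute).Race().
		ElementR("button", "Next").MustHandle(func(e *rod.Element) {
			e.MustWaitEnabled().MustClick()
		}).
		Element(".pf-c-alert.pf-m-danger"). // hypothetical error-popup selector
		MustDo()
})
if err != nil {
	logrus.Fatalf("installation did not progress: %v", err)
}
```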

@bfournie
Contributor Author

/hold
Marking as hold for now

@openshift-ci openshift-ci bot added the do-not-merge/hold label ("Indicates that a PR should not merge because someone has issued a /hold command.") on Jan 27, 2026
@bfournie
Contributor Author

Some of the screen progression issues seem to be resolved in 4.22 so closing this for now
/close

@openshift-ci openshift-ci bot closed this Jan 27, 2026
@openshift-ci

openshift-ci bot commented Jan 27, 2026

@bfournie: Closed this PR.


In response to this:

Some of the screen progression issues seem to be resolved in 4.22 so closing this for now
/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
