Skip to content

Conversation

@eliza-balas
Copy link
Contributor

PR Description

Add fclk0-3 nodes to properly initialize the PL reference clocks (pl0_ref through pl3_ref) before they are used by fabric peripherals.

The issue appeared when using the zynqmp-adrv9009-zu11eg-revb-adrv2crr-fmc-revb-sync-fmcomms8.dts
The AXI GPIO driver at axi_fmcomms8_gpio: gpio@86000000 references clock pl0_ref which is never prepared or enabled by the clock framework. When runtime PM suspends the GPIO device, xgpio_runtime_suspend calls clk_disable() on an unprepared clock, causing a kernel panic.

PR Type

  • [ x ] Bug fix (a change that fixes an issue)
  • New feature (a change that adds new functionality)
  • Breaking change (a change that affects other repos or cause CIs to fail)

PR Checklist

  • [ x ] I have conducted a self-review of my own code changes
  • [ x ] I have compiled my changes, including the documentation
  • [ x ] I have tested the changes on the relevant hardware
  • I have updated the documentation outside this repo accordingly
  • I have provided links for the relevant upstream lore

Add fclk0-3 nodes to properly initialize the PL reference clocks
(pl0_ref through pl3_ref) before they are used by fabric peripherals.

For example, if the AXI GPIO driver references a pl_ref clock which
was never prepared or enabled by the clock framework, when runtime PM
suspends the GPIO device, xgpio_runtime_suspend calls clk_disable()
on an unprepared clock, causing a kernel panic.

Signed-off-by: Eliza Balas <[email protected]>
@nunojsa
Copy link
Collaborator

nunojsa commented Dec 15, 2025

Add fclk0-3 nodes to properly initialize the PL reference clocks (pl0_ref through pl3_ref) before they are used by fabric peripherals.

The above is an hack...

The issue appeared when using the zynqmp-adrv9009-zu11eg-revb-adrv2crr-fmc-revb-sync-fmcomms8.dts
The AXI GPIO driver at axi_fmcomms8_gpio: gpio@86000000 references clock pl0_ref which is never prepared or enabled by the clock framework. When runtime PM suspends the GPIO device, xgpio_runtime_suspend calls clk_disable() on an unprepared clock, causing a kernel panic.

Hmm the above seems like a bug in the driver. What you're doing with the DT stuff is clearly an hack that relies on an external, staging driver to prepare the clock for you. That is obviously wrong... So I would recommend for you to dig a bit further on the issue root cause.

Having said the above I have some trouble understading the issue: Looking at the DT I saw the clock node defined and looking at the code:

https://github.com/analogdevicesinc/linux/blob/main/drivers/gpio/gpio-xilinx.c#L644

So I see the clock being requested and enabled. It seems to me the issue is more subtle.

@eliza-balas
Copy link
Contributor Author

Add fclk0-3 nodes to properly initialize the PL reference clocks (pl0_ref through pl3_ref) before they are used by fabric peripherals.

The above is an hack...

The issue appeared when using the zynqmp-adrv9009-zu11eg-revb-adrv2crr-fmc-revb-sync-fmcomms8.dts
The AXI GPIO driver at axi_fmcomms8_gpio: gpio@86000000 references clock pl0_ref which is never prepared or enabled by the clock framework. When runtime PM suspends the GPIO device, xgpio_runtime_suspend calls clk_disable() on an unprepared clock, causing a kernel panic.

Hmm the above seems like a bug in the driver. What you're doing with the DT stuff is clearly an hack that relies on an external, staging driver to prepare the clock for you. That is obviously wrong... So I would recommend for you to dig a bit further on the issue root cause.

Having said the above I have some trouble understading the issue: Looking at the DT I saw the clock node defined and looking at the code:

https://github.com/analogdevicesinc/linux/blob/main/drivers/gpio/gpio-xilinx.c#L644

So I see the clock being requested and enabled. It seems to me the issue is more subtle.

The issue occurs because the clock is being disabled by the GPIO driver, and somehow enters in a race condition when multiple drivers attempt to manage the same clock resource.

In the 2023.2 release, the device tree includes these nodes that prevents this problem:
https://github.com/analogdevicesinc/linux/blob/2023_r2_p1/arch/arm64/boot/dts/xilinx/zynqmp-clk-ccf.dtsi

That is the only difference compared to the latest kernel version: when using the same kernel version with the 2023.2 device tree, no kernel panic occurs, whereas with the latest device tree, the kernel panic happens.

@nunojsa
Copy link
Collaborator

nunojsa commented Dec 15, 2025

The issue occurs because the clock is being disabled by the GPIO driver, and somehow enters in a race condition when multiple drivers attempt to manage the same clock resource.

This is what needs to be better explained and looked up. The clock framework should be more than capable about handling multiple consumers enabling/disabling the same consumer. I'm not sure the problem is on the GPIO driver (even though that enabling/disabling in the PM callbacks is very prone to cause this type of things).

In the 2023.2 release, the device tree includes these nodes that prevents this problem:
https://github.com/analogdevicesinc/linux/blob/2023_r2_p1/arch/arm64/boot/dts/xilinx/zynqmp-clk-ccf.dtsi

Those went away because xilinx removed them...

That is the only difference compared to the latest kernel version: when using the same kernel version with the 2023.2 device tree, no kernel panic occurs, whereas with the latest device tree, the kernel panic happens.

Yes, I can see why it does not happen with those nodes. Yet, it is still a bandaid fix which means we're failing to identify the real root cause of the issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants