-
Notifications
You must be signed in to change notification settings - Fork 83
CI Integration Example SpeedyWeather.jl #2784
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
CI Integration Example SpeedyWeather.jl #2784
Conversation
|
Your PR requires formatting changes to meet the project's style guidelines. Click here to view the suggested changes.diff --git a/test/integration/SpeedyWeather/runtests.jl b/test/integration/SpeedyWeather/runtests.jl
index ea3e354c..c606e9fd 100644
--- a/test/integration/SpeedyWeather/runtests.jl
+++ b/test/integration/SpeedyWeather/runtests.jl
@@ -6,7 +6,7 @@
using SpeedyWeather, Enzyme, Test
spectral_grid = SpectralGrid(trunc = 32, nlayers = 8) # define resolution
-model = PrimitiveWetModel(; spectral_grid, physics=false) # construct model
+model = PrimitiveWetModel(; spectral_grid, physics = false) # construct model
# physics = false to accelate the test
simulation = initialize!(model)
initialize!(simulation) |
|
Thanks Max! |
|
@giordano can we make the filtering more aggressive and only run CI on the integration test being changed? |
|
We could generate the matrix dynamically, but only if |
|
BTW, this is missing adding SpeedyWeather to Enzyme.jl/.github/workflows/Integration.yml Lines 47 to 56 in ab96c82
|
|
Oh, yes. I just added it. |
|
Besides the fact |
|
Yeah, in our own CI a very similar test takes about 35 minutes. I hoped it's a bit faster now after recent Enzyme patches, but it doesn't seem to be the case. If 20 min is the limit (didn't know that), then I can turn off some of the parametrizations and see if that's sufficient. I am out of time to really experiment with that for today though. |
|
Timeout is 45 minutes
|
|
I mean the gradient compile time is what it is currently 🤷 |
|
I updated the example to mirror more closely the one we already use in your CI in Speedy. It should be faster now, and there it always finishes < 40 min. However, it also seems that Enzymev0.13.105 broke something for us, as we are getting fails from our CI tests on this since Friday. So this example here will also currently fail. |
|
well thats certainly a reason for getting the integration tests in! |
|
that said the erring part is a bit too complex/baked in, would you be able to construct a MWE of? |
Co-authored-by: Mosè Giordano <[email protected]>
Co-authored-by: Mosè Giordano <[email protected]>
b0f98ad to
8eec083
Compare
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2784 +/- ##
==========================================
- Coverage 67.79% 67.78% -0.01%
==========================================
Files 58 58
Lines 20723 20726 +3
==========================================
Hits 14050 14050
- Misses 6673 6676 +3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
|
okay I think I fixed the custom rule issue immediately here (and a potential PR for more information on the 1.11: EnzymeAD/Enzyme#2582). However, on a debug julia I pretty much always hit GC errors. @vchuravy I think we really ought debug and fix that MWE speedweather GC error issue before we can land this |
|
Thanks for your effort on this already. I already had a debugging session today with this, but it's quite nasty to boil this down to a proper MWE. I know which intermediate function from our code causes the error, but each individual function it calls is fine. This is the function for our simplest model, and here's an example call with Enzyme, but in the next days I'll try to condense it to a proper MWE, it's just not that easy to boil this down so quickly. |
This comment was marked as outdated.
This comment was marked as outdated.
|
Is #2818 related to this here? Just let me know if I should invest a bit more time for finding a really proper MWE, or if that's already it. |
|
No that's unrelated and an issue from an in progress PR which doesn't occur on main atm |
|
The whole issue just confuses me. Now, I found that just leaving away the So in the following script, using some of SpeedyWeather's numerics I just present the just slightly different versions where the last one that includes an unused Julia 1.10, Enzyme v0.13.108, SpeedyWeather using Enzyme, SpeedyWeather
# this works
function transform_debug!(
vor_grid, u_grid, scratch_memory, U,
vor,
S
)
# catch incorrect sizes early
@boundscheck SpeedyWeather.ismatching(S, vor_grid) || throw(DimensionMismatch(S, vor_grid))
@boundscheck SpeedyWeather.ismatching(S, vor) || throw(DimensionMismatch(S, vor))
g_north = scratch_memory.north
g_south = scratch_memory.south
# this is a FFT, for which we have a custom rule defined
SpeedyWeather.SpeedyTransforms._fourier!(vor_grid, g_north, g_south, S)
# catch incorrect sizes early
@boundscheck SpeedyWeather.ismatching(S, u_grid) || throw(DimensionMismatch(S, u_grid))
@boundscheck SpeedyWeather.ismatching(S, U) || throw(DimensionMismatch(S, U))
g_north = scratch_memory.north
g_south = scratch_memory.south
SpeedyWeather.SpeedyTransforms._fourier!(u_grid, g_north, g_south, S)
return nothing
end
# this works
function transform_debug_with_kwargs_no_checks!(
vor_grid, u_grid, scratch_memory, U,
vor,
S;
kwargs... # now has an unused kwargs...
)
# but no @boundschecks
g_north = scratch_memory.north
g_south = scratch_memory.south
# this is a FFT, for which we have a custom rule defined
SpeedyWeather.SpeedyTransforms._fourier!(vor_grid, g_north, g_south, S)
g_north = scratch_memory.north
g_south = scratch_memory.south
SpeedyWeather.SpeedyTransforms._fourier!(u_grid, g_north, g_south, S)
return nothing
end
# this gives a segfault
function transform_debug_with_kwargs!(
vor_grid, u_grid, scratch_memory, U,
vor,
S;
kwargs... # still has an unused kwargs...
)
# boundschecks and kwarg together cause an error here now
@boundscheck SpeedyWeather.ismatching(S, vor_grid) || throw(DimensionMismatch(S, vor_grid))
@boundscheck SpeedyWeather.ismatching(S, vor) || throw(DimensionMismatch(S, vor))
# that's a spherical harmonics transforms (KA kernel + FFT)
g_north = scratch_memory.north
g_south = scratch_memory.south
# this is a FFT for which we have a custom rule defined
SpeedyWeather.SpeedyTransforms._fourier!(vor_grid, g_north, g_south, S)
@boundscheck SpeedyWeather.ismatching(S, u_grid) || throw(DimensionMismatch(S, u_grid))
@boundscheck SpeedyWeather.ismatching(S, U) || throw(DimensionMismatch(S, U))
g_north = scratch_memory.north
g_south = scratch_memory.south
SpeedyWeather.SpeedyTransforms._fourier!(u_grid, g_north, g_south, S)
return nothing
end
spectral_grid = SpectralGrid(trunc=9, nlayers=1)
S = SpectralTransform(spectral_grid)
#simulation = initialize!(model)
# these are RingGrids.Fields
vor_grid = rand(spectral_grid.grid, 1)
u_grid = rand(spectral_grid.grid, 1)
v_grid = rand(spectral_grid.grid, 1)
scratch_memory = deepcopy(S.scratch_memory)
# these are LowerTriangularArrays.LowerTriangularArray
vor = rand(ComplexF32, spectral_grid.spectrum, 1)
U = rand(ComplexF32, spectral_grid.spectrum, 1)
# this works
autodiff(Reverse, transform_debug!, Const, Duplicated(vor_grid, make_zero(vor_grid)), Duplicated(u_grid, make_zero(u_grid)), Duplicated(scratch_memory, make_zero(scratch_memory)), Duplicated(U, make_zero(U)), Duplicated(vor, make_zero(vor)), Const(S))
# this works as well
autodiff(Reverse, transform_debug_with_kwargs_no_checks!, Const, Duplicated(vor_grid, make_zero(vor_grid)), Duplicated(u_grid, make_zero(u_grid)), Duplicated(scratch_memory, make_zero(scratch_memory)), Duplicated(U, make_zero(U)), Duplicated(vor, make_zero(vor)), Const(S))
# this gives a SegFault
autodiff(Reverse, transform_debug_with_kwargs!, Const, Duplicated(vor_grid, make_zero(vor_grid)), Duplicated(u_grid, make_zero(u_grid)), Duplicated(scratch_memory, make_zero(scratch_memory)), Duplicated(U, make_zero(U)), Duplicated(vor, make_zero(vor)), Const(S)) |
|
I've updated the example above and I am honestly just really confused by it. This was fine before, but now seemingly some interaction between our custom rules for the FFT, a The custom rule is actually implemented here |
@vchuravy suggested I add an example using SpeedyWeather.jl to the integration tests.
The test is pretty much the same thing we do in the paper. It's a sensitivity analysis of a single grid point and it checks that this runs without errors and the gradient makes physical sense, so the gradient is localised around the selected grid point and small far away from it.