refactor: Path compression for DiscrTree#2577
refactor: Path compression for DiscrTree#2577nomeata wants to merge 11 commits intoleanprover:masterfrom
Conversation
|
|
Could someone !bench this? |
|
!bench |
|
Here are the benchmark results for commit 83e2599. |
That’s a bit disappointing. I’ll make mathlib compile, maybe there is something there. |
|
From 237MB to: That's at least pretty nice. |
|
Mild improvements across the board, it seems? |
|
awaiting-review Guess we have the data to weigh this idea. The code certainly gets more complex, but the size gains at least for Jovan Gerbscheid pointed out that due to the well-nested structure of key sequences, values are only ever found at the leaves. One could therefore remove the .empty constructor and the recursive argument to .values, and make insertion panic when the key sequence is invalid, for a bit code reduction. OTOH, the current code is a bit more robust, for example when people mess with the key sequences as in leanprover-community/mathlib4#7345, so maybe better like this for now. |
e1f47a4 to
3a9a19f
Compare
|
I updated the branches, and created a new comparison report: I’m not sure how to best get a good signal from the report. |
|
Oh, I think you're underselling. A 1% improvement on Mathlib is still great news, and the size improvements for the caches are a significant QOL win, because they are usually the last to finish when running |
|
@nomeata, I'm marking this back to |
|
Absolutely! Although I think I’ll see leanprover-community/batteries#285 (which I also have to update now) through first, so that refactorings like the present only affect one downstream repo, and not three. |
|
I’ll close this. Others are working on DiscrTree more actively these days, have a better sense if this is useful, and if it is, can probably cherry-pick the ideas easily. |
|
@nomeata, do you have interest in reviving this PR in some form? As mathlib and Lean continue to grow, the number of simp lemmas keeps growing, and building the simp discrimination tree (at import time) is taking a significant amount of time when importing all of mathlib. The refactor in this PR would be one way to mitigate this inefficiency. |
|
Do we have evidence that this PR is helping with import times? |
|
I measured that for And I recall that when this PR was benchmarked against mathlib there was a significant speedup. So I figured that this speedup is from the reduced import times, and that the effect would be even more significant now that mathlib has grown more. |
|
Improving startup time for |
|
An alternative approach would be to make use of a |
|
Closing this PR in favor of #12838 Some form of lazy loading may possibly also be interesting, of course. |
Trie data structures tend to be very wasteful if one doesn’t apply path compression: If you have a sequence of nodes of degree 1, better put them in a flat array, instead of nested
.nodeswith singleton arrays.If applied to the
DiscrTree, this would reduce the size of theDiscrTreeused by mathlib’slibrary_searchfrom217 MBto41 MB, and the maximum depth from2596to27. The maximum depths was causing stack overflow in mathlib PRs.