refactor: Path compression for DiscrTree by nomeata · Pull Request #2577 · leanprover/lean4

nomeata · 2023-09-24T18:08:18Z

Trie data structures tend to be very wasteful if one doesn’t apply path compression: If you have a sequence of nodes of degree 1, better put them in a flat array, instead of nested .nodes with singleton arrays.

If applied to the DiscrTree, this would reduce the size of the DiscrTree used by mathlib’s library_search from 217 MB to 41 MB, and the maximum depth from 2596 to 27. The maximum depth was causing stack overflows in mathlib PRs.
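
For readers unfamiliar with the technique, here is a minimal sketch of a path-compressed trie. It is written in Python for illustration only and is not the Lean implementation from this PR; the names `Node`, `insert`, `lookup` and `depth` are hypothetical. Each node stores a `prefix`, a flat run of keys that would otherwise be a chain of degree-1 nodes, and a prefix is split only when a newly inserted key sequence diverges inside it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Trie node whose `prefix` replaces a chain of single-child nodes."""
    prefix: list = field(default_factory=list)
    values: list = field(default_factory=list)
    children: dict = field(default_factory=dict)


def common_prefix_len(a, b):
    """Length of the longest common prefix of two sequences."""
    n = 0
    while n < len(a) and n < len(b) and a[n] == b[n]:
        n += 1
    return n


def insert(node, keys, value):
    """Insert `value` under `keys`, splitting a compressed prefix if needed."""
    n = common_prefix_len(node.prefix, keys)
    if n < len(node.prefix):
        # The keys diverge (or end) inside the prefix: split this node at n.
        tail = Node(node.prefix[n + 1:], node.values, node.children)
        branch_key = node.prefix[n]
        node.prefix = node.prefix[:n]
        node.values = []
        node.children = {branch_key: tail}
    rest = keys[n:]
    if not rest:
        node.values.append(value)
        return
    child = node.children.get(rest[0])
    if child is None:
        # A new branch stores its whole remaining path in one flat prefix.
        node.children[rest[0]] = Node(prefix=list(rest[1:]), values=[value])
    else:
        insert(child, rest[1:], value)


def lookup(node, keys):
    """Return the values stored exactly at `keys` (empty list if none)."""
    n = len(node.prefix)
    if list(keys[:n]) != node.prefix:
        return []
    rest = keys[n:]
    if not rest:
        return node.values
    child = node.children.get(rest[0])
    return lookup(child, rest[1:]) if child is not None else []


def depth(node):
    """Number of nodes on the longest root-to-leaf path."""
    return 1 + max((depth(c) for c in node.children.values()), default=0)
```

With this layout, a single key sequence of length 1000 produces a tree of depth 2 (the root plus one child holding a 999-element prefix), whereas an uncompressed trie would nest one node per key.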

ghost · 2023-09-24T21:01:28Z

💥 Mathlib branch lean-pr-testing-2577 build failed against this PR. (2023-09-24 21:01:27) View Log
💥 Mathlib branch lean-pr-testing-2577 build failed against this PR. (2023-09-24 21:20:48) View Log
💥 Mathlib branch lean-pr-testing-2577 build failed against this PR. (2023-09-25 07:23:33) View Log
💥 Mathlib branch lean-pr-testing-2577 build failed against this PR. (2023-09-25 08:51:18) View Log
💥 Mathlib branch lean-pr-testing-2577 build failed against this PR. (2023-09-25 08:59:29) View Log
🟡 Mathlib branch lean-pr-testing-2577 build against this PR was cancelled. (2023-09-25 09:37:59) View Log
💥 Mathlib branch lean-pr-testing-2577 build failed against this PR. (2023-09-25 09:40:14) View Log
✅ Mathlib branch lean-pr-testing-2577 has successfully built against this PR. (2023-09-25 11:13:24) View Log
🟡 Mathlib branch lean-pr-testing-2577 build against this PR didn't complete normally. (2023-10-14 14:25:15) View Log
🟡 Mathlib branch lean-pr-testing-2577 build against this PR didn't complete normally. (2023-10-16 21:26:44) View Log
💥 Mathlib branch lean-pr-testing-2577 build failed against this PR. (2023-10-16 22:14:22) View Log
💥 Mathlib branch lean-pr-testing-2577 build failed against this PR. (2023-10-17 21:15:05) View Log
🟡 Mathlib branch lean-pr-testing-2577 build against this PR didn't complete normally. (2023-10-18 01:41:24) View Log
💥 Mathlib branch lean-pr-testing-2577 build failed against this PR. (2023-10-18 01:47:35) View Log
✅ Mathlib branch lean-pr-testing-2577 has successfully built against this PR. (2023-10-18 18:53:27) View Log

nomeata · 2023-09-25T07:10:58Z

Could someone !bench this?

Kha · 2023-09-25T07:43:44Z

!bench

leanprover-bot · 2023-09-25T08:00:43Z

Here are the benchmark results for commit 83e2599.
There were no significant changes against commit 2ac782c.

nomeata · 2023-09-25T08:38:19Z

> There were no significant changes against commit 2ac782c.

That’s a bit disappointing. I’ll make mathlib compile, maybe there is something there.

nomeata · 2023-09-25T10:55:36Z

From 237MB to:

$ ls -sh ./build/lib/MathlibExtras/LibrarySearch.extra
58M ./build/lib/MathlibExtras/LibrarySearch.extra

That's at least pretty nice.

nomeata · 2023-09-25T17:44:13Z

Results are in: http://speed.lean-fro.org/mathlib4/compare/e7b27246-a3e6-496a-b552-ff4b45c7236e/to/ee9e4caf-7a54-4dc1-a6d9-19101bc7d2d9?hash1=a477df946a0648231b66e5a7f0f04839546ebf25

Mild improvements across the board, it seems?

nomeata · 2023-09-25T21:15:15Z

awaiting-review

Guess we have the data to weigh this idea. The code certainly gets more complex, but the size gains, at least for library_search (I didn't check the size of the other DiscrTree instances in use), are significant, and a 3.5% mathlib compile time improvement is at least nice.

Jovan Gerbscheid pointed out that due to the well-nested structure of key sequences, values are only ever found at the leaves. One could therefore remove the .empty constructor and the recursive argument to .values, and make insertion panic when the key sequence is invalid, for a bit of code reduction. OTOH, the current code is a bit more robust, for example when people mess with the key sequences as in leanprover-community/mathlib4#7345, so maybe it is better to keep it like this for now.
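
The "values only at the leaves" observation can be illustrated with a small sketch. It is in Python and uses a simplified, hypothetical key format, `(symbol, arity)` pairs produced by a preorder traversal, rather than the actual DiscrTree key type. Under that assumption a key sequence is self-delimiting: no proper prefix of a complete sequence is itself complete, so a complete sequence can never end at an inner node of the trie.

```python
def flatten(term):
    """Preorder key sequence of a term given as (symbol, [subterms...])."""
    symbol, args = term
    keys = [(symbol, len(args))]
    for arg in args:
        keys.extend(flatten(arg))
    return keys


def is_complete(keys):
    """True if `keys` encodes exactly one whole term (hypothetical check)."""
    pending = 1  # number of subterms still expected
    for _symbol, arity in keys:
        if pending == 0:
            return False  # extra keys after the term was already finished
        pending += arity - 1
    return pending == 0


# add(x, mul(y, z)) as a nested (symbol, args) tuple
term = ("add", [("x", []), ("mul", [("y", []), ("z", [])])])
keys = flatten(term)
# The full sequence is complete, and none of its proper prefixes is.
proper_prefixes_complete = [is_complete(keys[:i]) for i in range(len(keys))]
```

Hand-built sequences that break this invariant are exactly the case where a leaf-only representation would have to panic on insertion, while the more permissive representation keeps working.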

nomeata · 2023-10-18T20:45:02Z

I updated the branches, and created a new comparison report:
http://speed.lean-fro.org/mathlib4/compare/63a60bc6-4997-4dbe-a363-9ceff22def1c/to/1c8d5e96-3fad-4da2-b00c-4b1a63cdc75b

I’m not sure how to best get a good signal from the report.
This time build wall-clock time has improved by 0.9% (probably noise). Instruction counts go down across the board, but by less than 1%. None of this is very exciting, and I'm not sure if this is worth pursuing; it seems only the size improvements to the library_search and rw? caches still look good.

kim-em · 2023-10-19T01:22:31Z

Oh, I think you're underselling. A 1% improvement on Mathlib is still great news, and the size improvements for the caches are a significant QOL win, because they are usually the last to finish when running lake exe cache get.

kim-em · 2023-11-02T01:54:23Z

@nomeata, I'm marking this back to awaiting-author now that there are conflicts.

nomeata · 2023-11-02T14:34:30Z

Absolutely! Although I think I’ll see leanprover-community/batteries#285 (which I also have to update now) through first, so that refactorings like the present one only affect one downstream repo, and not three.

nomeata · 2023-12-02T11:28:16Z

I’ll close this. Others are working on DiscrTree more actively these days and have a better sense of whether this is useful; if it is, they can probably cherry-pick the ideas easily.

JovanGerb · 2026-03-07T12:39:33Z

@nomeata, do you have interest in reviving this PR in some form? As mathlib and Lean continue to grow, the number of simp lemmas keeps growing, and building the simp discrimination tree (at import time) is taking a significant amount of time when importing all of mathlib. The refactor in this PR would be one way to mitigate this inefficiency.

nomeata · 2026-03-07T12:44:35Z

Do we have evidence that this PR is helping with import times?

JovanGerb · 2026-03-07T12:59:30Z

I measured that for import Mathlib, the simp discrimination tree takes about 0.5s and the instances discrimination tree takes about 0.1s. When building mathlib, the imports are typically smaller, so it will take less time per file, but with over 8000 files this will add up. (see #mathlib4 > Performance cost of environment extensions @ 💬)

And I recall that when this PR was benchmarked against mathlib there was a significant speedup. So I figured that this speedup is from the reduced import times, and that the effect would be even more significant now that mathlib has grown more.

nomeata · 2026-03-07T13:32:04Z

Improving startup time for import Mathlib would be nice. I’ll ask Claude to update this PR and we can get some measurements :-)

JovanGerb · 2026-03-07T17:48:28Z

An alternative approach would be to make use of a Thunk data structure so as to never compute the entire discrimination tree. However, when I tried this, I realized that the implementation would require some more flexibility from the environment extension framework: at import time we would have to first construct the root node of the discrimination tree, and after that is created, map over it with some Thunk.mk. Otherwise we would need to use a Thunk.map for each imported simp lemma, which I somehow imagine to not be efficient.
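
To make the "construct the root first, then wrap the children in thunks" idea concrete, here is a rough Python sketch. It is not Lean code and does not use the environment extension framework; the `Thunk` class and the helpers `build`, `build_lazy_root` and `lookup` are hypothetical, and key sequences are assumed to be non-empty. Entries are bucketed by their first key eagerly, and each bucket's subtree is built only when a lookup first touches it.

```python
class Thunk:
    """Delays a computation until `get` is first called, then caches it."""

    def __init__(self, compute):
        self._compute = compute
        self._value = None
        self._forced = False

    def get(self):
        if not self._forced:
            self._value = self._compute()
            self._forced = True
            self._compute = None  # drop the closure once it has run
        return self._value


def build(entries):
    """Eagerly build a plain nested-dict trie from (keys, value) pairs."""
    node = {"values": [], "children": {}}
    for keys, value in entries:
        cur = node
        for k in keys:
            cur = cur["children"].setdefault(k, {"values": [], "children": {}})
        cur["values"].append(value)
    return node


def build_lazy_root(entries):
    """Bucket entries by first key; one thunk per bucket builds its subtree."""
    buckets = {}
    for keys, value in entries:
        buckets.setdefault(keys[0], []).append((keys[1:], value))
    return {k: Thunk(lambda group=group: build(group))
            for k, group in buckets.items()}


def lookup(root, keys):
    """Force only the subtree under the first key, then walk it."""
    thunk = root.get(keys[0])
    if thunk is None:
        return []
    node = thunk.get()
    for k in keys[1:]:
        node = node["children"].get(k)
        if node is None:
            return []
    return node["values"]
```

In this sketch there is one thunk per root child rather than one per inserted entry, which mirrors the distinction drawn above between a single Thunk.mk pass over the root and a Thunk.map per imported lemma.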

nomeata · 2026-03-07T18:16:56Z

Closing this PR in favor of #12838

Some form of lazy loading may possibly also be interesting, of course.

nomeata added 8 commits September 24, 2023 20:07

refactor: Path compression for DiscrTree

0813dcc

Set prefer_native := true

33d972c

Typo in comment

8db6890

More nits

b726f84

Possible fix

6cd6142

Better luck now

2824fb5

More possible fixes

9b4bcb5

First check if we are at the end of the path

83e2599

github-actions bot added the toolchain-available label (A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN) Sep 24, 2023

ghost pushed a commit to leanprover-community/mathlib4 that referenced this pull request Sep 24, 2023

Update lean-toolchain for testing leanprover/lean4#2577

a919caa

ghost added the breaks-mathlib label (This is not necessarily a blocker for merging: but there needs to be a plan) Sep 24, 2023

nomeata added a commit to leanprover-community/mathlib4 that referenced this pull request Sep 24, 2023

Adjust Trie.mapArraysM to leanprover/lean4#2577

cec40ae

Make commonPrefix public

3a9a19f

nomeata mentioned this pull request Sep 25, 2023

refactor: Adjust to new DiscrTree implementation leanprover-community/batteries#273

Closed

nomeata added a commit to nomeata/batteries that referenced this pull request Sep 25, 2023

refactor: Adjust to new DiscrTree implementation

c1c51f2

this goes along leanprover/lean4#2577

nomeata mentioned this pull request Sep 25, 2023

refactor: Adjust to new DiscrTree implementation leanprover-community/aesop#68

Closed

nomeata added a commit to nomeata/aesop that referenced this pull request Sep 25, 2023

refactor: Adjust to new DiscrTree implementation

ec2ae45

this goes along leanprover/lean4#2577

ghost pushed a commit to leanprover-community/mathlib4 that referenced this pull request Sep 25, 2023

Trigger CI for leanprover/lean4#2577

50beb26

ghost added the builds-mathlib label (CI has verified that Mathlib builds against this PR) and removed the breaks-mathlib label (This is not necessarily a blocker for merging: but there needs to be a plan) Sep 25, 2023

nomeata mentioned this pull request Sep 25, 2023

refactor: Adjust to new DiscrTree implementation leanprover-community/mathlib4#7363

Closed

nomeata marked this pull request as ready for review September 25, 2023 21:09

nomeata mentioned this pull request Oct 13, 2023

feat: Upstream more DiscrTree APIs leanprover-community/batteries#285

Closed

nomeata force-pushed the joachim/compress-discrtree branch from e1f47a4 to 3a9a19f Compare October 14, 2023 13:03

ghost pushed a commit to leanprover-community/mathlib4 that referenced this pull request Oct 14, 2023

Trigger CI for leanprover/lean4#2577

6fa7da6

Merge commit '3e79ddda27c299a0e66fc996d52fa15fcf4421d8' (effectively …

cf120ac

…v4.2.0-rc2)

ghost pushed a commit to leanprover-community/mathlib4 that referenced this pull request Oct 16, 2023

Trigger CI for leanprover/lean4#2577

ecfd133

ghost added the breaks-mathlib label (This is not necessarily a blocker for merging: but there needs to be a plan) and removed the builds-mathlib label (CI has verified that Mathlib builds against this PR) Oct 16, 2023

nomeata mentioned this pull request Oct 16, 2023

PR releases release (unwanted) merge state #2701

Closed

Merge branch 'master' of github.com:leanprover/lean4 into joachim/com…

50f13c7

…press-discrtree

ghost pushed a commit to leanprover-community/mathlib4 that referenced this pull request Oct 17, 2023

Trigger CI for leanprover/lean4#2577

d861bc7

ghost added the builds-mathlib label (CI has verified that Mathlib builds against this PR) and removed the breaks-mathlib label (This is not necessarily a blocker for merging: but there needs to be a plan) Oct 18, 2023

kim-em added the awaiting-author label (Waiting for PR author to address issues) and removed the awaiting-review label (Waiting for someone to review the PR) Nov 2, 2023

Kha requested review from Kha and leodemoura as code owners November 20, 2023 08:15

nomeata closed this Dec 2, 2023

nomeata reopened this Mar 7, 2026

nomeata closed this Mar 7, 2026

6 participants
