Skip to content

perf: path compression for DiscrTree#12838

Draft
nomeata wants to merge 1 commit intomasterfrom
joachim/compress-discrtree
Draft

perf: path compression for DiscrTree#12838
nomeata wants to merge 1 commit intomasterfrom
joachim/compress-discrtree

Conversation

@nomeata
Copy link
Copy Markdown
Collaborator

@nomeata nomeata commented Mar 7, 2026

This PR adds path compression to the DiscrTree trie data structure.
Instead of a single .node constructor, the Trie type now has four
constructors: .empty, .values, .path (for compressed sequences of
keys with no branching), and .branch. This significantly reduces the
size and maximum depth of large discrimination trees.

@nomeata nomeata added the changelog-language Language features and metaprograms label Mar 7, 2026
@nomeata nomeata changed the title refactor: path compression for DiscrTree perf: path compression for DiscrTree Mar 7, 2026
This PR adds path compression to the DiscrTree trie data structure.
Instead of a single `.node` constructor, the `Trie` type now has four
constructors: `.empty`, `.values`, `.path` (for compressed sequences of
keys with no branching), and `.branch`. This significantly reduces the
size and maximum depth of large discrimination trees.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@nomeata nomeata force-pushed the joachim/compress-discrtree branch from 2e0c3de to f82aa9a Compare March 7, 2026 16:12
@nomeata
Copy link
Copy Markdown
Collaborator Author

nomeata commented Mar 7, 2026

!radar

@leanprover-radar
Copy link
Copy Markdown

leanprover-radar commented Mar 7, 2026

Benchmark results for f82aa9a against 333ab1c are in! @nomeata

  • build//instructions: -30.5G (-0.24%)

Large changes (1🟥)

  • 🟥 compiled/liasolver//instructions: +30.2M (+0.73%)

Medium changes (9✅)

  • build/module/Std.Tactic.BVDecide.Bitblast.BVExpr.Circuit.Lemmas.Expr//instructions: -2.0G (-1.73%) (reduced significance based on absolute threshold)
  • elab/big_deceq_rec//instructions: -57.3M (-0.89%)
  • elab/big_struct//instructions: -26.9M (-0.92%)
  • elab/iterators//instructions: -38.4M (-1.31%)
  • elab/simp_arith1//instructions: -30.5M (-1.29%)
  • interpreted/iterators//instructions: -40.4M (-0.11%)
  • lake/inundation/build clean//instructions: -44.7G (-2.85%)
  • lake/inundation/config elab//instructions: -34.1M (-1.32%)
  • misc/import Lean//instructions: -40.3M (-2.71%)

Small changes (407✅, 7🟥)

Too many entries to display here. View the full report on radar instead.

@nomeata
Copy link
Copy Markdown
Collaborator Author

nomeata commented Mar 7, 2026

!bench mathlib

@leanprover-radar
Copy link
Copy Markdown

leanprover-radar commented Mar 7, 2026

Benchmark results for leanprover-community/mathlib4-nightly-testing@a873fb4 against leanprover-community/mathlib4-nightly-testing@ec89131 are in! @nomeata

  • 🟥 main exited with code 1

No significant changes detected.

@github-actions github-actions bot added the toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN label Mar 7, 2026
@leanprover-bot
Copy link
Copy Markdown
Collaborator

Reference manual CI status:

  • ❗ Reference manual CI can not be attempted yet, as the nightly-testing-2026-03-05 tag does not exist there yet. We will retry when you push more commits. If you rebase your branch onto nightly-with-manual, reference manual CI should run now. You can force reference manual CI using the force-manual-ci label. (2026-03-07 17:44:00)

mathlib-nightly-testing bot pushed a commit to leanprover-community/batteries that referenced this pull request Mar 7, 2026
@github-actions github-actions bot added the mathlib4-nightly-available A branch for this PR exists at leanprover-community/mathlib4-nightly-testing:lean-pr-testing-NNNN label Mar 7, 2026
mathlib-nightly-testing bot pushed a commit to leanprover-community/mathlib4-nightly-testing that referenced this pull request Mar 7, 2026
@mathlib-lean-pr-testing mathlib-lean-pr-testing bot added the breaks-mathlib This is not necessarily a blocker for merging: but there needs to be a plan label Mar 7, 2026
@mathlib-lean-pr-testing
Copy link
Copy Markdown

Mathlib CI status (docs):

@nomeata
Copy link
Copy Markdown
Collaborator Author

nomeata commented Mar 7, 2026

This analysis was generated by Claude.

DiscrTree value array length analysis (Mathlib, all eagerly-loaded extensions)

Ran against Mathlib with leanprover/lean4:v4.29.0-rc4. Collected histogram of value array lengths across all DiscrTree nodes that store values.

Per-extension breakdown

Extension Value nodes Total values Max depth
simp post 88,948 95,230 2,928
instances 37,604 37,824 1,092
ext 1,264 1,270 7
seval post 302 306 47
simproc post 298 370 40
symm 163 163 7
refl 148 148 7
unification hints 18 43 10
simp pre 0 0 4
simproc pre 0 0 0
seval pre 0 0 0
Combined 128,745 135,354 2,928

Note: Grind does not use DiscrTree.

Histogram of value array lengths (combined)

Length Count %
1 124,467 96.7%
2 3,370 2.6%
3 543 0.4%
4 155 0.1%
5 68
6 34
7 24
8 16
9 10
10 9
11–20 30
21–50 13
51–100 4
101+ 2 (max: 191)

Singleton optimization potential

96.7% of value nodes contain exactly one value. A dedicated singleton constructor (storing a single value instead of an Array) would save ~32 bytes per node (8 header + 8 size + 8 capacity + 8 element for the array object), totaling approximately 3.8 MiB across all Mathlib DiscrTrees.

@nomeata
Copy link
Copy Markdown
Collaborator Author

nomeata commented Mar 7, 2026

!bench mathlib

@leanprover-radar
Copy link
Copy Markdown

leanprover-radar commented Mar 7, 2026

Benchmark results for leanprover-community/mathlib4-nightly-testing@d3612e5 against leanprover-community/mathlib4-nightly-testing@ec89131 are in! @nomeata

  • build//instructions: -1.1T (-0.64%)

No significant changes detected.

@mathlib-lean-pr-testing
Copy link
Copy Markdown

Mathlib CI status (docs):

@nomeata
Copy link
Copy Markdown
Collaborator Author

nomeata commented Mar 7, 2026

Screenshot_20260307-213220
Mathlib start up times are not improving significantly to justify this extra complexity id say.

@Garmelon
Copy link
Copy Markdown
Contributor

Garmelon commented Mar 9, 2026

Sorry for temporarily hijacking this PR for a test, but radar should now use mathlib significances for nightly-testing comparisons:
!bench mathlib

@leanprover-radar
Copy link
Copy Markdown

Benchmark results for leanprover-community/mathlib4-nightly-testing@d3612e5 against leanprover-community/mathlib4-nightly-testing@ec89131 are in! @Garmelon

  • build//instructions: -1.1T (-0.64%)

Medium changes (1✅)

1 hidden

Small changes (244✅, 2🟥)

Too many entries to display here. View the full report on radar instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

breaks-mathlib This is not necessarily a blocker for merging: but there needs to be a plan changelog-language Language features and metaprograms mathlib4-nightly-available A branch for this PR exists at leanprover-community/mathlib4-nightly-testing:lean-pr-testing-NNNN toolchain-available A toolchain is available for this PR, at leanprover/lean4-pr-releases:pr-release-NNNN

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants