Skip to content

feat(Computability): add Kleene's algorithm to prove regular languages have regex matching #35600

Open
yisiox wants to merge 9 commits intoleanprover-community:masterfrom
focs-lab:kleenes-algorithm
Open

feat(Computability): add Kleene's algorithm to prove regular languages have regex matching #35600
yisiox wants to merge 9 commits intoleanprover-community:masterfrom
focs-lab:kleenes-algorithm

Conversation

@yisiox
Copy link
Copy Markdown

@yisiox yisiox commented Feb 21, 2026

This PR adds a proof that every regular language has some regular expression matching it.

This was achieved by the following constructions and proofs of their correctness

  • Define toSingleεNFA to transform any epsilon-NFA to an equivalent one with only a single start and accept state (and the type is ExtendedState)
  • Define a bijection between ExtendedState and Fin (FinEnum.card (ExtendedState σ)) to index the states
  • Define pathRegex which implements Kleene's algorithm
  • Define toRegex which takes an epsilon-NFA, applies toSingleεNFA on it, and computes the corresponding regex using pathRegex

The proofs of correctness chain together to yield (toRegex M).matches' = M.accepts which asserts the direction of Kleene's theorem required.


As in #mathlib4 > Regular languages: the review queue @ 💬, this overlaps with #15654, which uses GNFA instead but has been inactive.

Open in Gitpod

@github-actions github-actions bot added the new-contributor This PR was made by a contributor with at most 5 merged PRs. Welcome to the community! label Feb 21, 2026
@github-actions
Copy link
Copy Markdown

Welcome new contributor!

Thank you for contributing to Mathlib! If you haven't done so already, please review our contribution guidelines, as well as the style guide and naming conventions.

We use a review queue to manage reviews. If your PR does not appear there, it is probably because it is not successfully building (i.e., it doesn't have a green checkmark), has the awaiting-author tag, or another reason described in the Lifecycle of a PR.

If you haven't already done so, please come to https://leanprover.zulipchat.com/, introduce yourself, and mention your new PR.

Thank you again for joining our community.

@github-actions github-actions bot added the t-computability Computability theory (TMs, DFAs, languages, grammars, etc) label Feb 21, 2026
@github-actions
Copy link
Copy Markdown

github-actions bot commented Feb 21, 2026

PR summary 592989bdba

Import changes for modified files

Dependency changes

File Base Count Head Count Change
Mathlib.Computability.EpsilonNFA 719 729 +10 (+1.39%)
Import changes for all files
Files Import difference
Mathlib.Computability.EpsilonNFA 10

Declarations diff

+ ExtendedState
+ ExtendedState.equivSum
+ IsPath.from_accept
+ IsPath.state_accept
+ IsPath.toSingleεNFA_lift_extendedState
+ IsRestrictedMatch
+ IsRestrictedMatch.mono
+ IsRestrictedPath
+ accepts_toRegex
+ accepts_toSingleεNFA
+ directRegex
+ e
+ instance [FinEnum σ] : FinEnum (ExtendedState σ)
+ instance {M : εNFA α σ}
+ isRestrictedMatch_iff_exists_isPath
+ isRestrictedMatch_iff_exists_isRestrictedPath
+ isRestrictedMatch_nil
+ isRestrictedMatch_of_mem_pathRegex
+ isRestrictedMatch_star
+ isRestrictedPath_iff_isPath
+ matches'_sum_map
+ mem_matches'_directRegex
+ mem_matches'_mul_star_mul
+ mem_matches'_star
+ mem_matches'_star_concat
+ mem_pathRegex_iff_isRestrictedMatch
+ mem_pathRegex_of_isRestrictedMatch
+ pathRegex
+ pathRegex_mono
+ pathRegex_trans
+ toRegex
+ toSingleεNFA

You can run this locally as follows
## summary with just the declaration names:
./scripts/pr_summary/declarations_diff.sh <optional_commit>

## more verbose report:
./scripts/pr_summary/declarations_diff.sh long <optional_commit>

The doc-module for scripts/pr_summary/declarations_diff.sh contains some details about this script.


No changes to technical debt.

You can run this locally as

./scripts/reporting/technical-debt-metrics.sh pr_summary
  • The relative value is the weighted sum of the differences with weight given by the inverse of the current value of the statistic.
  • The absolute value is the relative value divided by the total sum of the inverses of the current values (i.e. the weighted average of the differences).

@yisiox yisiox marked this pull request as ready for review March 5, 2026 06:24
@yisiox yisiox changed the title Add implementation of Kleene's algorithm to show each regular language has a regex matching it feat(Computability): add Kleene's algorithm to prove regular languages have regex matching Mar 5, 2026
Copy link
Copy Markdown
Contributor

@YaelDillies YaelDillies left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure you don't want to contribute this to cslib instead? I am by no means a computability expert, and you will probably experience much faster review there than here.


theorem matches'_foldl_acc {α : Type*}
(L : List α) (f : α → RegularExpression α) (acc : RegularExpression α) :
(L.foldl (fun acc a => acc + f a) acc).matches' =
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this not the same as

Suggested change
(L.foldl (fun acc a => acc + f a) acc).matches' =
(acc + (L.map f).sum).matches' =

? Let's use existing primitives where possible. Eg this shows that probably we want lemmas about (a + b).matches' and (L.map f).sum.matches'

Copy link
Copy Markdown
Author

@yisiox yisiox Mar 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was struggling with representing the base case of Kleene's algorithm $R^0_{ij}=\bigcup_{a:q_j\in \delta(q_i,a)}a$ and had the awkward foldl in directRegex, hence this lemma about foldl and matches'. I changed it now to sum and map as suggested and only a lemma about (L.map f).sum.matches' is needed since there is no accumulator anymore. Hopefully it's less awkward.

@YaelDillies YaelDillies added the awaiting-author A reviewer has asked the author a question or requested changes. label Mar 24, 2026
@yisiox
Copy link
Copy Markdown
Author

yisiox commented Mar 24, 2026

Are you sure you don't want to contribute this to cslib instead? I am by no means a computability expert, and you will probably experience much faster review there than here.

This came about because I needed algebraic stuff for Mazurkiewicz trace theory and thus relied on mathlib. I thought it would be appropriate to add this to mathlib as at some point I realised I needed Kleene's theorem.

@yisiox
Copy link
Copy Markdown
Author

yisiox commented Mar 24, 2026

-awaiting-author

@github-actions github-actions bot removed the awaiting-author A reviewer has asked the author a question or requested changes. label Mar 24, 2026
@YaelDillies
Copy link
Copy Markdown
Contributor

YaelDillies commented Mar 24, 2026

Interesting, thank you for letting me know. However, are you aware that cslib depends on mathlib? It therefore looks to me like cslib is a perfect fit for your contribution

@yisiox
Copy link
Copy Markdown
Author

yisiox commented Mar 26, 2026

I see, thanks for the suggestion! I looked at cslib and noticed that cslib has their own definition of automata, only imports mathlib computability to define IsRegular, and has no definition for regular expressions. Perhaps some translations are needed to get this suitable for cslib, and will require importing regular expressions from mathlib too. I definitely want to support the cslib ecosystem though and will see if I can reach out to someone from the cslib community for their thoughts on the matter.

But still mathlib computability has definitions of automata, regular languages, and regular expressions. Kleene's theorem seems like a natural completion of the API.

@YaelDillies
Copy link
Copy Markdown
Contributor

still mathlib computability has definitions of automata, regular languages, and regular expressions. Kleene's theorem seems like a natural completion of the API.

These were added to mathlib before cslib existed. It is IMO reasonable to move these to cslib now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

new-contributor This PR was made by a contributor with at most 5 merged PRs. Welcome to the community! t-computability Computability theory (TMs, DFAs, languages, grammars, etc)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants