feat(Computability): add Kleene's algorithm to prove regular languages have regex matching #35600
feat(Computability): add Kleene's algorithm to prove regular languages have regex matching #35600yisiox wants to merge 9 commits intoleanprover-community:masterfrom
Conversation
Welcome new contributor!Thank you for contributing to Mathlib! If you haven't done so already, please review our contribution guidelines, as well as the style guide and naming conventions. We use a review queue to manage reviews. If your PR does not appear there, it is probably because it is not successfully building (i.e., it doesn't have a green checkmark), has the If you haven't already done so, please come to https://leanprover.zulipchat.com/, introduce yourself, and mention your new PR. Thank you again for joining our community. |
PR summary 592989bdba
|
| File | Base Count | Head Count | Change |
|---|---|---|---|
| Mathlib.Computability.EpsilonNFA | 719 | 729 | +10 (+1.39%) |
Import changes for all files
| Files | Import difference |
|---|---|
Mathlib.Computability.EpsilonNFA |
10 |
Declarations diff
+ ExtendedState
+ ExtendedState.equivSum
+ IsPath.from_accept
+ IsPath.state_accept
+ IsPath.toSingleεNFA_lift_extendedState
+ IsRestrictedMatch
+ IsRestrictedMatch.mono
+ IsRestrictedPath
+ accepts_toRegex
+ accepts_toSingleεNFA
+ directRegex
+ e
+ instance [FinEnum σ] : FinEnum (ExtendedState σ)
+ instance {M : εNFA α σ}
+ isRestrictedMatch_iff_exists_isPath
+ isRestrictedMatch_iff_exists_isRestrictedPath
+ isRestrictedMatch_nil
+ isRestrictedMatch_of_mem_pathRegex
+ isRestrictedMatch_star
+ isRestrictedPath_iff_isPath
+ matches'_sum_map
+ mem_matches'_directRegex
+ mem_matches'_mul_star_mul
+ mem_matches'_star
+ mem_matches'_star_concat
+ mem_pathRegex_iff_isRestrictedMatch
+ mem_pathRegex_of_isRestrictedMatch
+ pathRegex
+ pathRegex_mono
+ pathRegex_trans
+ toRegex
+ toSingleεNFA
You can run this locally as follows
## summary with just the declaration names:
./scripts/pr_summary/declarations_diff.sh <optional_commit>
## more verbose report:
./scripts/pr_summary/declarations_diff.sh long <optional_commit>The doc-module for scripts/pr_summary/declarations_diff.sh contains some details about this script.
No changes to technical debt.
You can run this locally as
./scripts/reporting/technical-debt-metrics.sh pr_summary
- The
relativevalue is the weighted sum of the differences with weight given by the inverse of the current value of the statistic. - The
absolutevalue is therelativevalue divided by the total sum of the inverses of the current values (i.e. the weighted average of the differences).
YaelDillies
left a comment
There was a problem hiding this comment.
Are you sure you don't want to contribute this to cslib instead? I am by no means a computability expert, and you will probably experience much faster review there than here.
|
|
||
| theorem matches'_foldl_acc {α : Type*} | ||
| (L : List α) (f : α → RegularExpression α) (acc : RegularExpression α) : | ||
| (L.foldl (fun acc a => acc + f a) acc).matches' = |
There was a problem hiding this comment.
Is this not the same as
| (L.foldl (fun acc a => acc + f a) acc).matches' = | |
| (acc + (L.map f).sum).matches' = |
? Let's use existing primitives where possible. Eg this shows that probably we want lemmas about (a + b).matches' and (L.map f).sum.matches'
There was a problem hiding this comment.
I was struggling with representing the base case of Kleene's algorithm foldl in directRegex, hence this lemma about foldl and matches'. I changed it now to sum and map as suggested and only a lemma about (L.map f).sum.matches' is needed since there is no accumulator anymore. Hopefully it's less awkward.
This came about because I needed algebraic stuff for Mazurkiewicz trace theory and thus relied on mathlib. I thought it would be appropriate to add this to mathlib as at some point I realised I needed Kleene's theorem. |
|
-awaiting-author |
|
Interesting, thank you for letting me know. However, are you aware that cslib depends on mathlib? It therefore looks to me like cslib is a perfect fit for your contribution |
|
I see, thanks for the suggestion! I looked at cslib and noticed that cslib has their own definition of automata, only imports mathlib computability to define IsRegular, and has no definition for regular expressions. Perhaps some translations are needed to get this suitable for cslib, and will require importing regular expressions from mathlib too. I definitely want to support the cslib ecosystem though and will see if I can reach out to someone from the cslib community for their thoughts on the matter. But still mathlib computability has definitions of automata, regular languages, and regular expressions. Kleene's theorem seems like a natural completion of the API. |
These were added to mathlib before cslib existed. It is IMO reasonable to move these to cslib now |
This PR adds a proof that every regular language has some regular expression matching it.
This was achieved by the following constructions and proofs of their correctness
toSingleεNFAto transform any epsilon-NFA to an equivalent one with only a single start and accept state (and the type isExtendedState)ExtendedStateandFin (FinEnum.card (ExtendedState σ))to index the statespathRegexwhich implements Kleene's algorithmtoRegexwhich takes an epsilon-NFA, appliestoSingleεNFAon it, and computes the corresponding regex usingpathRegexThe proofs of correctness chain together to yield
(toRegex M).matches' = M.acceptswhich asserts the direction of Kleene's theorem required.As in #mathlib4 > Regular languages: the review queue @ 💬, this overlaps with #15654, which uses GNFA instead but has been inactive.