[Repo Assist] Precompile markdown parser regexes to avoid per-call construction #1070
Draft
github-actions[bot] wants to merge 2 commits into main from
The Punctuation and HtmlEntity active patterns in MarkdownInlineParser.fs were constructing regex pattern strings and calling Regex.Match on each invocation. The Punctuation pattern is called for every character during inline parsing, making the repeated regex construction a hot path.

Changes:
- Extract punctuationRegex and htmlEntityRegex as module-level compiled Regex values (RegexOptions.Compiled) in MarkdownInlineParser.fs
- Limit the input string passed to each regex to its theoretical maximum match length (2 chars for punctuation, 34 chars for entities), avoiding conversion of the entire remaining input list on every call
- Extract blockquoteRegex as a module-level compiled Regex in MarkdownBlockParser.fs, removing the per-call string concatenation

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
7 tasks
🤖 This PR was created by Repo Assist, an automated AI assistant.
Summary
Three regex patterns in the Markdown parser were constructed or compiled on every call to their containing active patterns, making them unnecessarily expensive hot paths.
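As a rough illustration of the difference, the sketch below contrasts a per-call pattern build with a module-level precompiled regex. The module name, function names, and the simplified `\p{P}` pattern are assumptions made for this example; they are not taken from MarkdownInlineParser.fs.

```fsharp
open System.Text.RegularExpressions

module RegexSketch =
    // Hypothetical, simplified pattern; the parser's real pattern is more involved.
    let private punctuationClass = @"\p{P}"

    // Per-call style: the pattern string is concatenated and handed to the
    // static Regex.Match on every invocation.
    let isPunctuationPerCall (s: string) =
        let pattern = "^" + punctuationClass
        Regex.Match(s, pattern).Success

    // Precompiled style: the Regex object is created once, when the module
    // is initialised, and reused by every call.
    let private punctuationRegex =
        Regex("^" + punctuationClass, RegexOptions.Compiled)

    let isPunctuationPrecompiled (s: string) =
        punctuationRegex.Match(s).Success
```

Both functions return the same result for the same input; only where the pattern is built differs.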
Before
- MarkdownInlineParser.fs, Punctuation pattern (called for every character during inline parsing): the regex pattern string was constructed and Regex.Match called on each invocation.
- MarkdownInlineParser.fs, HtmlEntity pattern and MarkdownBlockParser.fs, BlockquoteStart: similar per-call string construction and Regex.Match calls.

After
- All three regexes are now private module-level let bindings compiled with RegexOptions.Compiled, so each regex is built once rather than on every call.
- Punctuation: passes only the first 2 chars of the remaining input (sufficient for all cases: 1 char for BMP punctuation, 2 chars for surrogate-pair cases). This avoids converting potentially thousands of characters on every call.
- HtmlEntity: passes only the first 34 chars (the maximum possible entity length: & + 32 name/digit chars + ;).
- BlockquoteStart: removes the per-call string concatenation used to build the pattern.

Impact
The Punctuation active pattern is in the inner loop of parseChars, which processes every character in the inline content. For documents with long prose sections this is called thousands of times per parse. Precompiling the regex and shortening the input slice should meaningfully reduce allocation and CPU time in the markdown parser hot path.

Test Status
✅ Build: dotnet build FSharp.Formatting.sln --configuration Release (succeeded, 0 errors)
✅ Tests: dotnet test tests/FSharp.Markdown.Tests (257/257 passed)
✅ Formatting: dotnet fantomas (no changes needed after manual format)
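
The input-limiting idea described under After can be sketched as follows, assuming the parser works on a char list as the commit message suggests. The module name, the helper prefixString, and the simplified entity pattern are hypothetical and chosen only for this example; the 34-char bound is the one stated in the PR description.

```fsharp
open System
open System.Text.RegularExpressions

module PrefixSketch =
    // Hypothetical, simplified entity pattern; not the parser's actual pattern.
    let private htmlEntityRegex =
        Regex(@"^&(?:[a-zA-Z0-9]+|#[0-9]+|#[xX][0-9a-fA-F]+);", RegexOptions.Compiled)

    // Bound taken from the PR description: '&' + 32 name/digit chars + ';'.
    let private maxEntityLength = 34

    /// Converts at most n leading chars of the list to a string,
    /// so the cost does not grow with the length of the remaining input.
    let prefixString (n: int) (chars: char list) : string =
        String(List.toArray (List.truncate n chars))

    /// Matches an entity at the head of the input and returns it
    /// together with the unconsumed remainder of the list.
    let (|HtmlEntity|_|) (input: char list) =
        let m = htmlEntityRegex.Match(prefixString maxEntityLength input)
        if m.Success then
            Some(m.Value, List.skip m.Length input)
        else
            None
```

Because the regex only ever sees a bounded prefix, the list-to-string conversion is constant work per call, while the returned remainder is still taken from the full input list.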