-
Notifications
You must be signed in to change notification settings - Fork 603
regex: eval/cut: fix premature local undo and $^R #24004
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
iabyn
wants to merge
3
commits into
blead
Choose a base branch
from
davem/re_savestack_eval
base: blead
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+125
−43
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
GH #16197 The main purpose of this commit is to stop premature scope unwinding within eval code in regexes. Aside from backtracking on failure, the scopes of every eval, such as (?{ local $x }), are supposed to accumulate, and are only unwound en-masse at the very end of the pattern match. However, a combination of a sub-pattern call, such as (??{...}) or (?&...), combined with a cut, (?>...), can trigger earlier savestack popping. The direct fix for this, as explained below, is to remove this single line from the EVAL_postponed_A/B: case: REGCP_UNWIND(ST.cp); /* LEAVE in disguise */ However, that line is entwined with code which attempts to preserve the final value of $^R during scope unwinding. Since that code was kind of working around the misplaced REGCP_UNWIND(), it needs ripping out and re-implementing. This has to be done at the same time, so the bulk of this commit is actually concerned with $^R, even though it isn't the subject of this ticket. So this commit doesn't change the behaviour of $^R, but just changes its implementation somewhat. The $^R issue is that every /(?{...})/ causes the equivalent of local $^R = ...; to be executed. During final exit, the savestack gets unwound, and all those local's get undone, leaving $^R with the value it had before the match started. But we promise that after the match, $^R will hold the value of the most recent (?{...}). The code which this commit rips out restored that value in one way; the new code in this commit does it a different way. Basically, almost the last thing S_regmatch() does is a LEAVE_SCOPE(orig_savestack_ix); This commit makes it so that the current value of $^R is copied just before the LEAVE_SCOPE(), and that value is copied back to $^R just after the LEAVE_SCOPE(). For efficiency, we only do the copy if we've actually set $^R. A mechanism is also needed to ensure that the temporary copy doesn't leak if we die during the savestack unwind. This is achieved by holding a pointer to the copy in the aux_eval struct, which gets processed even if we die. Now back to the main purpose of this commit, the premature stack unwind in the presence of a cut with a sub-pattern. This bug has been there since these features were added. It is instructive to look at a somewhat idealised overview of the S_regmatch() function from around 5.6.0 (with some bug fixes from later releases added). This was while the function was still recursive. It looks approximately like: pp_match(...) { I32 ix = PL_savestack_ix; ... do a match ... LEAVE_SCOPE(ix); } S_regmatch(...) { while (scan) { switch (OP(scan)) { case FOO: if (! there's a FOO) return 0; I32 ix = PL_savestack_ix; if (regmatch(...)) /* recursively match rest of pattern */ return 1; LEAVE_SCOPE(ix); return 0; case END: if (doing a (??{...}) ) { I32 ix = PL_savestack_ix; if (regmatch(...) { /* recursively match rest of pattern */ LEAVE_SCOPE(ix); return 1; } LEAVE_SCOPE(ix); return 0; } return 1; case EVAL: ... run the code, then, if its a (?{...}) ... I32 ix = PL_savestack_ix; if (regmatch(...) { /* recursively run subpattern */ LEAVE_SCOPE(ix) return 1; } LEAVE_SCOPE(ix); return 0; } } Here, the FOO: case represents all the various ops which recurse. In general, they match the next item and then recurse to match the rest of the pattern. Note that they all do a LEAVE_SCOPE() only in the *failure* branch. At the end of a successful match, there is potentially much recursion, and much stuff on the savestack. When the END op is reached, the series of 'return 1's causes all the recursion to unwind, while leaving the savestack untouched. Finally, the caller - such as pp_match() - clears the savestack. In more recent perls the recursion has been removed and the final LEAVE_SCOPE() is done within S_regmatch() itself, but the principle remains the same: no stack freeing is done *during* matching, and instead there's a single big clean up at the end. Once (??{...}) enters the picture, that changes a bit. When the END op associated with the '...' sub-pattern is reached, regmatch() is called recursively to process any pattern after the (??{..}); then on success, while working its way back through the nested regmatch() calls, both the END and the EVAL code do a LEAVE_SCOPE() in the *success* branch. This is anomalous, and those two LEAVE_SCOPE()'s are what this commit removes (although in the current non-recursive regex engine, they are shared by the same piece of code, so only one had to be removed). By removing them, this regularises the behaviour of sub-patterns. I can't think why those LEAVE_SCOPE()s were originally added, and assume it was a thinko. Normally it makes no difference whether the savestack is popped near the end, interleaved with popping all recursive regmatch calls (or equivalently on non-recursive engines popping the regmatch_state stack), or whether the savestack is popped only after all the recursion is exited from. However, it makes a difference in the presence of a cut, (?>...). Here, the final op of the sub-pattern '...' is SUCCEED, which rather than recursing to match anything following the cut block, just returns. The recursion pops back to the SUSPEND op which started the cut, which then continues with the op loop as normal. Thus when about to match the first op following a (?>...), the recursion *within* the cut has been blown away as if it never happened, but the accumulated savestack entries (e.g. from evals within the cut block) are preserved and continue to accumulate. Now, if there is a (??{...}) sub-pattern, or similarly a (?&FOO), within the cut, then at the end of the cut, the recursion is unwound, which includes the stacked EVAL and END recursions, which at this time call LEAVE_SCOPE(), which frees part of the savestack, even though the pattern match hasn't ended yet. That's the bug which this commit fixes. The tests added to pat_re_eval.t check that the scope bug has been fixed. The test added to svleak.t checks that the new $^R copying code doesn't leak.
This commit should produce no practical change in functionality. Currently, S_regtry() notes the position of PL_savestack_ix, calls S_regmatch(), then pops the savestack back to that position. However, S_regmatch() also does this just before returning, so it's redundant in S_regtry(). (A temporary assert confirmed that lastcp == PL_savestack_ix in S_regtry always while running the test suite). So this commit removes the REGCP_UNWIND(lastcp) and associated machinery from S_regtry(). It also regularises the "note current ix; pop back to old ix" code at the start and end of S_regmatch() to use the standard REGCP_SET() and REGCP_UNWIND() macros which do the same thing but also produce debugging messages.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
The main commit in this branch fixes GH #16197.
The follow-up commit does some minor code tidying.