fix: normalize_path ambiguity from dotdot cancellation #986
alandefreitas merged 1 commit into boostorg:develop from
The existing ambiguity handling in normalize_path only checked the
path before running remove_dot_segments. It detected dot prefixes
("/./" or "./") hiding "//", as well as colons in the first segment, but
missed cases where ".." segments cancel regular segments and produce
ambiguity in the output:
- scheme:/a/..//evil -> scheme://evil (authority ambiguity)
- a/../b:c -> b:c (scheme ambiguity)
- a/..//evil -> /evil (relative path became absolute)
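Whether an output is "ambiguous" comes down to how it re-parses. The following is a deliberately rough C++17 sketch of the relevant RFC 3986 rules, written for this note rather than taken from Boost.URL; `ref_kind` and `classify` are hypothetical names. It only looks for a scheme and for a leading `//` or `/`, and ignores queries, fragments, and validation of the rest of the reference.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical, simplified view of how a URI-reference re-parses
// (RFC 3986, section 3). This is not Boost.URL's parser.
struct ref_kind {
    bool has_scheme = false;    // "b:c"    -> scheme "b"
    bool has_authority = false; // "//evil" -> authority "evil"
    bool absolute_path = false; // "/evil"  -> path rooted at "/"
};

// scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
inline bool is_scheme(std::string_view s) {
    if (s.empty() || !std::isalpha(static_cast<unsigned char>(s[0])))
        return false;
    for (char c : s) {
        if (!std::isalnum(static_cast<unsigned char>(c)) &&
            c != '+' && c != '-' && c != '.')
            return false;
    }
    return true;
}

inline ref_kind classify(std::string_view ref) {
    ref_kind k;
    // A ':' ends a scheme only if no '/', '?' or '#' appears before it
    // and the text before it is a syntactically valid scheme.
    std::size_t colon = ref.find(':');
    std::size_t delim = ref.find_first_of("/?#");
    if (colon != std::string_view::npos && colon < delim &&
        is_scheme(ref.substr(0, colon))) {
        k.has_scheme = true;
        ref.remove_prefix(colon + 1);
    }
    // After the optional scheme, a leading "//" always starts an authority.
    if (ref.substr(0, 2) == "//")
        k.has_authority = true;
    else if (!ref.empty() && ref[0] == '/')
        k.absolute_path = true;
    return k;
}
```

Under this sketch each input/output pair flips one property: `scheme:/a/..//evil` has no authority while `scheme://evil` does, `a/../b:c` has no scheme while `b:c` gets scheme `b`, and `a/..//evil` is relative while `/evil` is absolute.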
Fix by checking remove_dot_segments output for these conditions and
inserting a path shield or encoding colons as needed. Colon encoding
moved entirely to after dot removal, simplifying the pre-dot-removal
step to only handle shield preservation.
fix boostorg#985
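A minimal sketch of a post-dot-removal check, assuming the caller already knows whether the reference has a scheme and an authority. `shield_path` and its parameters are hypothetical; the `/.` shield and the `%3A` encoding follow the description above, but this is not the code from the PR, and it leaves out the relative-path-became-absolute case.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical fix-up applied to the output of remove_dot_segments.
// The name and parameters are illustrative only.
inline std::string shield_path(std::string path,
                               bool has_scheme,
                               bool has_authority) {
    // 1. Authority ambiguity: without an authority, a path beginning
    //    with "//" would re-parse as "//authority". Prefixing "/."
    //    keeps the same segments and yields e.g. "/.//evil".
    if (!has_authority && path.compare(0, 2, "//") == 0) {
        path.insert(0, "/.");
        return path;
    }
    // 2. Scheme ambiguity: in a relative-path reference (no scheme and
    //    no leading '/'), a ':' in the first segment would re-parse as a
    //    scheme delimiter, so percent-encode colons in that segment only.
    if (!has_scheme && (path.empty() || path[0] != '/')) {
        std::size_t end = path.find('/');
        if (end == std::string::npos)
            end = path.size();
        std::string out;
        for (std::size_t i = 0; i < path.size(); ++i) {
            if (i < end && path[i] == ':')
                out += "%3A";
            else
                out += path[i];
        }
        return out;
    }
    return path;
}
```

In this sketch `shield_path("//evil", true, false)` returns `"/.//evil"` and `shield_path("b:c", false, false)` returns `"b%3Ac"`. Because it inspects the actual output, the colon rule only fires for colons that really ended up in the first segment.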
An automated preview of the documentation is available at https://986.url.prtest2.cppalliance.org/index.html
If more commits are pushed to the pull request, the docs will rebuild at the same URL.
2026-04-03 19:00:44 UTC
GCOVR code coverage report: https://986.url.prtest2.cppalliance.org/gcovr/index.html
Build time: 2026-04-03 19:13:12 UTC
Codecov Report
✅ All modified and coverable lines are covered by tests.

    @@           Coverage Diff            @@
    ##           develop     #986   +/-   ##
    ========================================
      Coverage    99.11%   99.11%
    ========================================
      Files          155      155
      Lines        10085    10094    +9
    ========================================
    + Hits          9996    10005    +9
      Misses          89       89
This started with #983, where a user asked whether Boost.URL could resolve relative IRIs without full normalization. While mapping out which parts of `normalize_path` are essential and which are optional, I realized the function's ambiguity handling had a gap that went deeper than anyone expected.

The existing code correctly handled the case where a dot prefix (`/./` or `./`) directly hides `//` in the path. For example, `/.//evil` is already protected: the `/./` prefix prevents `//evil` from being parsed as an authority. But `normalize_path` only checked for these prefixes before running `remove_dot_segments`. It never considered what happens when `..` segments cancel regular segments and expose a `//` or a first-segment colon that wasn't visible at the start:

| Input | `normalize()` result |
|---|---|
| `scheme:/a/..//evil` | `scheme://evil` |
| `a/../b:c` | `b:c` |
| `a/..//evil` | `/evil` |

Each of these is a valid URI-reference that, after normalization, re-parses to something with a completely different meaning.
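To make the cancellation concrete, here is a literal transcription of the RFC 3986 section 5.2.4 `remove_dot_segments` algorithm. The name `remove_dot_segments_rfc` is ours, and this is not Boost.URL's implementation. It is exercised here only on absolute paths; Boost.URL's handling of relative inputs like `a/../b:c` differs, as the results listed above show.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Sketch of RFC 3986 section 5.2.4, for illustration only.
inline std::string remove_dot_segments_rfc(std::string_view in) {
    std::string out;
    auto starts = [&](std::string_view p) {
        return in.substr(0, p.size()) == p;
    };
    // Remove the last output segment together with its preceding '/'.
    auto pop_output = [&] {
        std::size_t pos = out.rfind('/');
        out.erase(pos == std::string::npos ? 0 : pos);
    };
    while (!in.empty()) {
        if (starts("../"))       in.remove_prefix(3);   // rule A
        else if (starts("./"))   in.remove_prefix(2);   // rule A
        else if (starts("/./"))  in.remove_prefix(2);   // rule B: "/./" -> "/"
        else if (in == "/.")     in = "/";              // rule B
        else if (starts("/../")) {                      // rule C: "/../" -> "/"
            in.remove_prefix(3);
            pop_output();
        } else if (in == "/..") {                       // rule C
            in = "/";
            pop_output();
        } else if (in == "." || in == "..") {           // rule D
            in = {};
        } else {                                        // rule E
            // Move the first segment (with its leading '/', if any)
            // up to, but not including, the next '/'.
            std::size_t next = in.find('/', in[0] == '/' ? 1 : 0);
            if (next == std::string_view::npos)
                next = in.size();
            out.append(in.substr(0, next));
            in.remove_prefix(next);
        }
    }
    return out;
}
```

Two things fall out. `/a/..//evil` becomes `//evil`, a `//` that no prefix check on the input could have seen. And an existing shield does not survive either: `/.//evil` also comes out as `//evil`, which is why the pre-dot-removal step still has to preserve shields.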
The fix checks the output of `remove_dot_segments` rather than trying to predict it. Colon encoding moved from before dot removal to after, so it operates on the actual result instead of guessing which colons will end up in the first segment. This also reduced the pre-dot-removal step to a single concern: preserving existing path shields.

This work also settled #931, which asked whether normalization functions could be `noexcept`. We'd been going back and forth on whether normalization can ever make the string longer. It turns out it can: `a/../::::` (9 bytes) normalizes to `%3A%3A%3A%3A` (12 bytes). The `..` cancellation shrinks the path, but the colon encoding that follows grows it by more. The growth is unbounded (N colons in the first segment produce 3N bytes of output), so normalization genuinely needs to allocate and cannot be `noexcept`. A test case for this is included.

Another, smaller fix: `remove_dot_segments` can also turn a relative path into an absolute one when `..` goes above root (`a/..//evil` becomes `/evil`). The new code detects this and inserts `./` to preserve the relative path type.

fix #985
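The size argument behind #931 can be checked with a little arithmetic. The sketch below assumes the shape of the example: the input is `a/../` followed by N colons, dot removal leaves just the N colons in the first segment, and each colon becomes `%3A`. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical input: the 5-byte prefix "a/../" followed by n colons.
inline std::string make_input(std::size_t n) {
    return "a/../" + std::string(n, ':');
}

// Expected normalized form: "a/.." cancels, leaving n colons in the
// first segment, each percent-encoded as "%3A" (3 bytes).
inline std::string expected_output(std::size_t n) {
    std::string out;
    out.reserve(3 * n);
    for (std::size_t i = 0; i < n; ++i)
        out += "%3A";
    return out;
}

// Output size minus input size: 3n - (5 + n) = 2n - 5.
inline long long growth(std::size_t n) {
    return static_cast<long long>(expected_output(n).size()) -
           static_cast<long long>(make_input(n).size());
}
```

The output is already longer than the input at three colons, and the gap widens by two bytes per additional colon, so a buffer sized to the input cannot always hold the result.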