fix: normalize_path ambiguity from dotdot cancellation #986

Merged
alandefreitas merged 1 commit into boostorg:develop from alandefreitas:develop on Apr 3, 2026

@alandefreitas
Member

This started with #983, where a user asked whether Boost.URL could resolve relative IRIs without full normalization. While mapping out which parts of normalize_path are essential and which are optional, I realized the function's ambiguity handling had a gap that went deeper than anyone expected.

The existing code correctly handled the case where a dot prefix (/./ or ./) directly hides // in the path. For example, /.//evil is already protected: the /./ prefix prevents //evil from being parsed as an authority. But normalize_path only checked for these prefixes before running remove_dot_segments. It never considered what happens when .. segments cancel regular segments and create a // that wasn't visible at the start:

| Input | After normalize() | Problem |
|---|---|---|
| scheme:/a/..//evil | scheme://evil | authority appears from nowhere |
| a/../b:c | b:c | scheme appears from nowhere |
| a/..//evil | /evil | relative path becomes absolute |

Each of these is a valid URI-reference that, after normalization, re-parses to something with a completely different meaning.
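To make the three rows concrete, here is a minimal, standalone sketch of segment-based dot removal. It is not Boost.URL's implementation: the name `remove_dot_segments_sketch` is hypothetical, and the stack-based algorithm is an assumption chosen only because it reproduces the outputs in the table.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical sketch, not Boost.URL's internal remove_dot_segments.
// Splits the path on '/', drops "." segments, and lets ".." cancel
// the previous segment (including an empty one).
std::string remove_dot_segments_sketch(std::string path)
{
    bool const absolute = !path.empty() && path[0] == '/';
    if (absolute)
        path.erase(0, 1);

    std::vector<std::string> out;
    bool last_was_dot = false;
    std::size_t pos = 0;
    while (pos <= path.size())
    {
        std::size_t next = path.find('/', pos);
        if (next == std::string::npos)
            next = path.size();
        std::string const seg = path.substr(pos, next - pos);
        pos = next + 1;

        last_was_dot = (seg == "." || seg == "..");
        if (seg == ".")
            continue;
        if (seg == "..")
        {
            if (!out.empty() && out.back() != "..")
                out.pop_back();          // ".." cancels the previous segment
            else if (!absolute)
                out.push_back(seg);      // relative paths keep leading ".."
            continue;                    // absolute paths drop ".." at the root
        }
        out.push_back(seg);              // regular (possibly empty) segment
    }
    // A trailing "." or ".." leaves a trailing slash, e.g. "/a/b/.." -> "/a/"
    if (last_was_dot && !out.empty() && out.back() != "..")
        out.push_back("");

    std::string result;
    if (absolute)
        result.push_back('/');
    for (std::size_t i = 0; i < out.size(); ++i)
    {
        if (i != 0)
            result.push_back('/');
        result += out[i];
    }
    return result;
}
```

With this sketch, `/a/..//evil` yields `//evil`, `a/../b:c` yields `b:c`, and `a/..//evil` yields `/evil`. None of the inputs shows the ambiguity up front; it only appears once `..` has cancelled the `a` segment.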

The fix checks the output of remove_dot_segments rather than always trying to predict it. Colon encoding moved from before dot removal to after, so it operates on the actual result instead of guessing which colons will end up in the first segment. This also simplified the pre-dot-removal step to a single concern: preserving existing path shields.

This work also settled #931, which asked whether normalization functions could be noexcept. We'd been going back and forth on whether normalization can ever make the string longer. It turns out it can: a/../:::: (9 bytes) normalizes to %3A%3A%3A%3A (12 bytes). The .. cancellation shrinks the path, but the colon encoding that follows grows it by more. The growth is unbounded (N colons produce 3N bytes of output), so normalization genuinely needs to allocate and cannot be noexcept. A test case for this is included.
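The size argument can be checked with a few lines of arithmetic. The sketch below is illustrative only: `encode_colons_sketch`, `input_size`, and `output_size` are hypothetical names, and the encoder simply replaces each colon with `%3A` as the paragraph above describes.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: percent-encode every ':' in a segment as "%3A".
std::string encode_colons_sketch(std::string const& segment)
{
    std::string out;
    out.reserve(segment.size() * 3); // worst case: every byte is a colon
    for (char c : segment)
    {
        if (c == ':')
            out += "%3A";
        else
            out += c;
    }
    return out;
}

// Bytes in the input "a/../" followed by n colons.
std::size_t input_size(std::size_t n) { return 5 + n; }

// Bytes in the output: each of the n colons becomes a 3-byte "%3A".
std::size_t output_size(std::size_t n) { return 3 * n; }
```

The output is longer than the input whenever 3N > 5 + N, that is, from N = 3 onward (9 bytes versus 8). For N = 4 the output is 12 bytes, and the gap keeps widening by 2 bytes per additional colon.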

A smaller, related fix: remove_dot_segments can also turn a relative path into an absolute one when .. cancels the leading segment and what remains starts with / (a/..//evil becomes /evil). The new code detects this and inserts ./ to preserve the relative path type.
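The post-removal checks described above might look roughly like the following. This is a sketch under stated assumptions, not the code merged in this PR: `protect_normalized_path` and its parameters are hypothetical, the URL is assumed to have no authority, and the exact placement of the shields (`/.` and `./`) is an assumption based on the prefixes named in the description.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical helper, not Boost.URL's API.
// `path` is the output of dot-segment removal for a URL with no authority.
// `was_absolute` tells whether the path began with '/' before removal.
// `has_scheme` tells whether the URL carries a scheme.
std::string protect_normalized_path(
    std::string path, bool was_absolute, bool has_scheme)
{
    if (was_absolute)
    {
        // A leading "//" would re-parse as an authority: shield it.
        if (path.compare(0, 2, "//") == 0)
            path.insert(0, "/.");
        return path;
    }

    // A relative path that now starts with '/' would re-parse as
    // absolute: shield it with "./" (assumed form).
    if (!path.empty() && path[0] == '/')
    {
        path.insert(0, "./");
        return path;
    }

    // Without a scheme, a ':' in the first segment would re-parse as
    // a scheme delimiter: encode colons in that segment only.
    if (!has_scheme)
    {
        std::size_t const end = std::min(path.find('/'), path.size());
        std::string out;
        for (std::size_t i = 0; i < path.size(); ++i)
        {
            if (i < end && path[i] == ':')
                out += "%3A";
            else
                out += path[i];
        }
        return out;
    }
    return path;
}
```

Because the function inspects the actual result of dot removal, it does not need to predict which segments `..` will cancel. Under these assumptions, `//evil` from an absolute path becomes `/.//evil`, `/evil` from a relative path becomes `.//evil`, and `b:c` in a scheme-less reference becomes `b%3Ac`.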

fix #985

The existing ambiguity handling in normalize_path only checked the
path before running remove_dot_segments. It detected dot prefixes
("/./" or "./") hiding "//" and colons in the first segment, but
missed cases where ".." segments cancel regular segments and produce
ambiguity in the output:

- scheme:/a/..//evil -> scheme://evil (authority ambiguity)
- a/../b:c -> b:c (scheme ambiguity)
- a/..//evil -> /evil (relative path became absolute)

Fix by checking remove_dot_segments output for these conditions and
inserting a path shield or encoding colons as needed. Colon encoding
moved entirely to after dot removal, simplifying the pre-dot-removal
step to only handle shield preservation.

fix boostorg#985
@cppalliance-bot

An automated preview of the documentation is available at https://986.url.prtest2.cppalliance.org/index.html

If more commits are pushed to the pull request, the docs will rebuild at the same URL.

2026-04-03 19:00:44 UTC

@cppalliance-bot

GCOVR code coverage report https://986.url.prtest2.cppalliance.org/gcovr/index.html
LCOV code coverage report https://986.url.prtest2.cppalliance.org/genhtml/index.html
Coverage Diff Report https://986.url.prtest2.cppalliance.org/diff-report/index.html

Build time: 2026-04-03 19:13:12 UTC

@alandefreitas alandefreitas merged commit 2c6cdc6 into boostorg:develop Apr 3, 2026
47 of 48 checks passed
@codecov

codecov bot commented Apr 3, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 99.11%. Comparing base (3f8a428) to head (f7e9ad2).
⚠️ Report is 1 commits behind head on develop.

@@           Coverage Diff            @@
##           develop     #986   +/-   ##
========================================
  Coverage    99.11%   99.11%           
========================================
  Files          155      155           
  Lines        10085    10094    +9     
========================================
+ Hits          9996    10005    +9     
  Misses          89       89           
| Files with missing lines | Coverage Δ |
|---|---|
| include/boost/url/impl/url_base.hpp | 99.47% <100.00%> (+<0.01%) ⬆️ |

Development

Successfully merging this pull request may close these issues.

normalize_path: remove_dot_segments can produce authority ambiguity from ".." segments