Skip to content

Conversation

@n3rada
Copy link

@n3rada n3rada commented Dec 11, 2025

Hello maintainers 👋

The current implementation of the headers property incorrectly splits header values on commas. Many standard HTTP headers legitimately contain commas inside their values (e.g., Accept, Accept-Language, User-Agent). Splitting these values destroys their ordering, semantics, and meaning, resulting in invalid or incomplete headers being exposed to Python.

Additionally, because HTTPMessage[key] = value overwrites previous entries for the same header name, the split logic also caused the last fragment to override all earlier fragments.

Before the fix, a request with:

Accept: image/avif,image/webp,image/apng,image/svg+xml,image/*,*/*;q=0.8

was exposed as:

"accept": "image/avif"

Similarly, a User-Agent such as:

Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36

was incorrectly truncated because commas inside parentheses were interpreted as separators.

I don't know if it was the intended way of doing it, but it solved my problem.

@n3rada n3rada requested review from a team as code owners December 11, 2025 15:56
@github-actions
Copy link

github-actions bot commented Dec 11, 2025

All contributors have signed the CLA ✍️ ✅
Posted by the CLA Assistant Lite bot.

@jasnell jasnell added the python Issues/PRs relating to Python Workers label Dec 11, 2025
@jasnell jasnell requested review from dom96 and hoodmane December 11, 2025 16:04
@n3rada
Copy link
Author

n3rada commented Dec 11, 2025

I have read the CLA Document and I hereby sign the CLA

github-actions bot added a commit that referenced this pull request Dec 11, 2025
@jasnell
Copy link
Collaborator

jasnell commented Dec 11, 2025

The splitting here definitely looks wrong. At the very least it needs to be taking quoted strings into consideration. That said, any change on this is likely to need a compatibility flag as changing this is likely breaking for existing workers.

@n3rada
Copy link
Author

n3rada commented Dec 11, 2025

On my side, this change has corrected my workers, not broken them! No more incorrectly sent UAs!

@jasnell
Copy link
Collaborator

jasnell commented Dec 11, 2025

Understood. However, it's conceivable that changing this could break other workers, which means fixing it would require a compatibility flag so that those existing workers would be unaffected by the change.

@ryanking13
Copy link
Contributor

Yes, the splitting definitely looks wrong. @hoodmane @dom96 Do we have a way to load compat flags in that Python file? It looks like the MetadataReader is only in JavaScript.

@jasnell
Copy link
Collaborator

jasnell commented Dec 12, 2025

Compat flags can be inspected using the globalThis.Cloudflare.compatibilityFlags

@hoodmane
Copy link
Contributor

Yeah maybe we should do registerJsModule("_cloudflare_compat_flags", COMPATIBILITY_FLAGS).

@jasnell
Copy link
Collaborator

jasnell commented Dec 12, 2025

If I remember correctly, I believe @anonrig has some code in the node compat stuff for parsing headers out with quoted strings handled as part of the node:http impl. Worth taking a look there for impl logic that can be adopted/duplicated.

@dom96
Copy link
Contributor

dom96 commented Jan 2, 2026

Looking at the node.js implementation I see splitHeaderValue in internal_http_utils.ts. It seems to be intended for Headers which are only ever expected to contain a single value, as it only ever returns up to the first comma. So we cannot use it for all header keys. Python's Headers are also stored in a email.message.Message which supports retrieval of all the values via get_all, so we need to split the header values up appropriately.

@n3rada
Copy link
Author

n3rada commented Jan 5, 2026

I think there’s a conceptual mix-up here between:

  1. multiple header fields with the same name (e.g. two Set-Cookie: lines), and
  2. a single header field whose field-value contains commas (e.g. Accept, Accept-Language, User-Agent, Cache-Control, etc.).

Many standard headers legitimately contain commas, and commas do not imply "multiple header instances" unless the specific header definition says so (and even then quoted strings must be respected). email.message.Message.get_all() is useful to preserve multiple header fields with the same name. It should not require (or justify) comma-tokenizing a single field-value.

So I don’t think the fix should be "split appropriately". The fix should be "do not split at all at this layer".

What do you think @dom96?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

python Issues/PRs relating to Python Workers

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants