P2728: loosen wording to admit chunked (SIMD) implementations by ednolan · Pull Request #237 · bemanproject/utf_view

ednolan · 2026-06-09T23:41:35Z

The wording previously pinned the transcoding iterator to decoding one code point per read: buf_ was sized to hold exactly one transcoded code point, and base() and iterator equality were specified directly in terms of the exposition-only members, making the buffer anchor observable and blocking an as-if chunked implementation. Loosen it so an implementation may transcode chunks of input at a time, e.g. with SIMD:

- Make buf_'s capacity an unspecified constant, buffer-capacity, of at least 4 / sizeof(ToType), and widen buf_index_ and to_increment_ accordingly. Chunking is permitted only when the underlying range models forward_range; read-ahead on a single-pass range is destructive and therefore observable.
- Allow read() to transcode an implementation-chosen number n >= 1 of consecutive input subsequences per invocation, provided the result fits in buf_. The choice of n is unobservable and may vary call to call: an implementation typically transcodes a fixed-size window of code units trimmed back to whole input subsequences. Substitution of Maximal Subparts applies per input subsequence exactly as before.
- Allow read-reverse() to chunk symmetrically (n subsequences ending at current_), specify explicitly that it leaves current_ at the beginning of the first subsequence it transcoded, and note that the chunk's starting boundary is locatable by bounded backward scanning.
- Respecify base() and iterator equality positionally -- in terms of the position of the current input subsequence and the offset of the current element within its transcoded code units -- instead of by memberwise comparison, since iterators denoting the same element can hold different buffer anchors once buffers are chunked.
- Make operator-- skip an ill-formed input subsequence's three-unit replacement-character encoding as a unit in _or_error views with char8_t output, mirroring operator++: with chunked buffers those code units can sit mid-buffer, and the skip preserves the canonical first-unit position that positional equality relies on.
- Add a "SIMD support" design discussion section: iterator size and ABI implications of the buffer capacity, base() implementation strategy, validation-plus-scalar-fallback for SMS error handling, interactivity considerations, preliminary UTF-16 to UTF-8 performance numbers from the prototype std::simd kernel (enolan_simd4 branch), and why a view cannot approach bulk transcoding speed. Update the changelog.
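
As a rough illustration of the first two items, the sketch below computes the minimum buffer capacity, 4 / sizeof(ToType), and trims a fixed-size window of UTF-16 code units back to a whole-subsequence boundary. It is not the wording or the prototype kernel: the names `min_buffer_capacity`, `is_high_surrogate`, and `trimmed_chunk_length` are hypothetical, the window is assumed to start on a subsequence boundary, and `char` stands in for a one-byte code unit type so the block also compiles before C++20.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical names throughout; none of them appear in the P2728 wording.

// Lower bound on the buffer capacity stated in the description above.
template <class ToType>
inline constexpr std::size_t min_buffer_capacity = 4 / sizeof(ToType);

static_assert(min_buffer_capacity<char> == 4);      // one-byte units (UTF-8)
static_assert(min_buffer_capacity<char16_t> == 2);  // UTF-16
static_assert(min_buffer_capacity<char32_t> == 1);  // UTF-32

constexpr bool is_high_surrogate(char16_t u) {
    return u >= 0xD800 && u <= 0xDBFF;
}

// Number of leading code units of `input` that a chunked read could consume
// with a window of `window` code units, trimmed so the chunk ends on an
// input-subsequence boundary. Assumes `input` starts on such a boundary.
constexpr std::size_t trimmed_chunk_length(std::u16string_view input,
                                           std::size_t window) {
    assert(window >= 2);  // guarantees at least one unit survives trimming
    if (input.size() <= window) {
        return input.size();  // the remaining input fits in one window
    }
    std::size_t len = window;
    // A high surrogate at the end of the window may pair with the unit just
    // past it, so leave it for the next read. If it turns out to be unpaired,
    // dropping it is merely conservative: it is then its own ill-formed
    // subsequence and the chunk still ends on a boundary.
    if (is_high_surrogate(input[len - 1])) {
        --len;
    }
    return len;
}
```

For UTF-16 to UTF-8, each input code unit produces at most three output code units (a surrogate pair yields four for two units, an unpaired surrogate yields the three-unit replacement character), so under these assumptions a window of W code units fits in a buffer of 3 * W output units.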

coveralls · 2026-06-09T23:45:28Z

Coverage: 99.744% (unchanged) when merging enolan_simdwording2 into main.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

ednolan force-pushed the enolan_simdwording2 branch from ac3cf3d to 7d9aefd Compare June 9, 2026 23:47

ednolan merged commit 9a87f3b into main Jun 9, 2026
53 checks passed

ednolan deleted the enolan_simdwording2 branch June 9, 2026 23:54
