perf: replace Val.Str._asciiSafe flag with AsciiSafeStr subclass #862
Merged
Force-pushed from cae7a70 to 8516e7b.
Motivation: The boolean field added 1 byte to every `Val.Str` instance, which JVM alignment expanded to 8 bytes per object. `Val.Str` instances number in the millions on string-heavy workloads (e.g. `joinedRepeatedString` results, format outputs, parsed string literals), so the wasted padding adds up.

Modification: Drop `_asciiSafe: Boolean` from `Val.Str` and introduce a sealed `Val.AsciiSafeStr extends Val.Str` marker subclass. The factory `Val.Str.asciiSafe(pos, s)` now constructs the subclass directly. `ByteRenderer` and the propagation sites switch from `vs._asciiSafe` to `vs.isInstanceOf[Val.AsciiSafeStr]`. `Str.concat` preserves the subclass when both operands are ASCII-safe, on both the eager and rope paths. Parser/`Substr` write sites that previously mutated the flag now call the `asciiSafe` factory directly.

Result: 8 bytes saved per `Val.Str` instance with no behavioral change. The JIT still devirtualizes `.str` access via CHA, because there is a single non-final implementation in the hierarchy. All JVM tests pass on Scala 3.3.7. All platforms (JVM/JS/Native/WASM) compile cleanly across Scala 3.3.7/2.13.18/2.12.21.
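The sketch below shows the shape the commit message describes. It makes several simplifying assumptions:

- `Position` is a hypothetical stand-in for sjsonnet's position type.
- `AsciiSafeSketch` is an illustrative wrapper object.
- Only the eager concat path is shown; the rope path is omitted.
- `AsciiSafeStr` is declared `final` here for simplicity, whereas the PR describes it as sealed.

The marker subclass declares no fields. Concatenation keeps the marker only when both operands carry it.

```scala
object AsciiSafeSketch {
  // Hypothetical stand-in for sjsonnet's source position type.
  final case class Position(offset: Int)

  sealed trait Val

  // Base case: ASCII JSON-safety of `str` is unknown.
  class Str(val pos: Position, val str: String) extends Val

  // Marker subclass: adds no fields, so it is the same size as a plain Str.
  final class AsciiSafeStr(p: Position, s: String) extends Str(p, s)

  object Str {
    def apply(pos: Position, s: String): Str = new Str(pos, s)

    // Factory that records the invariant in the runtime class instead of a flag.
    def asciiSafe(pos: Position, s: String): Str = new AsciiSafeStr(pos, s)

    // Eager concatenation: the result is ASCII-safe only if both inputs are.
    def concat(pos: Position, a: Str, b: Str): Str = {
      val joined = a.str + b.str
      if (a.isInstanceOf[AsciiSafeStr] && b.isInstanceOf[AsciiSafeStr]) asciiSafe(pos, joined)
      else apply(pos, joined)
    }
  }
}
```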
Force-pushed from 8516e7b to f07ef42.
Author comment: @stephenamar-db This PR is ready now.

stephenamar-db approved these changes on May 22, 2026.
Motivation
Save memory. A `Val.Str` instance is allocated for every string literal, every `ConstMember` key, every parser-produced fragment and every concat result. On config-generation workloads (`gen_big_object`, `realistic1`) the benchmark allocates tens of thousands of `Val.Str` per run, so per-instance overhead compounds.
Replace the `Val.Str._asciiSafe: Boolean` flag with an `AsciiSafeStr` subclass. Encoding the invariant in the type instead of a field has two effects:

- It shrinks every plain `Val.Str`.
- It lets the renderer/format hot paths dispatch on `case _: AsciiSafeStr` instead of reading a field.

The JIT and Scala-Native CHA can devirtualize this into a direct branch.
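The sketch below illustrates that dispatch: a renderer hot path that matches on the marker type. `RenderSketch`, `renderJsonString` and the simplified escaping are illustrative assumptions, not sjsonnet's `ByteRenderer`. The only point is that the `AsciiSafeStr` branch copies the characters without inspecting them.

```scala
object RenderSketch {
  // Simplified stand-ins for the two classes (no position field).
  class Str(val str: String)
  final class AsciiSafeStr(s: String) extends Str(s)

  // Hypothetical renderer hot path: the runtime class replaces the old boolean flag.
  def renderJsonString(v: Str, out: java.lang.StringBuilder): Unit = v match {
    case _: AsciiSafeStr =>
      // Fast path: the type guarantees no escaping is needed, so copy verbatim.
      out.append('"').append(v.str).append('"')
    case _ =>
      // Slow path: escape quotes, backslashes and control characters.
      out.append('"')
      v.str.foreach { c =>
        if (c == '"' || c == '\\') out.append('\\').append(c)
        else if (c < 0x20) out.append('\\').append('u').append(f"${c.toInt}%04x")
        else out.append(c)
      }
      out.append('"')
  }
}
```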
This sits directly on `master` (commit `b252b184`) now that #861's C1-C4 incremental wins have landed. Single structural commit.
Modification
- `AsciiSafeStr` subclass marks strings statically known to be ASCII JSON-safe; plain `Val.Str` is the unknown-safety base case.
- `Val.Str.asciiSafe(pos, s)` factory constructs `AsciiSafeStr` directly.
- `getAsciiSafe()` short-circuits to `true` for `AsciiSafeStr`; for plain `Val.Str` it still does the lazy SWAR scan and caches the result.
- `_asciiSafe: Boolean` field removed from `Val.Str`.
- … `Parser`, `Format`) now match on the type.
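The sketch below shows `getAsciiSafe()` on both classes, plus one way to write a SWAR ("SIMD within a register") scan. It rests on several assumptions:

- "ASCII JSON-safe" means every char is in 0x20..0x7F and is neither `"` nor a backslash; sjsonnet's exact predicate may differ.
- The lane layout (four UTF-16 chars packed per `Long`) and all names are illustrative.
- The caching the PR mentions is omitted, since the PR does not say where the cached result lives.

```scala
object AsciiScanSketch {
  // 16-bit lane constants: one lane per UTF-16 char, four chars per Long.
  private val Ones     = 0x0001000100010001L
  private val Highs    = 0x8000800080008000L
  private val NonAscii = 0xFF80FF80FF80FF80L

  // Packs chars i..i+3 into one Long (char i in the lowest lane).
  private def pack(s: String, i: Int): Long =
    s.charAt(i).toLong |
      (s.charAt(i + 1).toLong << 16) |
      (s.charAt(i + 2).toLong << 32) |
      (s.charAt(i + 3).toLong << 48)

  // True iff some lane is below n; valid when every lane is < 0x8000 and n <= 0x8000.
  private def hasLaneBelow(x: Long, n: Int): Boolean = ((x - Ones * n) & ~x & Highs) != 0L

  // True iff some lane equals c: XOR turns matching lanes into zero lanes.
  private def hasLane(x: Long, c: Int): Boolean = hasLaneBelow(x ^ (Ones * c), 1)

  // Four chars at once: all ASCII, and none is a control char, '"' (0x22) or backslash (0x5C).
  // The ASCII test runs first, so the lane tests only ever see lanes below 0x80.
  private def safeWord(x: Long): Boolean =
    (x & NonAscii) == 0L && !hasLaneBelow(x, 0x20) && !hasLane(x, 0x22) && !hasLane(x, 0x5C)

  private def safeChar(c: Char): Boolean = c >= 0x20 && c < 0x80 && c != '"' && c != '\\'

  def isAsciiJsonSafe(s: String): Boolean = {
    var i = 0
    while (i + 4 <= s.length) { // word-at-a-time main loop
      if (!safeWord(pack(s, i))) return false
      i += 4
    }
    while (i < s.length) { // scalar tail for the last 0-3 chars
      if (!safeChar(s.charAt(i))) return false
      i += 1
    }
    true
  }

  // Plain Str: safety unknown, so fall back to the scan (no caching in this sketch).
  class Str(val str: String) {
    def getAsciiSafe(): Boolean = isAsciiJsonSafe(str)
  }

  // Marker subclass: safety is known from the type, so the scan is skipped.
  final class AsciiSafeStr(s: String) extends Str(s) {
    override def getAsciiSafe(): Boolean = true
  }
}
```

Producers that already know a string is safe (for example, the `asciiSafe` factory) let consumers skip the scan entirely. Plain strings still pay for it once.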
The existing `string_asciisafe_propagation.jsonnet` test and the full JVM/Native test suites cover this behavior; there is no semantic change.
Result
Re-benched on 2026-05-21 against `master` @ `b252b184`. Apple Silicon, JDK 21, Scala 3.3.7.
Memory: `Val.Str` object layout (compressed oops)

Every plain `Val.Str` instance shrinks from 32 → 24 bytes. `AsciiSafeStr` adds no fields, so it is also 24 bytes per instance; its class metadata is a one-time cost, not a per-instance one.
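The sketch below explains why one boolean can cost a full 8 bytes, using HotSpot-style instance sizing. It assumes a 12-byte object header (compressed oops and compressed class pointers) and 8-byte object alignment. The 12 bytes of pre-existing fields are a hypothetical figure chosen to match the reported 32 → 24 change; the PR does not list `Val.Str`'s fields.

```scala
object LayoutSketch {
  // Assumed HotSpot defaults on 64-bit with compressed oops/class pointers.
  val HeaderBytes = 12
  val Alignment   = 8

  // Round a raw size up to the next multiple of the object alignment.
  def aligned(rawBytes: Int): Int = (rawBytes + Alignment - 1) / Alignment * Alignment

  // Instance size for a given total of field bytes (field packing ignored).
  def instanceSize(fieldBytes: Int): Int = aligned(HeaderBytes + fieldBytes)
}
```

With 12 bytes of fields plus the 1-byte flag, the raw size is 25 bytes, which rounds up to 32. Dropping the flag leaves exactly 24.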
Allocation rate (JMH `-prof gc`, full bench corpus)

The largest absolute saving is the 234 KB/op drop on `gen_big_object`, a config-shape benchmark. At ~30K `Val.Str` instances/op × 8 bytes each, the arithmetic checks out. Object-key-heavy programs are the win profile. Benches that reuse already-parsed strings (`bench.02`, `comparison2`) see near-zero deltas because they don't allocate fresh `Val.Str` at runtime. No bench shows an allocation regression greater than +200 B/op (≈ 1 instance worth of noise).
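The back-of-envelope check works out as follows, assuming the reported KB/op figure uses 1 KB = 1024 bytes:

```math
30{,}000 \times 8\ \text{B} = 240{,}000\ \text{B} = \frac{240{,}000}{1024}\ \text{KiB} \approx 234\ \text{KiB}
```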
Wall-clock
Hyperfine, Scala-Native release binary, full bench corpus (warmup=2,
min-runs=5). Long benches (process-startup variance is small relative to
work):
Net wall-clock impact is a wash on Native: sub-10 ms benches are dominated by process-startup variance (±10–20% run-to-run). The `realistic1` and `base64DecodeBytes` wins line up with their alloc-rate drops: less GC pressure means less work.
JMH (JVM steady-state, single 1-iter run):
Summary
Memory is the headline:

- `Val.Str` instance: 32 → 24 bytes (-25%)
- … `gen_big_object`), consistently negative on object-construction benches
Wall-clock is neutral to marginally positive, which is exactly what you'd expect from a layout shrink. The JIT was already inlining the field read, so removing it wins on memory pressure rather than instruction count. The type-encoded invariant also makes it cheaper for future fast paths to discriminate on the marker without re-scanning.
Test plan
- `./mill 'sjsonnet.jvm[3.3.7]'.test`: green
- `./mill 'sjsonnet.native[3.3.7]'.test`: green
- `./mill __.checkFormat`: green