Fix thread contention in GenericVersionScheme and WeakInternPool#1937
Conversation
The synchronized blocks introduced by commits 7fa7741 and d9a1b87 create global serial bottlenecks on the two hottest paths during dependency resolution (version parsing and artifact/dependency interning), effectively destroying the parallelism that makes Resolver 2.x twice as fast as 1.x. The underlying thread-safety issue was real: Collections.synchronizedMap does not make computeIfAbsent/compute atomic. However, using a synchronized block on the entire map is too coarse — all threads serialize on a single monitor even for cache hits. GenericVersionScheme: replace synchronizedMap(WeakHashMap) with ConcurrentHashMap, whose computeIfAbsent is lock-striped and allows concurrent access across hash buckets. The version cache does not need weak-reference semantics: version strings are small and bounded per build. WeakInternPool: replace synchronizedMap + synchronized(map) with a plain WeakHashMap guarded by a ReadWriteLock. The read lock allows concurrent cache hits (the common case); only cache misses acquire the exclusive write lock. This preserves weak-reference GC semantics while eliminating the serial bottleneck. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two root causes of the OOM reported against 2.0.19 on a 1000-module project (Maven 3.9 needed <9 GB, 3.10 fails even with 12 GB): 1. conflictIdsSinceRoot HashSet copying: for every child of every node, the full set of conflict IDs from root to the current depth was copied into a new HashSet. On large graphs this creates millions of HashMap.Node objects — the dominant heap consumer visible in the heap dump. Fix: replace with a parent-chain walk for cycle detection. This trades O(depth) per check for zero per-child allocation. Depth is bounded (typically 20-100), while the number of Path objects can be millions. 2. ArrayList.remove(this) in moveOutOfScope(): O(N) scan + shift on potentially large partition lists, called recursively for entire subtrees — explains the deep recursive call stack in the thread dump. Fix: change partition lists from ArrayList to LinkedHashSet, giving O(1) removal while preserving insertion order. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Port the Cache utility from maven-impl (o.a.m.impl.cache.Cache) into maven-resolver-util (o.e.a.util.concurrency.Cache), backported to Java 8 source level. This replaces the ad-hoc synchronization in both GenericVersionScheme and WeakInternPool with a proper ConcurrentHashMap- backed cache that supports configurable reference types (weak/soft/hard) and ReferenceQueue-based cleanup. GenericVersionScheme: now uses Cache.newCache(WEAK) — restoring the original memory-sensitive semantics (entries can be GC'd) while providing lock-free concurrent access via ConcurrentHashMap internals. WeakInternPool: reduced to a thin wrapper around Cache.newCache(WEAK), replacing the ReadWriteLock + WeakHashMap approach. The intern() operation maps directly to cache.computeIfAbsent(key, k -> value). Long-term, this Cache class should be consolidated between maven and maven-resolver to avoid duplication (resolver is the lower-level dependency). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…caches The Cache class (backed by ConcurrentHashMap<RefConcurrentReference, ...>) allocates a WeakReference wrapper object on every lookup just to query the map, plus polls two ReferenceQueues per operation. On hot paths called millions of times during dependency resolution, this per-lookup allocation creates significant GC pressure and contention. Replace with plain ConcurrentHashMap which provides: - Zero allocation on cache hits (lock-free volatile reads) - Lock-striped writes (only on first occurrence of a key) - No expungement overhead Micro-benchmark results (4 threads, 8M ops): - GenericVersionScheme: 1668ms → 126ms (13.2× faster vs master) - DataPool.intern: 916ms → 159ms (5.8× faster vs master) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Revert GenericVersionScheme and WeakInternPool to the exact pre-PR#1902 code: synchronizedMap(WeakHashMap) with get/put (no outer synchronized). The original code was fast because each map operation (get, put) acquires a short-lived lock via synchronizedMap — microsecond hold times with minimal contention. The compound race (two threads both seeing a miss and both creating a value) is benign for both caches. This preserves weak reference semantics (weak keys in GenericVersionScheme, weak keys + weak values in WeakInternPool) for proper GC behavior. Replaces the previous Cache-based approach which had per-lookup overhead: allocating a WeakReference wrapper object on every get() just to query the ConcurrentHashMap, plus polling two ReferenceQueues per operation. Micro-benchmark (4 threads, 8M ops): - GenericVersionScheme: 1668ms (master) → 360ms (4.6× faster) - DataPool.intern: 916ms (master) → 457ms (2.0× faster) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Cache class (copied from maven-impl) is no longer used after reverting GenericVersionScheme and WeakInternPool to the original synchronizedMap(WeakHashMap) implementation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace synchronizedMap(WeakHashMap) with a new ConcurrentWeakCache that provides weak keys (like WeakHashMap) but with lock-free reads (like ConcurrentHashMap). Key optimizations over maven-impl's Cache: - Zero allocation on get() via ThreadLocal reusable lookup key - O(1) stale entry cleanup via identity-based ReferenceQueue removal (no entrySet scan) - Lock-free reads via ConcurrentHashMap volatile read Benchmark results (4 threads, 8M ops/iteration): Version cache: 155ms (51.6M ops/sec) — 10.8x faster than master Intern pool: 176ms (45.5M ops/sec) — 5.2x faster than master Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Use putIfAbsent in both GenericVersionScheme.parseVersion() and WeakInternPool.intern() to guarantee concurrent callers for the same key always receive the same instance. This fixes the testConcurrentCaching assertion that expects identity equality. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move the zero-allocation get() check into putIfAbsent itself so callers don't need the get-then-putIfAbsent pattern. On cache hit (the common case after initial population), no WeakKey or WeakReference is allocated — just a volatile read via ThreadLocal LookupKey. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR addresses performance regressions introduced by coarse-grained synchronization on hot-path caches by introducing a lightweight concurrent weak cache and switching existing caches to use it, while also reducing memory overhead in dependency graph conflict resolution.
Changes:
- Add
ConcurrentWeakCache<K,V>to provide concurrent weak-key/weak-value caching with low read overhead. - Switch
GenericVersionSchemeandDataPool.WeakInternPoolfrom synchronized weak maps toConcurrentWeakCache. - Refactor
PathConflictResolverto avoid per-nodeHashSetcopying and to support O(1) removals viaLinkedHashSet.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| maven-resolver-util/src/main/java/org/eclipse/aether/util/concurrency/ConcurrentWeakCache.java | Introduces a concurrent weak cache used to remove contention from cache hot paths. |
| maven-resolver-util/src/main/java/org/eclipse/aether/util/version/GenericVersionScheme.java | Replaces the synchronized version parsing cache with ConcurrentWeakCache. |
| maven-resolver-impl/src/main/java/org/eclipse/aether/internal/impl/collect/DataPool.java | Reworks WeakInternPool to use ConcurrentWeakCache for lower contention and faster reads. |
| maven-resolver-util/src/main/java/org/eclipse/aether/util/graph/transformer/PathConflictResolver.java | Reduces memory usage by replacing conflict-id set copying with parent-chain cycle checks and using LinkedHashSet partitions. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- ConcurrentWeakCache.putIfAbsent: document the benign race window when a previously cached value has been GC'd and two threads race to replace it. This is an acceptable trade-off vs per-key locking. - PathConflictResolver.addChildren: update Javadoc to describe the parent-chain walk for cycle detection, replacing the stale reference to the removed conflictIdsSinceRoot set. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace the non-atomic putIfAbsent() + put() two-step with a single ConcurrentHashMap.merge() call. This eliminates the race where two threads could both observe a GC'd value and each return their own instance. The merge() lambda atomically decides to keep the existing live value or replace a GC'd one, guaranteeing all concurrent callers for the same key receive the same instance. Only the slow path (cache miss) is affected — the fast path get() via ThreadLocal LookupKey remains lock-free with zero allocation. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
EnhancedLocalRepositoryManager.readRepos() re-reads _remote.repositories from disk (with file locking) on every artifact resolution, even for artifacts in the same directory. In a primed build, this causes thousands of redundant file reads with synchronized blocks and FileLock acquisition. Add a ConcurrentHashMap<Path, Properties> cache keyed by tracking file path. Multiple artifacts in the same directory (e.g. jar + pom) share the same tracking file, so the cache hit rate is high. The cache is invalidated on writes via addRepo(). Wall-clock profiling shows 49-55% reduction in readRepos/ LegacyTrackingFileManager.read samples, and complete elimination of the mutex() and Files.isRegularFile() hotspots in checkFind(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two concurrent addRepo() calls for the same tracking file could reorder their cache.put() calls, leaving stale data in the cache. Fix by invalidating (remove) instead of updating — the next readRepos() re-reads from disk. Also document that cached Properties are shared across threads (must be treated as read-only) and that cross-process staleness is bounded to at most a redundant download, not data corruption. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
getLockAndRefTrack() used ConcurrentHashMap.compute() on every call, which takes a per-bucket exclusive lock even when the holder already exists. In the common case (lock exists, just increment refcount), this serialized all threads hashing to the same bucket. Add a lock-free fast path: ConcurrentHashMap.get() (volatile read, no locking) + tryIncRef() (CAS loop on AtomicInteger). Only fall back to compute() when the holder is absent or being closed. The close/acquire race is handled by a CAS sentinel: closeLock() marks the holder as closed (CAS refcount 0 → MIN_VALUE) before destroying it. tryIncRef() rejects refcount ≤ 0, so it cannot revive a destroyed holder. If tryIncRef CAS(0→1) races with closeLock's CAS(0→MIN_VALUE), exactly one succeeds — both outcomes are correct. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
InhibitingNameMapper.nameLocks(): replace nested streams with imperative loops and short-circuit when lockingInhibitors is empty. The stream version created stream objects on every call even with no inhibitors — the common case. FileLockNamedLockFactory: keep FileChannels open for reuse instead of closing on every destroyLock(). Opening a FileChannel is a syscall (open/creat) that shows up as a hotspot when locks are acquired and released frequently. The fileChannels ConcurrentMap already supports reuse via computeIfAbsent in createLock() — we just need to stop removing channels in destroyLock(). Channels are now closed on factory shutdown via doShutdown(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
gnodet
left a comment
There was a problem hiding this comment.
Concurrency & Data Lifecycle Review
I reviewed every changed file with a focus on (1) correctness when multiple builds run in parallel (separate JVMs sharing the same local repository), and (2) data lifecycle consistency within a single build. Here are the findings.
Scoping context
| Component | Scope | Shared across sessions? |
|---|---|---|
GenericVersionScheme |
One per RepositorySystem |
Yes — all sessions share the same versionCache |
EnhancedLocalRepositoryManager |
One per session (factory creates fresh) | No — each session gets its own trackingFileCache |
DataPool.WeakInternPool |
Stored in RepositoryCache |
Yes if sessions share a RepositoryCache |
FileLockNamedLockFactory |
@Singleton per DI container |
Yes — fileChannels map shared across sessions |
NamedLockFactorySupport |
@Singleton (inherited) |
Yes — locks map shared across sessions |
PathConflictResolver.State |
Per transformGraph() call, single-threaded |
No |
Finding 1 — FileLockNamedLockFactory: DELETE_ON_CLOSE + channel reuse
Severity: Medium (not a regression, but changes observable behavior)
On non-Windows (the default), DELETE_LOCK_FILES=true and channels are opened with DELETE_ON_CLOSE. On Unix, the JDK's UnixChannelFactory.open() calls unlink(path) immediately after opening the file — the fd keeps the inode alive but the path disappears from the filesystem.
Old behavior (per-cycle): createLock → open(+unlink) → use fd → destroyLock → close fd (inode freed) → next createLock → open creates new file(+unlink) → new inode.
New behavior (reuse): createLock → open(+unlink) → use fd → destroyLock (no-op) → next createLock → computeIfAbsent returns existing channel → same fd, same inode. File never re-appears on disk.
For cross-process locking, both behaviors are equally broken with DELETE_ON_CLOSE on Unix — the lock file is unlinked immediately, so another process opening the same path creates a different inode and they lock on different files. This is pre-existing.
Practical concern: with the old code, a closed channel was removed from fileChannels, so the next createLock would open() with retry. With the new code, if the cached channel ever becomes invalid (e.g., NFS hiccup, disk error), all subsequent lock operations fail with no recovery path — computeIfAbsent keeps returning the stale channel.
Recommendation: Consider either (a) handling ClosedChannelException in FileLockNamedLock.obtainFileLock() by evicting the stale channel and retrying, or (b) documenting that the reuse assumes the underlying fd remains valid for the factory's lifetime.
Finding 2 — EnhancedLocalRepositoryManager: Stale cache entry after addRepo race
Severity: Low (documented, benign consequence)
Thread A: addRepo() → remove(trackingPath) → [gap] → update(trackingPath)
Thread B: readRepos() → computeIfAbsent reads OLD data → caches stale entry
Between A's remove() and A's update(), thread B can read old data from disk and cache it. The stale entry persists until the next addRepo() invalidation or session end. Consequence: a redundant (but harmless) download.
Already correctly documented in the Javadoc. The invalidate-before-write ordering is correct (invalidate-after-write would be worse).
Finding 3 — EMPTY_PROPERTIES is a shared mutable singleton
Severity: Low (currently safe, fragile)
private static final Properties EMPTY_PROPERTIES = new Properties();Properties extends Hashtable — fully mutable. All current callers (checkFind, isTracked) only call .get() and .keySet() (read-only). But nothing enforces this at the type level. If future code calls .put() on the cached Properties returned by readRepos(), it corrupts the shared singleton AND every trackingFileCache entry referencing it.
Recommendation: Consider adding a prominent @implNote warning, or wrapping readRepos() returns in an unmodifiable view for defense-in-depth.
Finding 4 — ConcurrentWeakCache.putIfAbsent: GC race on return path
Severity: Low (inherent to weak references, not a regression)
After merge() returns the winning existingRef, GC can collect the existing value before the subsequent ref.get():
WeakReference<V> ref = map.merge(...);
if (ref != newRef) {
V existingValue = ref.get(); // ← GC window here
if (existingValue != null) return existingValue;
}
return value; // 'value' is NOT in the mapTwo concurrent callers for the same key could briefly hold different instances, weakening the interning guarantee. The old synchronized(map) { map.compute() } had a tighter window because the synchronized block held a strong reference preventing GC during the return.
For GenericVersionScheme, this is harmless (value-equal GenericVersion instances). For DataPool intern pools, it's merely suboptimal (less deduplication). Inherent to weak references — not a bug.
Finding 5 — NamedLockFactorySupport: Lock-free fast path is correct ✅
Traced all race scenarios between getLockAndRefTrack() (fast path) and closeLock():
| Scenario | Fast path | closeLock | Correct? |
|---|---|---|---|
tryIncRef(1→2) before decRef(2→1) |
Returns lock | Holder stays | ✅ |
decRef(1→0) before tryIncRef |
rejects 0 (≤0) | CAS(0→MIN_VALUE), destroys | ✅ |
tryIncRef races with CAS sentinel |
rejects MIN_VALUE (<0) | Lock destroyed | ✅ |
tryIncRef(1→2) then decRef(2→1) |
Lock acquired | refcount>0, holder kept | ✅ |
Key correctness properties:
tryIncRef()rejects both 0 and negative values (current <= 0)compute()serializes per-bucket — fast path and slow path can't create duplicate holders- CAS sentinel (0→MIN_VALUE) is defense-in-depth, making "closed" permanent
Finding 6 — PathConflictResolver: Parent-chain walk is semantically equivalent ✅
The old HashSet<String> copy per child and the new hasConflictIdOnPathToRoot() walk check the same predicate. Objects.equals() correctly handles null conflictIds. LinkedHashSet for partitions gives O(1) removal vs O(N) with ArrayList, with no semantic change (identity-based Path elements are always unique). Purely single-threaded — no concurrency concerns.
Finding 7 — InhibitingNameMapper: Safe optimization ✅
lockingInhibitors is set once in the constructor and never modified. The isEmpty() short-circuit and stream→loop conversion are semantically identical. No shared mutable state.
Overall Assessment
The changes are concurrency-safe for the single-build case (one JVM, multiple threads). For parallel builds (separate JVMs sharing the same local repo):
- The tracking file cache is per-session-instance → separate JVMs get independent caches, no interaction
- The
@Singletonlock factory is per-JVM → each JVM has its ownlocksmap, correct - Cross-process file locking with
DELETE_ON_CLOSEon Unix was already unreliable pre-PR → no regression
The only actionable item is Finding 1 (stale FileChannel recovery). The rest are either informational, documented, or inherent to the weak-reference design.
FileLockNamedLockFactory: add stale channel recovery — if a cached FileChannel becomes invalid (closed externally, I/O error, NFS hiccup), createLock() now detects the !isOpen() state and atomically evicts the stale entry before opening a fresh channel. The isOpen() check is a single volatile read on the fast path (zero overhead when healthy). Also extract channel-opening into openFileChannel() for clarity. EnhancedLocalRepositoryManager: make EMPTY_PROPERTIES truly read-only by overriding put/remove/clear to throw UnsupportedOperationException. The singleton is shared across threads via trackingFileCache; preventing accidental mutation guards against subtle corruption if future callers ever modify the returned Properties. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@cstamas Please assign appropriate label to PR according to the type of change. |
| * | ||
| * @param <K> the type of keys | ||
| * @param <V> the type of values | ||
| * @since 2.0.19 |
There was a problem hiding this comment.
nit: this seems to be not correct, shouldn't it be 2.0.20?
Problem
PR #1902 introduced
synchronized(versionCache)andsynchronized(this.map)blocks around compound cache operations to fix a thread-safety issue withcomputeIfAbsent()onsynchronizedMap(WeakHashMap). While correct, this serialized all threads on hot paths, causing a 2x build time regression for Quarkus (~6min → ~12min).Root cause analysis
Collections.synchronizedMap(new WeakHashMap<>())wraps each individual method (get,put) insynchronized(mutex), but does NOT make compound operations likecomputeIfAbsent()atomic. PR #1902 fixed this by adding outersynchronizedblocks — correct but serializing.The original pre-#1902 code used separate
get()+put()calls where each individually acquires/releases the lock for microseconds. The compound race is benign for a cache — at worst a duplicate value is created and one wins the put.Solution
1.
ConcurrentWeakCache— new lightweight concurrent cacheIntroduces
ConcurrentWeakCache<K,V>— a stripped-down version of maven-impl'sCache, optimized for hot-path performance:ConcurrentHashMap.get()is a volatile read, no lock acquisitionget()— uses aThreadLocalreusable lookup key instead of allocating a new wrapper per read (the main perf issue with the fullCacheclass)ReferenceQueueinstead ofentrySet().removeIf()full-map scanWeakHashMap)WeakReference2.
PathConflictResolvermemory explosion fixThe
PathConflictResolverhad an O(n²) memory issue from copyingHashSet<Object> conflictIdsSinceRootat every graph node. Replaced with:hasConflictIdOnPathToRoot()) — O(depth) per check, no copyingLinkedHashSetpartitions — O(1) removal instead of O(n)ArrayList.remove()3. Tracking file read cache
EnhancedLocalRepositoryManager.readRepos()re-reads_remote.repositoriestracking files from disk (with file locking) on every artifact resolution, even for artifacts in the same directory. In a primed build, this causes thousands of redundant file reads with synchronized blocks and FileLock acquisition.Added a
ConcurrentHashMap<Path, Properties>cache keyed by tracking file path. Multiple artifacts in the same directory (e.g. jar + pom) share the same tracking file, so the cache hit rate is high. The cache is invalidated (not updated) on writes viaaddRepo()to avoid a race where two concurrent writes could reorder their cache puts.4. Lock-free fast path for
NamedLockFactorySupportgetLockAndRefTrack()usedConcurrentHashMap.compute()on every call, which takes a per-bucket exclusive lock even when the holder already exists. In the common case (lock exists, just increment refcount), this serialized all threads hashing to the same bucket.Added a lock-free fast path:
ConcurrentHashMap.get()(volatile read) +tryIncRef()(CAS loop onAtomicInteger). Only falls back tocompute()when the holder is absent or being closed.The close/acquire race is handled by a CAS sentinel:
closeLock()marks the holder as closed (CAS refcount0 → MIN_VALUE) before destroying it.tryIncRef()rejectsrefcount ≤ 0, preventing revival of a destroyed lock.5.
InhibitingNameMapperstream→loop conversionReplaced
Stream.filter().collect()with imperative loops and added empty-list short-circuit to skip the filtering entirely when noLockingInhibitors are registered (the common case).6.
FileLockNamedLockFactoryFileChannel reusedestroyLock()was closing theFileChannelon every lock release, forcing a newopen()syscall on every lock acquisition. Since the lock factory is session-scoped, the channels can be kept open for reuse across lock acquire/release cycles. Channels are now closed only during factoryshutdown().Profiling results (async-profiler wall-clock, 5ms interval)
End-to-end build time (Quarkus primed build, 8-thread)
Method-level sample comparison (baseline → all patches)
EnhancedLocalRepositoryManager.find()FileLockNamedLockFactory.createLock()readRepos()LegacyTrackingFileManager.read()Retry.retry()InhibitingNameMapper.nameLocks()getLockAndRefTrack()destroyLock()mutex()(String.intern lock)ConcurrentWeakCache microbenchmark (4 threads, 8M ops/iteration)
10.8x faster than master for version parsing, 5.2x faster for artifact interning, while preserving weak reference semantics for GC-friendly memory behavior.
Files changed
ConcurrentWeakCache.java— new lightweight concurrent cache with weak keys/valuesGenericVersionScheme.java— useConcurrentWeakCachefor version parsing cacheDataPool.java— useConcurrentWeakCacheforWeakInternPoolPathConflictResolver.java— fix O(n²) memory from HashSet copyingEnhancedLocalRepositoryManager.java— cache tracking file reads to eliminate redundant disk I/ONamedLockFactorySupport.java— lock-free fast path for ref-counted lock acquisitionInhibitingNameMapper.java— stream→loop conversion with empty-list short-circuitFileLockNamedLockFactory.java— FileChannel reuse across lock lifecycle