Stretto

A high-performance, thread-safe, memory-bound Rust cache.

Stretto is a pure Rust implementation of https://github.com/dgraph-io/ristretto.

Features

  • Internal Mutability - You do not need to wrap the cache in Arc<RwLock<Cache<...>>> for concurrent code; a plain Cache<...> or AsyncCache<...> is enough.
  • Sync and Async - Stretto supports both sync and runtime-agnostic async.
    • In sync mode, Cache starts two extra OS-level threads: a policy thread and a writing thread.
    • In async mode, AsyncCache starts two extra green threads: a policy thread and a writing thread.
  • Store policy - Stretto only stores the value, which means the cache does not store the key itself.
  • High Hit Ratios - with the Dgraph developers' unique admission/eviction policy pairing, Ristretto's performance is best in class.
    • Eviction: SampledLFU - on par with exact LRU and better performance on Search and Database traces.
    • Admission: TinyLFU - extra performance with little memory overhead (12 bits per counter).
  • Designed for Database Workloads - on OLTP-style traces with small working sets and strong frequency skew, Stretto keeps the hot set far better than general-purpose caches (see Benchmarks).
  • Fast Throughput - Stretto uses a variety of techniques to manage contention, and the result is excellent throughput.
  • Cost-Based Eviction - any large new item deemed valuable can evict multiple smaller items (cost could be anything).
  • Fully Concurrent - you can use as many threads as you want with little throughput degradation.
  • Metrics - optional performance metrics for throughput, hit ratios, and other stats.
  • Simple API - just figure out your ideal CacheBuilder/AsyncCacheBuilder values and you're off and running.

Benchmarks

Hit ratios produced by cachebench against the ARC trace suite, 16 concurrent clients. Compared against QuickCache 0.6, TinyUFO 0.8, and Moka 0.12. Higher is better. "Stretto" and "Stretto Async" report the sync Cache and async AsyncCache respectively — both run the same TinyLFU policy and 64-stripe insert buffer (see Sync vs async for a side-by-side discussion).

Where Stretto fits. TinyLFU admission compares each candidate's recent access frequency against the current eviction victim's and only admits when the candidate is "hotter." Two conditions together make this win:

  1. Capacity is small relative to the working set (typically ≤ ~25%). The admission filter then has a real choice on every insert; rejecting a one-off scan beats evicting a hot tail entry.
  2. Access frequencies are skewed (Zipf-like). The frequency sketch needs signal to distinguish hot from cold; uniform or scan-heavy traffic gives it none.

When both hold, Stretto leads by 7–47 percentage points — OLTP at every capacity, the 20K rows of P1–P13, the 100K rows of S1–S3. When either fails (DS1 at 4M+, S1/S2 at 800K, P14, ConCat, MergeP, MergeS, the 160K rows of P4/P6/P7/P11/P12), admission rejects items LRU/W-TinyLFU would have kept and Stretto trails by up to 57 points. Decision rule: if your working set is several times your cache size and hot keys repeat, pick Stretto; if traffic is bursty scans or your cache is sized to hold most of the data, pick Moka or QuickCache.

Read-heavy ≠ Stretto-friendly. The fit axis is access pattern, not the read-to-write ratio. A read-heavy workload with a tight hot set and skewed frequencies (OLTP, P1–P13) is Stretto's sweet spot; a read-heavy workload with a wide, churning keyspace (DS1) is its worst case — admission rejects items that LRU/W-TinyLFU would have kept, regardless of how few writes there are. Sized your cache to hold most of the data? Use an LRU-style cache (Moka, QuickCache).

S3 trace (Search)

S3 traces an entire workload from small to large capacity in one section. At 100K (cold tail still being rejected) Stretto wins by ~8 points; at 400K (Stretto's "frontier") it edges past Moka Segmented by 1.6 points; at 800K (capacity ≈ working set) admission filtering hurts and Moka Segmented pulls ahead by 13.6 points. The 400K row is the cleanest single illustration of where Stretto's regime ends.

| Capacity | QuickCache | Stretto | Stretto Async | TinyUFO | Moka Sync | Moka Async | Moka Segmented(8) |
|---------:|-----------:|--------:|--------------:|--------:|----------:|-----------:|------------------:|
| 100,000 | 12.83% | 20.72% | 20.90% | 2.80% | 10.40% | 10.49% | 10.08% |
| 400,000 | 42.29% | 43.92% | 43.21% | 20.85% | 40.98% | 41.59% | 42.30% |
| 800,000 | 68.35% | 56.52% | 58.50% | 62.28% | 66.35% | 67.37% | 70.10% |

DS1 trace (Database server)

DS1 is a "database" trace in name only — its access pattern has weak frequency skew across a huge key space and rewards LRU-style retention. Even at 1M (≈5% of interesting working set) Stretto trails QuickCache by 2 points; the gap widens to 20 points at 4M and 45 points at 8M, as LRU policies keep recently-touched items that admission filtering rejects. DS1 is the canonical "do not pick Stretto" workload.

| Capacity | QuickCache | Stretto | Stretto Async | TinyUFO | Moka Sync | Moka Async | Moka Segmented(8) |
|---------:|-----------:|--------:|--------------:|--------:|----------:|-----------:|------------------:|
| 1,000,000 | 15.09% | 12.98% | 14.83% | 3.36% | 12.51% | 13.19% | 14.74% |
| 4,000,000 | 44.42% | 24.92% | 26.97% | 24.71% | 41.48% | 39.22% | 44.65% |
| 8,000,000 | 69.31% | 24.57% | 34.64% | 53.42% | 64.38% | 64.85% | 66.43% |

OLTP trace

OLTP is the workload Ristretto was designed for: a tight, repeatedly-accessed working set with strong frequency skew. Stretto's hit ratio is 33–47 percentage points above the next-best cache and stays nearly flat across capacity (75.7% at cap=256, 76.4% at cap=2000) — TinyLFU pins the hot tail regardless of slack. Other caches' ratios climb with capacity (QuickCache: 22% → 42%, Moka Sync: 26% → 43%) because they have to physically retain the working set first. v0.9.0's 64-stripe insert buffer keeps producer contention out of the picture under 16 concurrent clients, so the result reflects the policy itself, not buffer-overflow drops.

| Capacity | QuickCache | Stretto | Stretto Async | TinyUFO | Moka Sync | Moka Async | Moka Segmented(8) |
|---------:|-----------:|--------:|--------------:|--------:|----------:|-----------:|------------------:|
| 256 | 22.28% | 75.74% | 75.78% | 16.35% | 26.40% | 25.55% | 28.80% |
| 512 | 28.79% | 76.02% | 75.66% | 19.89% | 32.71% | 32.00% | 33.53% |
| 1,000 | 35.12% | 76.13% | 75.87% | 27.08% | 38.02% | 37.48% | 37.80% |
| 2,000 | 41.68% | 76.39% | 76.15% | 35.05% | 42.99% | 42.47% | 42.85% |

ARC P-series traces (Workstation)

The P-series captures workstation block-IO over a few hours per machine. Two distinct regimes:

  • At 20K capacity (~12.5% of working set) — Stretto wins all 13 of P1–P13, by 13–40 points. P14 (a trace roughly 5× larger, so even 80K capacity is a tighter capacity-to-working-set ratio) sits past Stretto's regime, and QuickCache leads.
  • At 160K capacity (~100% of working set) — the picture flattens. Stretto still wins clearly on P2 (+6), P3 (+5), P8 (+9), P10 (+7); ties P1, P5, P9, P13 within ~1 point; trails P4, P6, P7, P11, P12, P14 where W-TinyLFU's recency-and-frequency signal extracts more from a saturated cache than admission filtering can.

| Trace | Capacity | QuickCache | Stretto | Stretto Async | TinyUFO | Moka Sync | Moka Async | Moka Segmented(8) |
|-------|---------:|-----------:|--------:|--------------:|--------:|----------:|-----------:|------------------:|
| P1 | 20,000 | 22.70% | 54.97% | 54.61% | 17.39% | 21.37% | 21.25% | 21.86% |
| P1 | 160,000 | 68.61% | 68.94% | 69.41% | 64.30% | 63.98% | 64.84% | 66.96% |
| P2 | 20,000 | 21.02% | 60.94% | 60.75% | 16.20% | 19.39% | 19.44% | 20.16% |
| P2 | 160,000 | 67.15% | 73.48% | 73.55% | 63.91% | 63.38% | 63.85% | 66.83% |
| P3 | 20,000 | 8.92% | 44.17% | 44.82% | 3.82% | 11.22% | 11.25% | 11.02% |
| P3 | 160,000 | 54.16% | 59.44% | 63.39% | 50.02% | 48.23% | 48.89% | 50.91% |
| P4 | 20,000 | 4.79% | 18.61% | 18.51% | 4.52% | 4.79% | 4.74% | 4.66% |
| P4 | 160,000 | 30.62% | 22.46% | 22.60% | 26.98% | 28.81% | 29.04% | 30.71% |
| P5 | 20,000 | 8.98% | 29.91% | 30.27% | 6.26% | 8.55% | 8.53% | 8.38% |
| P5 | 160,000 | 46.58% | 47.39% | 47.08% | 38.57% | 43.14% | 43.28% | 45.16% |
| P6 | 20,000 | 17.22% | 37.05% | 37.48% | 8.89% | 16.29% | 15.89% | 14.89% |
| P6 | 160,000 | 84.41% | 64.87% | 44.88% | 58.88% | 77.27% | 78.01% | 81.28% |
| P7 | 20,000 | 10.46% | 43.19% | 41.85% | 5.12% | 8.22% | 8.10% | 7.46% |
| P7 | 160,000 | 57.35% | 53.52% | 54.12% | 44.31% | 53.20% | 54.30% | 56.63% |
| P8 | 20,000 | 22.64% | 62.46% | 61.90% | 18.43% | 21.38% | 21.63% | 21.73% |
| P8 | 160,000 | 76.54% | 85.26% | 83.24% | 73.72% | 72.72% | 73.19% | 76.03% |
| P9 | 20,000 | 12.49% | 40.13% | 39.63% | 9.88% | 13.63% | 13.65% | 13.87% |
| P9 | 160,000 | 55.31% | 56.59% | 56.10% | 51.33% | 50.28% | 51.42% | 53.15% |
| P10 | 20,000 | 6.56% | 19.93% | 19.77% | 3.36% | 4.58% | 4.61% | 4.53% |
| P10 | 160,000 | 28.72% | 37.08% | 35.26% | 21.48% | 30.38% | 29.57% | 29.99% |
| P11 | 20,000 | 19.53% | 42.40% | 44.59% | 17.57% | 18.81% | 18.69% | 18.91% |
| P11 | 160,000 | 68.28% | 56.77% | 57.73% | 66.50% | 65.71% | 65.91% | 68.26% |
| P12 | 20,000 | 10.09% | 32.71% | 32.38% | 8.96% | 9.23% | 9.26% | 9.11% |
| P12 | 160,000 | 45.06% | 41.79% | 41.53% | 40.61% | 41.94% | 42.58% | 44.66% |
| P13 | 20,000 | 12.60% | 43.26% | 42.98% | 7.32% | 11.36% | 11.16% | 10.99% |
| P13 | 160,000 | 55.44% | 56.74% | 57.17% | 47.44% | 50.05% | 50.76% | 53.95% |
| P14 | 80,000 | 29.80% | 24.99% | 24.76% | 19.53% | 28.13% | 28.22% | 29.11% |
| P14 | 640,000 | 57.93% | 33.05% | 33.99% | 53.13% | 51.93% | 52.15% | 54.30% |

ARC S-series traces (Search)

S1 and S2 are search-engine traces with weak frequency skew. The crossover between Stretto's regime and LRU-style policies' regime sits between 100K and 800K capacity:

  • At 100K (working set still doesn't fit), Stretto's filter rejects the long cold tail and wins by 8–9 points.
  • At 800K (capacity nears the working set), LRU/W-TinyLFU keep recently-touched items that admission rejects, and Stretto trails by 14–27 points.

| Trace | Capacity | QuickCache | Stretto | Stretto Async | TinyUFO | Moka Sync | Moka Async | Moka Segmented(8) |
|-------|---------:|-----------:|--------:|--------------:|--------:|----------:|-----------:|------------------:|
| S1 | 100,000 | 9.59% | 18.78% | 18.96% | 2.64% | 8.88% | 8.89% | 8.69% |
| S1 | 800,000 | 56.86% | 29.81% | 30.62% | 49.44% | 55.32% | 55.38% | 56.08% |
| S2 | 100,000 | 13.11% | 21.24% | 21.27% | 2.78% | 10.53% | 10.61% | 10.24% |
| S2 | 800,000 | 68.36% | 56.55% | 62.35% | 62.57% | 66.78% | 67.67% | 70.33% |

Merged ARC traces (scan-heavy)

ConCat (DS1 ∥ S3), MergeP (P1–P14 interleaved), and MergeS (S1–S3 interleaved) splice ARC traces into long sequences with weak frequency skew over very large key spaces. TinyLFU's frequency sketch is built for a single stable workload, not concatenated phases, so admission rejects items LRU-style policies would retain — Stretto trails by 30–60 points across every measured capacity. These traces document the regime where Stretto is the wrong tool; do not use them to size a Stretto cache.

ConCat (DS1 ∥ S3):

| Capacity | QuickCache | Stretto | Stretto Async | TinyUFO | Moka Sync | Moka Async | Moka Segmented(8) |
|---------:|-----------:|--------:|--------------:|--------:|----------:|-----------:|------------------:|
| 200,000 | 60.43% | 14.31% | 14.55% | 40.81% | 54.80% | 55.38% | 57.86% |
| 400,000 | 72.44% | 15.31% | 16.61% | 57.37% | 63.41% | 64.51% | 67.96% |
| 3,200,000 | 87.74% | 43.26% | 42.40% | 86.87% | 73.09% | 76.99% | 82.22% |

MergeP (P1–P14 interleaved):

| Capacity | QuickCache | Stretto | Stretto Async | TinyUFO | Moka Sync | Moka Async | Moka Segmented(8) |
|---------:|-----------:|--------:|--------------:|--------:|----------:|-----------:|------------------:|
| 400,000 | 44.31% | 10.71% | 10.54% | 38.40% | 38.09% | 38.48% | 40.31% |
| 1,000,000 | 57.98% | 22.83% | 20.23% | 53.41% | 50.66% | 51.46% | 53.77% |
| 3,200,000 | 75.23% | 39.62% | 39.40% | 71.23% | 65.02% | 66.43% | 70.42% |

MergeS (S1–S3 interleaved):

| Capacity | QuickCache | Stretto | Stretto Async | TinyUFO | Moka Sync | Moka Async | Moka Segmented(8) |
|---------:|-----------:|--------:|--------------:|--------:|----------:|-----------:|------------------:|
| 400,000 | 19.96% | 14.67% | 14.96% | 7.29% | 20.62% | 20.68% | 21.14% |
| 1,000,000 | 44.81% | 28.25% | 29.26% | 30.15% | 41.67% | 42.01% | 44.19% |
| 3,200,000 | 83.02% | 45.29% | 51.21% | 82.03% | 77.54% | 78.38% | 80.38% |

Sync vs async cache

AsyncCache since v0.9.0 uses the same TinyLFU policy and the same 64-stripe insert buffer as sync Cache. The full breakdown is in the per-trace tables above; the summary:

  • On workloads matching Stretto's regime (OLTP, P1–P5, P8–P10, P12, P13, S1/S2 at 100K, S3 ≤ 400K), sync and async hit ratios track within ~1 percentage point — the underlying admission decisions are the same.
  • On scan-heavy or burst-heavy traces where the producer-side insert channel saturates, drop-on-overflow timing differs between sync (crossbeam_channel::bounded) and async (async_channel over std::thread) and the two diverge by 5–20 points. The sign isn't fixed: async wins some, sync wins others.

| Trace, Capacity | Sync | Async | Δ (points) |
|-----------------|-----:|------:|-----------:|
| P6, 160,000 | 64.87% | 44.88% | −19.99 |
| DS1, 8,000,000 | 24.57% | 34.64% | +10.06 |
| S2, 800,000 | 56.55% | 62.35% | +5.80 |
| MergeS, 3,200,000 | 45.29% | 51.21% | +5.92 |
| P3, 160,000 | 59.44% | 63.39% | +3.95 |

These swings are timing-dependent, not policy divergence: the affected traces sit outside Stretto's sweet spot (capacity ≈ working set, or scan-heavy phases) where hit ratio isn't the deciding metric anyway. Throughput is comparable: at S3 cap=400K sync finishes in ~2.9s vs async ~4.2s; at OLTP cap=256 both complete in under 0.07s. Pick AsyncCache for runtime ergonomics — hit ratio is not the tradeoff for workloads matching Stretto's regime.

Reproducing

git clone --recursive https://github.com/al8n/cachebench.git
cd cachebench/cache-trace/arc && for f in *.lis.zst; do zstd -d "$f"; done && cd ../..
cargo run --release --features "stretto,quick_cache,tiny-ufo,moka-v012" -- \
    -f oltp,p1,p2,p3,p4,p5,p6,p7,p8,p9,p10,p11,p12,p13,p14,s1,s2,s3,ds1,concat,merge-p,merge-s -n 16

spc1likeread is omitted from the tables above; with a 30,000,000-entry default capacity it takes ≈30 minutes per cache implementation and adds little signal beyond what DS1 already shows.

Installation

  • Use Cache.

    [dependencies]
    stretto = "0.9"
  • Use AsyncCache with tokio

    [dependencies]
    stretto = { version = "0.9", features = ["tokio"] }
  • Use AsyncCache with smol

    [dependencies]
    stretto = { version = "0.9", features = ["smol"] }

Usage

Example

See the examples folder for more details; a minimal sync sketch also follows this list:

  • sync: use stretto's cache in a sync environment
  • tokio: use stretto's cache with the tokio async runtime
  • smol: use stretto's cache with the smol async runtime
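
A minimal sync sketch, mirroring the sync example above; the exact method set (new, insert, wait, get, release, clear) follows that example and may differ slightly between versions:

use stretto::Cache;

fn main() {
    // num_counters = 12,960 (~10x the expected item count), max_cost = 1,000,000
    let c = Cache::new(12960, 1_000_000).unwrap();

    // insert a value with a cost of 1
    c.insert("a", "a", 1);

    // wait until the value has passed through the insert buffer and the policy
    c.wait().unwrap();

    // get returns a value reference; release it when done
    let v = c.get(&"a").unwrap();
    assert_eq!(v.value(), &"a");
    v.release();

    // drop everything and wait for pending operations to finish
    c.clear().unwrap();
    c.wait().unwrap();
    assert!(c.get(&"a").is_none());
}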

Config

The CacheBuilder struct is used when creating Cache instances if you want to customize the Cache settings.

num_counters

num_counters is the number of 4-bit access counters to keep for admission and eviction. Dgraph's developers have seen good performance in setting this to 10x the number of items you expect to keep in the cache when full.

For example, if you expect each item to have a cost of 1 and max_cost is 100, set num_counters to 1,000. Or, if you use variable cost values but expect the cache to hold around 10,000 items when full, set num_counters to 100,000. The important thing is the number of unique items in the full cache, not necessarily the max_cost value.
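
As a sketch, sizing a cache that is expected to hold about 10,000 items when full (the two-argument Cache::new(num_counters, max_cost) constructor is assumed from the usage example above):

use stretto::Cache;

fn main() {
    // Expect ~10,000 items resident when full, each inserted with a cost of 1,
    // so num_counters = 10 * 10_000 and max_cost = 10_000.
    let cache = Cache::new(100_000, 10_000).unwrap();

    cache.insert("user:42".to_string(), "profile bytes".to_string(), 1);
    cache.wait().unwrap();
}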

max_cost

max_cost is how eviction decisions are made. For example, if max_cost is 100 and a new item with a cost of 1 increases total cache cost to 101, 1 item will be evicted.

max_cost can also be used to denote the max size in bytes. For example, if max_cost is 1,000,000 (1MB) and the cache is full with 1,000 1KB items, a new 5KB item (that's accepted) would cause 5 1KB items to be evicted.

max_cost could be anything as long as it matches how you're using the cost values when calling insert.
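
A sketch of the byte-budget style, charging each insert its payload size (again assuming the Cache::new constructor from the usage example):

use stretto::Cache;

fn main() {
    // max_cost as a 1 MB byte budget; each insert is charged its payload size.
    let cache = Cache::new(10_000, 1_000_000).unwrap();

    let value = vec![0u8; 1024]; // a 1 KB payload
    let cost = value.len() as i64; // charge exactly what is stored
    cache.insert("blob:1".to_string(), value, cost);
    cache.wait().unwrap();
}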

key_builder

pub trait KeyBuilder {
    type Key: Hash + Eq + ?Sized;

    /// hash_index is used to hash the key to a u64.
    fn hash_index<Q>(&self, key: &Q) -> u64
    where
        Self::Key: core::borrow::Borrow<Q>,
        Q: Hash + Eq + ?Sized;

    /// If you want 128-bit hashes, implement this method;
    /// otherwise leave it returning 0.
    fn hash_conflict<Q>(&self, _key: &Q) -> u64
    where
        Self::Key: core::borrow::Borrow<Q>,
        Q: Hash + Eq + ?Sized,
    {
        0
    }

    /// Build the key into a pair of 64-bit hashes.
    fn build_key<Q>(&self, k: &Q) -> (u64, u64)
    where
        Self::Key: core::borrow::Borrow<Q>,
        Q: Hash + Eq + ?Sized,
    {
        (self.hash_index(k), self.hash_conflict(k))
    }
}

KeyBuilder is the hashing algorithm used for every key. In Stretto, the Cache will never store the real key; the key is processed by the KeyBuilder. Stretto ships two built-in key builders: TransparentKeyBuilder and DefaultKeyBuilder. If your key implements the TransparentKey trait, use TransparentKeyBuilder, which is faster than DefaultKeyBuilder; otherwise, use DefaultKeyBuilder. You can also write your own key builder for the Cache by implementing the KeyBuilder trait.

Note that if you want 128-bit hashes you should use the full (u64, u64); otherwise just fill the u64 at position 0, and it will behave like any 64-bit hash.
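
A minimal sketch of a custom key builder against the trait above; StdKeyBuilder is a hypothetical name, only hash_index is implemented, and hash_conflict keeps its default of 0 (import path assumed):

use core::borrow::Borrow;
use core::hash::{Hash, Hasher};
use std::collections::hash_map::DefaultHasher;
use stretto::KeyBuilder;

/// Hypothetical key builder for String keys that only fills the 64-bit index
/// hash, so the cache behaves as if using a 64-bit hash (see the note above).
#[derive(Default)]
struct StdKeyBuilder;

impl KeyBuilder for StdKeyBuilder {
    type Key = String;

    fn hash_index<Q>(&self, key: &Q) -> u64
    where
        Self::Key: Borrow<Q>,
        Q: Hash + Eq + ?Sized,
    {
        let mut hasher = DefaultHasher::new();
        key.hash(&mut hasher);
        hasher.finish()
    }
}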

buffer_size

Both Cache and AsyncCache buffer admission requests in a 64-stripe ring that hashes each thread (or task) to its own Mutex<Vec<_>> so producers rarely contend. When a stripe accumulates insert_stripe_high_water items (default 64) the full batch is sent to the policy processor; the processor also drains every stripe inline every drain_interval (default 500ms) as a safety net.

Use set_insert_stripe_high_water(items) to tune the per-stripe batch size. Larger values amortize channel sends but delay admission decisions; values around 64–256 work well for caches under ~10K capacity. The stripe count and per-batch channel capacity are not user-tunable.
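
A sketch of tuning the batch size; Cache::builder(...).finalize() is assumed from the Config section, and set_insert_stripe_high_water is assumed to be exposed on the builder:

use stretto::Cache;

fn main() {
    let cache = Cache::builder(100_000, 10_000)
        // hand a stripe's batch to the policy processor after 128 buffered inserts
        .set_insert_stripe_high_water(128)
        .finalize()
        .unwrap();

    cache.insert("k".to_string(), "v".to_string(), 1);
    cache.wait().unwrap();
}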

metrics

Set metrics to true when you want real-time logging of a variety of stats. The reason this is a CacheBuilder flag is that it adds roughly 10% throughput overhead.

ignore_internal_cost

Defaults to true: each insert is charged only the caller-supplied cost, so max_cost behaves as an entry budget when you pass 1 per insert.

Set to false when max_cost represents a byte budget and you need each stored item to also account for ~56 bytes of per-entry bookkeeping (key, conflict, version, value wrapper, time).
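
A sketch of the byte-budget configuration; set_ignore_internal_cost is an assumed name for the builder setter of this flag:

use stretto::Cache;

fn main() {
    // A 1 MB byte budget: disable the default so each entry is also charged
    // the ~56 bytes of per-entry bookkeeping described above.
    let cache = Cache::builder(10_000, 1_000_000)
        .set_ignore_internal_cost(false) // assumed setter name for this flag
        .finalize()
        .unwrap();

    let value = vec![0u8; 4096];
    let cost = value.len() as i64;
    cache.insert("page:7".to_string(), value, cost);
    cache.wait().unwrap();
}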

cleanup_duration

The Cache cleans up expired values every 500ms by default.

update_validator

pub trait UpdateValidator: Send + Sync + 'static {
    type Value: Send + Sync + 'static;

    /// should_update is called when a value already exists in cache and is being updated.
    fn should_update(&self, prev: &Self::Value, curr: &Self::Value) -> bool;
}

By default, the Cache will always update the value if it already exists in the cache; this trait lets you decide whether the value should be updated.
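
A minimal sketch against the trait above; VersionGate and Versioned are hypothetical types that only let an update through when it carries a newer version:

use stretto::UpdateValidator;

#[derive(Clone, Debug)]
struct Versioned {
    version: u64,
    payload: String,
}

/// Hypothetical validator: only accept an update whose version is newer
/// than the value currently held by the cache.
#[derive(Default)]
struct VersionGate;

impl UpdateValidator for VersionGate {
    type Value = Versioned;

    fn should_update(&self, prev: &Self::Value, curr: &Self::Value) -> bool {
        curr.version > prev.version
    }
}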

callback

pub trait CacheCallback: Send + Sync + 'static {
    type Value: Send + Sync + 'static;

    /// on_exit is called whenever a value is removed from cache. This can be
    /// used to do manual memory deallocation. Would also be called on eviction
    /// and rejection of the value.
    fn on_exit(&self, val: Option<Self::Value>);

    /// on_evict is called for every eviction and passes the hashed key, value,
    /// and cost to the function.
    fn on_evict(&self, item: Item<Self::Value>) {
        self.on_exit(item.val)
    }

    /// on_reject is called for every rejection done via the policy.
    fn on_reject(&self, item: Item<Self::Value>) {
        self.on_exit(item.val)
    }
}

CacheCallback lets you customize extra operations on values when the related events happen.
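
A sketch against the trait above; DropCounter is a hypothetical callback that counts every value leaving the cache and relies on the default on_evict/on_reject forwarding to on_exit:

use std::sync::atomic::{AtomicU64, Ordering};
use stretto::CacheCallback;

/// Hypothetical callback that counts every value leaving the cache,
/// whether by explicit removal, eviction, or rejection by the policy.
#[derive(Default)]
struct DropCounter {
    dropped: AtomicU64,
}

impl CacheCallback for DropCounter {
    type Value = String;

    fn on_exit(&self, val: Option<Self::Value>) {
        if val.is_some() {
            self.dropped.fetch_add(1, Ordering::Relaxed);
            // a real callback could release external resources tied to the value here
        }
    }
}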

coster

pub trait Coster: Send + Sync + 'static {
    type Value: Send + Sync + 'static;

    /// cost evaluates a value and outputs a corresponding cost. This function
    /// is run after insert is called for a new item or an item update with a cost
    /// param of 0.
    fn cost(&self, val: &Self::Value) -> i64;
}

Coster is a trait you can pass to the CacheBuilder in order to evaluate item cost at runtime, and only for the insert calls that aren't dropped (this is useful if calculating item cost is particularly expensive, and you don't want to waste time on items that will be dropped anyway).

To signal to Stretto that you'd like to use this Coster trait:

  1. Set the Coster field to your own Coster implementation.
  2. When calling insert for new items or item updates, use a cost of 0 (see the sketch below).
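
Putting the two steps together; ByteCoster is a hypothetical coster that charges each value its byte length, and set_coster is the assumed builder hook for step 1:

use stretto::{Cache, Coster};

/// Hypothetical coster: charge each value its byte length at admission time.
#[derive(Default)]
struct ByteCoster;

impl Coster for ByteCoster {
    type Value = Vec<u8>;

    fn cost(&self, val: &Self::Value) -> i64 {
        val.len() as i64
    }
}

fn main() {
    // step 1: hand the coster to the builder (set_coster is the assumed hook)
    let cache = Cache::builder(10_000, 1_000_000)
        .set_coster(ByteCoster)
        .finalize()
        .unwrap();

    // step 2: insert with a cost of 0 so ByteCoster::cost runs only for admitted items
    cache.insert("blob:1".to_string(), vec![0u8; 2048], 0);
    cache.wait().unwrap();
}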

hasher

The hasher for the Cache; the default is SeaHasher.

Acknowledgements

  • Thanks to the Dgraph developers for providing the amazing Go implementation of Ristretto.

License

Licensed under either of Apache License, Version 2.0 or MIT license at your option.
Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in this project by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
