Skip to content

Conversation

@paulorsousa
Copy link
Collaborator

@paulorsousa paulorsousa commented Dec 5, 2025

This PR addresses two issues in --random-data generation:

  1. value buffer mutation wrap-around caused repeated patterns.
    The mutation index is designed to avoid generating a completely new random value on every request by mutating an existing buffer.
    However, once the index wrapped, the mutation pattern repeated, significantly reducing randomness.

  2. Initial value buffer was not seeded per client, even when --distinct-client-seed was enabled.
    As a result, multiple clients generated identical initial buffers, again reducing randomness across workloads.

These issues reduced entropy and could distort benchmark results or the realism of workload simulations.

What this PR changes

Random value buffers are now:

  1. Fully regenerated when the mutation index wraps, eliminating cyclic patterns.
  2. Initialised only after per-client seed setup, ensuring each client receives its own independent initial buffer.

Impact

Test: 100% SETs, 36 bytes random values, 50 clients (each with a different seed), 10 seconds test time.

Scenario Values Repetitions per Value
Before 1.2M ~150x each
After (this PR) 1.2M 1x each

How to reproduce and measure

  1. Start redis:
    redis-server --save "" --io-threads 4
  2. Monitor commands:
    redis-cli monitor > executed-cmds.log
  3. Start memtier:
    ./memtier_benchmark --ratio=1:0 --data-size=36 --random-data --clients=50 --distinct-client-seed --threads=1 --test-time 10 --hide-histogram
  4. Stop redis-cli monitor with ctrl + c (or equivalent)
  5. Count repetitions:
    sed -n 's/.* "\(.*\)"$/\1/p' executed-cmds.log | sort | uniq -c | sort -nr | less

Impact on performance

No significant performance degradation was observed.

The previous implementation only incremented a single byte position
in the value buffer, causing values to repeat after cycling through
all bytes in all buffer positions.
This change regenerates completely new random data when the mutation position wraps around, giving more
guarantees of random values throughout the benchmark run while
preventing hurting performance too much.
…ation

- Extract buffer filling logic into new fill_value_buffer() method
- Remove automatic buffer filling from alloc_value_buffer()
- Call fill_value_buffer() explicitly after set_random_seed() in client setup
- Simplify random data generation to use gaussian_noise::get_random()
- Remove /dev/urandom file descriptor (m_random_fd) and XOR logic
- Remove alloc_value_buffer(const char* copy_from) overload

This ensures random data is generated with the correct per-client seed
rather than using the default seed during initial buffer allocation (leading
to repeated values).
@jit-ci
Copy link

jit-ci bot commented Dec 5, 2025

Hi, I’m Jit, a friendly security platform designed to help developers build secure applications from day zero with an MVS (Minimal viable security) mindset.

In case there are security findings, they will be communicated to you as a comment inside the PR.

Hope you’ll enjoy using Jit.

Questions? Comments? Want to learn more? Get in touch with us.

Copy link
Collaborator

@fcostaoliveira fcostaoliveira left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you @paulorsousa !

@fcostaoliveira fcostaoliveira merged commit 8985eb5 into master Dec 5, 2025
39 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants