CASSANDRA-21154 remove traces of sampling and auto training #4599

smiklosovic · 2026-02-03T01:01:05Z


smiklosovic · 2026-02-03T01:03:24Z

test/unit/org/apache/cassandra/db/compression/CompressionDictionarySchedulerTest.java

-
-        Set<SSTableReader> sstables = new HashSet<>();
-        CompressionDictionaryTrainingConfig config = createSampleAllTrainingConfig(cfs);
+        try (CompressionDictionaryManager manager = cfs.compressionDictionaryManager())


This change is only here to silence an IntelliJ IDEA code style warning.
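For readers unfamiliar with the pattern in the added line, here is a minimal sketch of try-with-resources. It assumes the IDE complaint concerns an AutoCloseable that is used without being closed, which the comment above does not actually state. `Manager` is a hypothetical stand-in, not the real CompressionDictionaryManager.

```java
public class TryWithResourcesSketch
{
    // Hypothetical stand-in for a closeable manager; not the Cassandra class.
    static class Manager implements AutoCloseable
    {
        boolean closed = false;

        @Override
        public void close()
        {
            closed = true;
        }
    }

    // Uses the manager inside try-with-resources and returns it afterwards
    // so the caller can observe that close() ran when the block exited.
    static Manager runWithManager()
    {
        Manager observed;
        try (Manager manager = new Manager())
        {
            observed = manager;
            // test body would use the manager here
        }
        return observed;
    }

    public static void main(String[] args)
    {
        System.out.println("closed after block: " + runWithManager().closed);
    }
}
```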

smiklosovic · 2026-02-03T07:10:37Z

src/java/org/apache/cassandra/db/compression/ICompressionDictionaryTrainer.java

@@ -159,6 +144,6 @@ enum TrainingStatus
        SAMPLING,


I am not completely sure about this staying here. We are technically not sampling anymore. SAMPLING is the state we enter after "start", and we depend on this state when we go to train. It would probably be better to replace it with "started".

It might also be better to rename the "start" method of ICompressionDictionaryTrainer to "init". We have not started anything, and we are not sampling either. The javadoc of the start method says:

Starts the trainer for collecting samples.

We are not doing that anymore.

I think the "SAMPLING" status still makes sense. The current and only approach to collecting training data is to perform random sampling on the on-disk sstables.
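To make the dependency under discussion concrete, here is a rough sketch of a trainer whose train step is only allowed from the state entered by "start". Only the SAMPLING constant comes from the diff above. The other constants, the class name, and the method bodies are assumptions for illustration and are not the actual ICompressionDictionaryTrainer code.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch; not the real trainer implementation.
public class TrainerStatusSketch
{
    // Only SAMPLING is confirmed by the diff; the rest are assumed names.
    enum TrainingStatus { NOT_STARTED, SAMPLING, TRAINING, COMPLETED }

    private final AtomicReference<TrainingStatus> status =
        new AtomicReference<>(TrainingStatus.NOT_STARTED);

    // "start" moves the trainer into SAMPLING; returns false if it was already started.
    public boolean start()
    {
        return status.compareAndSet(TrainingStatus.NOT_STARTED, TrainingStatus.SAMPLING);
    }

    // Training is only permitted from SAMPLING, which is why the state
    // (whatever it ends up being called) cannot simply be deleted.
    public boolean train()
    {
        if (!status.compareAndSet(TrainingStatus.SAMPLING, TrainingStatus.TRAINING))
            return false;
        // dictionary training on the sampled data would happen here
        status.set(TrainingStatus.COMPLETED);
        return true;
    }

    public TrainingStatus status()
    {
        return status.get();
    }

    public static void main(String[] args)
    {
        TrainerStatusSketch trainer = new TrainerStatusSketch();
        System.out.println("train before start: " + trainer.train());
        System.out.println("start: " + trainer.start());
        System.out.println("train after start: " + trainer.train());
    }
}
```

Renaming SAMPLING to "started" or "start" to "init" would change only the names in this sketch; the guard in train would stay the same.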

smiklosovic · 2026-02-03T07:19:34Z

src/java/org/apache/cassandra/db/compression/ZstdDictionaryTrainer.java

+        if (closed)
            return false;

        try


Also, this call further down is suspicious:

logger.info("Started dictionary training for {}.{}", keyspaceName, tableName);

I would not log anything here, actually.

Agreed on the log message removal. Given that the training is manual, a log message indicating the start of the process is less useful.
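A minimal sketch of the shape being agreed on: start returns false once the trainer is closed and otherwise succeeds silently, with no "Started dictionary training" log line. The class and field names other than `closed` are hypothetical and do not reflect the real ZstdDictionaryTrainer.

```java
// Hypothetical sketch; not the real ZstdDictionaryTrainer.
public class ClosedGuardSketch implements AutoCloseable
{
    private volatile boolean closed = false;
    private volatile boolean started = false;

    // Refuses to start once closed; deliberately logs nothing on success,
    // since a manually triggered training run needs no start announcement.
    public boolean start()
    {
        if (closed)
            return false;

        started = true;
        return true;
    }

    public boolean isStarted()
    {
        return started;
    }

    @Override
    public void close()
    {
        closed = true;
    }

    public static void main(String[] args)
    {
        ClosedGuardSketch trainer = new ClosedGuardSketch();
        System.out.println("start while open: " + trainer.start());
        trainer.close();
        System.out.println("start after close: " + trainer.start());
    }
}
```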

yifan-c

Looks good overall. Thank you for tidying up the currently unused code in the dictionary compression domain!

remove traces of sampling and auto training

5330cf4

smiklosovic requested a review from yifan-c February 3, 2026 01:07

yifan-c reviewed Feb 4, 2026

2 participants
