@smiklosovic
Contributor

Thanks for sending a pull request! Here are some tips if you're new here:

  • Ensure you have added or run the appropriate tests for your PR.
  • Be sure to keep the PR description updated to reflect all changes.
  • Write your PR title to summarize what this PR proposes.
  • If possible, provide a concise example to reproduce the issue for a faster review.
  • Read our contributor guidelines
  • If you're making a documentation change, see our guide to documentation contribution

Commit messages should follow the following format:

<One sentence description, usually Jira title or CHANGES.txt summary>

<Optional lengthier description (context on patch)>

patch by <Authors>; reviewed by <Reviewers> for CASSANDRA-#####

Co-authored-by: Name1 <email1>
Co-authored-by: Name2 <email2>

The Cassandra Jira


Set<SSTableReader> sstables = new HashSet<>();
CompressionDictionaryTrainingConfig config = createSampleAllTrainingConfig(cfs);
try (CompressionDictionaryManager manager = cfs.compressionDictionaryManager())
Contributor Author

To silence an IntelliJ IDEA code style warning.

@smiklosovic smiklosovic requested a review from yifan-c February 3, 2026 01:07
@@ -159,6 +144,6 @@ enum TrainingStatus
SAMPLING,
Contributor Author

I am not completely sure about this staying here. We are technically not sampling anymore. SAMPLING is the state we enter after "start", and we depend on this state when we go to train. It would probably be better to replace it with "started".

Contributor Author

@smiklosovic smiklosovic Feb 3, 2026

It might also be better to rename the "start" method of ICompressionDictionaryTrainer to "init". We have not started anything, and we are not sampling either. The javadoc of the start method says:

Starts the trainer for collecting samples.

We are not doing that anymore.
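
To make the concern concrete, here is a minimal sketch of the state dependency, assuming a simplified trainer. It is not the actual Cassandra code: the class TrainerSketch, the method names init, train and status, and every status value are hypothetical and only illustrate the proposal (STARTED in place of SAMPLING, init in place of start).

```java
// Hypothetical sketch: all names here are illustrative assumptions,
// not the real ICompressionDictionaryTrainer API.
final class TrainerSketch
{
    // STARTED stands in for the proposed replacement of SAMPLING; the other
    // values are assumed only to make the example complete.
    enum Status { NOT_STARTED, STARTED, TRAINING, COMPLETED }

    private Status status = Status.NOT_STARTED;

    // Proposed name for what is currently "start": it only prepares the trainer,
    // it does not collect samples.
    boolean init()
    {
        if (status != Status.NOT_STARTED)
            return false;
        status = Status.STARTED;
        return true;
    }

    // Training depends on the state set by init(), the same way the current
    // code depends on SAMPLING.
    boolean train()
    {
        if (status != Status.STARTED)
            return false;
        status = Status.TRAINING;
        // ... dictionary training would happen here ...
        status = Status.COMPLETED;
        return true;
    }

    Status status()
    {
        return status;
    }
}
```

In this sketch, calling train() before init() returns false, which is the dependency on the post-"start" state described above. Only the names change, not the behavior.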

Contributor

I think the "SAMPLING" status still makes sense. The current and only approach to collecting training data is to perform random sampling on the on-disk sstables.

if (closed)
    return false;

try
Contributor Author

Also, the following log statement below is suspicious:

logger.info("Started dictionary training for {}.{}", keyspaceName, tableName);

I would not log anything, actually.

Contributor

@yifan-c yifan-c Feb 4, 2026

Agreed on the log message removal. Given that the training is manual, a log message indicating the start of the process is not very useful.

Contributor

@yifan-c yifan-c left a comment

Looks good overall. Thank you for tidying up the currently unused code in the dictionary compression domain!
