-
Notifications
You must be signed in to change notification settings - Fork 3.8k
CASSANDRA-21154 remove traces of sampling and auto training #4599
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: trunk
Are you sure you want to change the base?
Conversation
|
|
||
| Set<SSTableReader> sstables = new HashSet<>(); | ||
| CompressionDictionaryTrainingConfig config = createSampleAllTrainingConfig(cfs); | ||
| try (CompressionDictionaryManager manager = cfs.compressionDictionaryManager()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to shut up idea code style
| @@ -159,6 +144,6 @@ enum TrainingStatus | |||
| SAMPLING, | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not completely sure about this staying here. We are technically not sampling anymore. SAMPLING is the state after we do "start" and we depend on this state when we go to train. It would be probably better to replace it with "started".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be also maybe better if we called "start" method of ICompressionDictionaryTrainer like "init" instead of "start". We have not started anything and we are not sampling either. The javadoc of start method says
Starts the trainer for collecting samples.
We are not doing that anymore.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the "SAMPLING" status still make sense. The current and only approach to collect training data is to perform random sampling on the the on-disk sstables.
| if (closed) | ||
| return false; | ||
|
|
||
| try |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also
logger.info("Started dictionary training for {}.{}", keyspaceName, tableName);
below is suspicious. I would not log anything, actually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed on the log message removal. Given that, the training is manual, the log message to indicate the start of the process is less useful.
yifan-c
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall. Thank you for tidying up the currently unused code in the dictionary compression domain!
Thanks for sending a pull request! Here are some tips if you're new here:
Commit messages should follow the following format:
The Cassandra Jira