feat: support mandatory compaction trigger for cascading reindexing supervisors who have unapplied deletion rules #19633

Draft
capistrant wants to merge 1 commit into apache:master from capistrant:deletion-rule-special-case-compaction-eligibility

@capistrant

Contributor

Description

Cascading Reindexing introduced deletion rules, which delete rows that match a filter after a configured period of time. This has many applications, but one in particular is automatic row-granular deletion for data compliance needs. For example, if your Druid deployment contains customer data with a customer_id column identifying which customer each row belongs to, different customers may carry different compliance requirements: keep customer foo's data for 30 days and customer bar's data for 365 days. With cascading reindexing deletion rules, you define one rule per requirement, and together the rules lifecycle your data and keep you in compliance.
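
To make the retention example concrete, here is a minimal Java sketch of per-customer deletion rules. It is an illustration only: RetentionRuleSketch, DeletionRule, Row, and shouldDelete are hypothetical names invented for this example and are not the Druid classes or the rule format used by Cascading Reindexing.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

// Hypothetical illustration; none of these types exist in Druid.
final class RetentionRuleSketch {

    /** A rule: delete rows for this customer once they are older than the retention period. */
    record DeletionRule(String customerId, Duration retention) {}

    /** A simplified row carrying only the fields the rules look at. */
    record Row(String customerId, Instant timestamp) {}

    /** Returns true if any rule matches the row's customer and the row has outlived its retention. */
    static boolean shouldDelete(Row row, List<DeletionRule> rules, Instant now) {
        for (DeletionRule rule : rules) {
            boolean sameCustomer = rule.customerId().equals(row.customerId());
            boolean expired = row.timestamp().isBefore(now.minus(rule.retention()));
            if (sameCustomer && expired) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<DeletionRule> rules = List.of(
            new DeletionRule("foo", Duration.ofDays(30)),
            new DeletionRule("bar", Duration.ofDays(365)));
        Instant now = Instant.parse("2026-06-26T00:00:00Z");
        Instant fortyDaysAgo = now.minus(Duration.ofDays(40));

        // A 40-day-old row is past foo's 30-day retention but within bar's 365 days.
        System.out.println(shouldDelete(new Row("foo", fortyDaysAgo), rules, now)); // prints true
        System.out.println(shouldDelete(new Row("bar", fortyDaysAgo), rules, now)); // prints false
    }
}
```

The same 40-day-old data is deletable for one customer and retained for the other, which is why the deletion has to be row-granular rather than interval-granular.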

Currently, compaction candidacy is a pain point. An operator can set minimum uncompacted thresholds on intervals to avoid over-compacting when the gain from such compactions is negligible. From a pure cost perspective, this is great: don't spend compute dollars for negligible space savings or performance gains. However, it breaks down from a compliance perspective, as in the customer data retention example above. Today, the only option for an operator who must meet compliance is to drop those thresholds to essentially 0 and run compaction overly aggressively, even when a given compaction does not apply any missing deletion rules. They end up eating cost across the board to cover a limited set of cases.

This PR introduces a new compaction policy config that an operator can opt into to force compaction to run on an interval if any of its candidate segments are out of sync with the deletion rules. This allows keeping the compaction thresholds at the desired level, knowing that the decision will not cause compliance issues with data lifecycle.

Compliance is just one reason someone may want to enable this config. It is the most compelling one that comes to mind because it can be a legal requirement for operators, and the current option of lowering thresholds to stay compliant is suboptimal.

forcePendingDeletionCompaction is the new policy config. It defaults to false, but anyone wishing to force pending deletion rules to be applied regardless of interval compaction stats can simply set it to true and leave their thresholds in place for candidates that do not require deletions.
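
The intended behavior can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: PendingDeletionEligibilitySketch, Candidate, Policy, shouldCompact, and the single minUncompactedBytes threshold are hypothetical stand-ins. Only the forcePendingDeletionCompaction flag name comes from the description above.

```java
// Hypothetical illustration; these are not the Druid policy or candidate classes.
final class PendingDeletionEligibilitySketch {

    /** Simplified stand-in for a candidate interval and its compaction stats. */
    record Candidate(long uncompactedBytes, boolean hasUnappliedDeletionRules) {}

    /** Simplified stand-in for the policy config: one threshold plus the new opt-in flag. */
    record Policy(long minUncompactedBytes, boolean forcePendingDeletionCompaction) {}

    /** Decides whether a candidate interval should be compacted under the given policy. */
    static boolean shouldCompact(Candidate candidate, Policy policy) {
        // When the operator has opted in, pending deletions bypass the thresholds entirely.
        if (policy.forcePendingDeletionCompaction() && candidate.hasUnappliedDeletionRules()) {
            return true;
        }
        // Otherwise the usual threshold check applies (assumed here to be a single byte threshold).
        return candidate.uncompactedBytes() >= policy.minUncompactedBytes();
    }

    public static void main(String[] args) {
        Candidate smallWithPendingDeletes = new Candidate(10L, true);

        // Flag off (the default): the small interval stays below the threshold and is skipped.
        System.out.println(shouldCompact(smallWithPendingDeletes, new Policy(1_000_000L, false))); // prints false

        // Flag on: the same interval is compacted because deletion rules are unapplied.
        System.out.println(shouldCompact(smallWithPendingDeletes, new Policy(1_000_000L, true))); // prints true
    }
}
```

Checking the flag before the thresholds means intervals with no pending deletions are evaluated exactly as before, so opting in should not change behavior for them.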

Release note

Operators using Cascading Reindexing compaction supervisors with deletion rules can now use compaction policy configuration to opt into having their deletion rules applied to candidate intervals that are not fully compliant, regardless of their interval-level compaction stats. If you opt in, intervals that require deletions to be applied but do not meet the compaction stat thresholds specified in the policy will now be compacted to ensure all required deletion rules are applied. This is critical for operators who use deletion rules for data compliance needs. Note that you must be using the MostFragmentedIntervalFirst policy for this config to take effect.


Key changed/added classes in this PR
  • CompactionConfigBasedJobTemplate
  • ReindexingDeletionRuleOptimizer
  • CompactionCandidateSearchPolicy
  • MostFragmentedIntervalFirstPolicy

This PR has:

  • been self-reviewed.
  • added documentation for new or modified features or behaviors.
  • a release note entry in the PR description.
  • added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
  • added or updated version, license, or notice information in licenses.yaml
  • added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
  • added unit tests or modified existing tests to cover new code paths, ensuring the threshold for code coverage is met.
  • added integration tests.
  • been tested in a test Druid cluster.

…upervisors who have unapplied deletion rules
@capistrant capistrant marked this pull request as draft June 26, 2026 18:30
@capistrant capistrant added the Needs web console change and Release Notes labels Jun 26, 2026
@capistrant

Contributor Author

To limit PR scope, the web console change to the policy form will be in a follow-up.

Labels

Area - Documentation, Area - Ingestion, Needs web console change, Release Notes
