feat: add is_deleted soft-delete to ClickHouse IDENTITIES table#7710
feat: add is_deleted soft-delete to ClickHouse IDENTITIES table#771010done wants to merge 10 commits into
Conversation
|
@10done is attempting to deploy a commit to the Flagsmith Team on Vercel. A member of the Team first needs to authorize it. |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #7710 +/- ##
==========================================
- Coverage 98.52% 98.39% -0.13%
==========================================
Files 1444 1453 +9
Lines 54971 55870 +899
==========================================
+ Hits 54161 54976 +815
- Misses 810 894 +84 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a mechanism to handle deleted identities in ClickHouse by writing a tombstone row (with is_deleted=True) upon identity deletion, ensuring they are excluded from segment membership counts. It includes a ClickHouse migration, updates to the mapping logic, a new Celery task, and corresponding unit tests. The review feedback highlights a potential issue with write amplification and table bloat in multi-tenant environments, suggesting that we should verify if segment membership is enabled for the organization before writing the tombstone row.
| if not settings.CLICKHOUSE_ENABLED: | ||
| logger.info( | ||
| "tombstone.skipped", | ||
| reason="clickhouse_not_configured", | ||
| env_key=env_key, | ||
| identifier=identifier, | ||
| ) | ||
| return |
There was a problem hiding this comment.
In a multi-tenant environment, writing tombstone rows for every deleted identity across all organizations—even those that do not have segment_membership_inspection enabled—will lead to unnecessary ClickHouse write amplification and table bloat. We should check if the organization has segment membership enabled before writing the tombstone.
if not settings.CLICKHOUSE_ENABLED:
logger.info(
"tombstone.skipped",
reason="clickhouse_not_configured",
env_key=env_key,
identifier=identifier,
)
return
from environments.models import Environment
try:
environment = Environment.objects.select_related(
"project__organisation"
).get(api_key=env_key)
except Environment.DoesNotExist:
logger.info(
"tombstone.skipped",
reason="environment_not_found",
env_key=env_key,
identifier=identifier,
)
return
if not is_membership_enabled(environment.project.organisation):
logger.info(
"tombstone.skipped",
reason="segment_membership_disabled",
env_key=env_key,
identifier=identifier,
)
returnThere was a problem hiding this comment.
I have addressed this raised concern.
Thanks for submitting a PR! Please check the boxes below:
docs/if required so people know about the feature.Changes
Fixes #7593
is_deleted Bool DEFAULT falsecolumn to the ClickHouseIDENTITIEStable via a new migrationis_deletedon every backfill row (defaults tofalse)write_identity_deletion_tombstone_to_clickhouseasync task, dispatched fromEdgeIdentity.delete()to tombstone deleted identities in real timeAND i.is_deleted = falseto the segment membership count query so deleted identities never appear in countsHow did you test this code?