Skip to content

Conversation

@suvodeep-pyne
Copy link
Contributor

@suvodeep-pyne suvodeep-pyne commented Jan 27, 2026

Summary

  • Add TenantTagReadinessCallback to verify broker has valid tenant tags (not just broker_untagged)
  • Add RoutingReadinessCallback to verify routing exists for all assigned tables before reporting healthy
  • Register both callbacks in BaseBrokerStarter.registerServiceStatusHandler() with documented ordering dependency

Background

This addresses a production issue where queries could be routed to brokers that were not yet ready to serve traffic, causing query failures. The root cause is that the existing health check only validates Helix state convergence but doesn't verify:

  1. The broker has valid tenant tags assigned
  2. Routing tables are built for assigned tables

Test plan

  • Unit tests for TenantTagReadinessCallback (4 tests)
  • Unit tests for RoutingReadinessCallback (6 tests including IdealState API verification)
  • Integration test in HelixBrokerStarterTest

This change ensures that broker health checks take tenant tags and routing
readiness into account before reporting the broker as healthy.

Two new ServiceStatusCallback implementations are added:
- TenantTagReadinessCallback: Verifies the broker has valid tenant tags
  (not just broker_untagged)
- RoutingReadinessCallback: Verifies routing exists for all assigned tables

This addresses a production issue where queries could be routed to brokers
that were not yet ready to serve traffic, causing query failures.

The callbacks are registered before the IdealState/ExternalView match
callbacks, with TenantTagReadinessCallback preceding RoutingReadinessCallback
since an untagged broker has no tables assigned.
@suvodeep-pyne suvodeep-pyne changed the title Add health check callbacks for tenant tags and routing readiness [broker] Add health check callbacks for tenant tags and routing readiness Jan 27, 2026
@codecov-commenter
Copy link

codecov-commenter commented Jan 27, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 55.52%. Comparing base (234d382) to head (d191847).
⚠️ Report is 1 commits behind head on master.

❗ There is a different number of reports uploaded between BASE (234d382) and HEAD (d191847). Click for more details.

HEAD has 6 uploads less than BASE
Flag BASE (234d382) HEAD (d191847)
java-21 5 4
unittests 4 2
temurin 9 8
unittests2 2 0
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #17580      +/-   ##
============================================
- Coverage     63.21%   55.52%   -7.70%     
+ Complexity     1476      709     -767     
============================================
  Files          3172     2476     -696     
  Lines        189806   140012   -49794     
  Branches      29046    22318    -6728     
============================================
- Hits         119987    77735   -42252     
+ Misses        60508    55725    -4783     
+ Partials       9311     6552    -2759     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 55.47% <ø> (-7.68%) ⬇️
java-21 55.49% <ø> (-7.68%) ⬇️
temurin 55.52% <ø> (-7.70%) ⬇️
unittests 55.51% <ø> (-7.70%) ⬇️
unittests1 55.51% <ø> (-0.03%) ⬇️
unittests2 ?

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants