Skip to content

fix(sentinel): preserve seed nodes in sentinelRootNodes after topology update (#3237)#3306

Merged
nkaradzhov merged 11 commits into
redis:masterfrom
aartisonigra:fix/sentinel-seed-nodes-3237
Jun 24, 2026
Merged

fix(sentinel): preserve seed nodes in sentinelRootNodes after topology update (#3237)#3306
nkaradzhov merged 11 commits into
redis:masterfrom
aartisonigra:fix/sentinel-seed-nodes-3237

Conversation

@aartisonigra

@aartisonigra aartisonigra commented Jun 2, 2026

Copy link
Copy Markdown
Contributor

Problem

When sentinel reported its peer list (IP-based nodes), the client
replaced sentinelRootNodes entirely with those IPs, dropping the
original hostname-based seed nodes.

After a full outage where all sentinels restarted with new IPs, the
client had no working address to reconnect to — it kept retrying
stale IPs forever.

Fixes #3237

Root Cause

In transform():
this.#sentinelRootNodes = analyzed.sentinelList;
This dropped the original hostname seed nodes completely.

Fix

Added #mergeSentinelNodes() that:

  1. Always keeps seed nodes (hostnames) first
  2. Appends newly discovered nodes (IPs) without duplicates

So DNS-based hostnames are always available for reconnection
even after a full outage with new IPs.

Files Changed

  • packages/client/lib/sentinel/index.ts — core fix
  • packages/client/lib/sentinel/index.spec.ts — 2 new tests

Tests Added

  • seed nodes are preserved in sentinelRootNodes after sentinel list update
  • mergeSentinelNodes: seed hostnames always come first, IPs appended without duplicates

Note

Medium Risk
Changes Sentinel topology/reconnect candidate lists used during observe/connect loops; behavior is localized but affects production failover recovery paths.

Overview
Fixes Sentinel reconnection after topology updates by not replacing sentinelRootNodes with only the IP list returned by sentinels.

transform() now builds the working list with mergeSentinelNodes: configured sentinelRootNodes seeds stay first, discovered peers are appended, and entries are deduped by host:port (cloned so seeds are not aliased). That keeps hostname/DNS seeds available for reconnect when every sentinel comes back on new IPs after a full outage (#3237).

Docs add a “Reconnecting after an outage” section (hostname vs IP-literal seeds). Unit tests cover merge ordering, dedupe, and empty discovery.

Reviewed by Cursor Bugbot for commit be75a0f. Bugbot is set up for automated code reviews on this repo. Configure here.

Comment thread packages/client/lib/sentinel/index.ts
Comment thread packages/client/lib/sentinel/index.spec.ts Outdated
Comment thread packages/client/lib/sentinel/index.spec.ts
Comment thread packages/client/lib/sentinel/index.ts
@aartisonigra aartisonigra force-pushed the fix/sentinel-seed-nodes-3237 branch from 789da3a to c46c390 Compare June 2, 2026 11:57
@aartisonigra

Copy link
Copy Markdown
Contributor Author

Hi, I would like to work on this issue. Could you please assign it to me?

@nkaradzhov

Copy link
Copy Markdown
Collaborator

@aartisonigra thanks, i will try to have a look today!

@nkaradzhov

Copy link
Copy Markdown
Collaborator

IP-literal seeds get stuck:
The merge unconditionally prepends #sentinelSeedNodes. Right for hostnames — DNS re-resolution is the whole point. But if someone configures IP seeds and the cluster gets reprovisioned with new IPs, the old seed IPs stay in #sentinelRootNodes forever. They never recover, and observe() hits them first on every reconnect — dead-IP timeouts before reaching the live ones.

Could we filter at construction so only hostnames count as seeds?

this.#sentinelSeedNodes = Array.from(options.sentinelRootNodes)
  .filter(n => net.isIP(n.host) === 0);

Plus a test for IP-only seeds with disjoint discovery, to lock the behavior in.

Tests don't actually hit the new code

The first test reflects into _self by symbol-name fishing and has a fallback that asserts seedNodes.length === 2 — that passes no matter what. The second one reimplements the merge logic inline and checks the duplicate, never calls #mergeSentinelNodes. Can we drive this through transform() end-to-end? The full-outage integration test from #3188 is a good starting point — just have the sentinels come back with different IPs and assert recovery via the hostname seed.

Unrelated type change

#createClient went from AnyRedisClientOptions to the explicit generic. Not in the description and unrelated to the seed fix — split it or mention why.

SENTINE_LIST_CHANGE.size meaning changed

It's now the merged length, not the discovered count. Behavior change for any consumer of the event — worth a note in the description at least.

@aartisonigra

Copy link
Copy Markdown
Contributor Author

Thanks for the feedback.

I've updated the regression test to validate the behavior through the real transform() flow and removed the previous merge-logic style assertions. The test now verifies the emitted SENTINE_LIST_CHANGE.size from the implementation itself.

Please let me know if you'd like any additional coverage or changes.

Comment thread packages/client/lib/sentinel/index.ts Outdated
Comment thread packages/client/lib/sentinel/index.ts Outdated
@aartisonigra

Copy link
Copy Markdown
Contributor Author

I’ve addressed the review comments and updated the PR with the latest fixes.

Please let me know if any further changes are needed. Looking forward to your review.

Thanks!

aartisonigra and others added 2 commits June 5, 2026 19:56
Address review on redis#3306:

- Remove duplicate import block that broke compilation, and drop the
  unused `isIP` import.
- Revert the unrelated `#createClient` parameter type narrowing
  (`AnyRedisClientOptions` -> concrete `RedisClientOptions`).
- Extract the merge logic into a pure, exported `mergeSentinelNodes`
  helper in utils.ts instead of a private method, and fix the broken
  indentation in the constructor and transform().
- Replace the regression tests that reflected on `#private` fields
  (always undefined at runtime, so every assertion was skipped) with
  real unit tests against `mergeSentinelNodes`.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

@cursor cursor Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

Reviewed by Cursor Bugbot for commit 7223bef. Configure here.

Comment thread packages/client/lib/sentinel/utils.ts
nkaradzhov and others added 2 commits June 24, 2026 14:30
- mergeSentinelNodes now pushes fresh node objects instead of aliasing
  the frozen seed instances into the working root-nodes list.
- Document seed/reconnection behaviour in docs/sentinel.md: hostname
  seeds re-resolve and recover from full-IP-change outages; IP-literal
  seeds recover only when sentinels return at the same address.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Satisfies the no-var lint rule now that utils.ts is a changed file.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@nkaradzhov nkaradzhov merged commit ae84140 into redis:master Jun 24, 2026
14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Sentinel client not able to recover after all sentinels went down

2 participants