CASSANDRA-21129: Offline TCM dump tool by dracarys09 · Pull Request #4581 · apache/cassandra

dracarys09 · 2026-01-23T05:15:51Z

When a Cassandra node fails to start because of Transactional Cluster Metadata (TCM/CEP-21) corruption or other issues, operators need a way to inspect the cluster metadata state offline, without starting the node. The existing tools (nodetool, cqlsh) require a running node, leaving operators blind when debugging startup failures.

With CEP-21 (Transactional Cluster Metadata), cluster metadata is stored in system tables:

- system.local_metadata_log: contains transformation entries (epoch -> transformation)
- system.metadata_snapshots: contains periodic snapshots of ClusterMetadata

Because these tables are normally read through a running node, an operator whose node fails to start due to TCM corruption or inconsistencies has no way to inspect the metadata state. This tool fills that gap by reading the tables directly from their SSTables, offline.
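As a rough illustration of how the two tables relate: assuming a snapshot is a full copy of the metadata at a known epoch, the metadata as of some target epoch can be rebuilt by taking the newest snapshot at or below that epoch and replaying the later log entries on top of it. The sketch below models this with plain Java collections. MetadataReplay and metadataAt are hypothetical names, and UnaryOperator is only a stand-in for a logged transformation; none of this is Cassandra's actual ClusterMetadata or Transformation API.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.function.UnaryOperator;

// Hypothetical model: M stands in for ClusterMetadata and UnaryOperator<M>
// stands in for a logged transformation. Not Cassandra's real API.
final class MetadataReplay
{
    private MetadataReplay() {}

    /**
     * Rebuilds metadata as of targetEpoch: start from the newest snapshot at or
     * below targetEpoch (or from initial if there is none), then apply the log
     * entries with epochs in (snapshotEpoch, targetEpoch] in ascending order.
     */
    static <M> M metadataAt(long targetEpoch,
                            NavigableMap<Long, M> snapshots,          // epoch -> snapshot
                            NavigableMap<Long, UnaryOperator<M>> log, // epoch -> transformation
                            M initial)                                // state before the first epoch
    {
        Map.Entry<Long, M> base = snapshots.floorEntry(targetEpoch);
        long fromEpoch = base == null ? 0L : base.getKey();
        M metadata = base == null ? initial : base.getValue();

        // Exclusive lower bound: the snapshot already includes its own epoch.
        for (UnaryOperator<M> transformation : log.subMap(fromEpoch, false, targetEpoch, true).values())
            metadata = transformation.apply(metadata);
        return metadata;
    }
}
```

Replaying from the newest snapshot rather than from the first epoch keeps the amount of log that has to be read and applied small, which matters only if the log is long.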


krummas

So this is an emergency recovery tool, hopefully used extremely rarely by an operator. I think we can slim it down a lot; these are the features I think we need here:

- dump metadata at the current (or a user-provided) epoch
  - serialized binary format
  - metadata.toString(), to avoid locking us into any format
- dump the log (with start/end epochs), just calling toString() on each entry
- maybe add an option to dump system_clustermetadata.distributed_metadata_log if this is run on a CMS node
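The log-dump item could be very small. A minimal sketch, assuming the log entries have already been read into a sorted map of epoch to entry; LogDump and dumpLog are hypothetical names, and the "epoch: entry" line format is an assumption:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;

// Hypothetical helper; entries are printed with toString() only, so no output format is committed to.
final class LogDump
{
    private LogDump() {}

    /** One line per entry whose epoch lies in [startEpoch, endEpoch], in epoch order. */
    static List<String> dumpLog(NavigableMap<Long, ?> log, long startEpoch, long endEpoch)
    {
        if (startEpoch > endEpoch)
            throw new IllegalArgumentException("start epoch " + startEpoch + " is after end epoch " + endEpoch);

        List<String> lines = new ArrayList<>();
        // Both bounds inclusive, so --start 2 --end 5 style arguments include epochs 2 and 5.
        for (Map.Entry<Long, ?> entry : log.subMap(startEpoch, true, endEpoch, true).entrySet())
            lines.add(entry.getKey() + ": " + entry.getValue()); // string concatenation calls toString()
        return lines;
    }
}
```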

Issues:

- the shell script should live in the tools/bin/ directory
- tool name: this does not dump sstable metadata, it dumps cluster metadata from sstables; sstable metadata is something different (see tools/bin/sstablemetadata)
- it copies the sstables to $CASSANDRA_HOME/data (or, if that is unset, into the current directory). We should create a temporary directory for the import and clean that directory up after dumping the metadata; we need something like

                Path p = Files.createTempDirectory("dumptcmlog");
                DatabaseDescriptor.getRawConfig().data_file_directories = new String[] {p.resolve("data").toString()};
                DatabaseDescriptor.getRawConfig().commitlog_directory = p.resolve("commitlog").toString();
                DatabaseDescriptor.getRawConfig().accord.journal_directory = p.resolve("accord_journal").toString();
                DatabaseDescriptor.getRawConfig().hints_directory = p.resolve("hints").toString();
                DatabaseDescriptor.getRawConfig().saved_caches_directory = p.resolve("saved_caches").toString();

to make sure we only touch the tmp directory
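The clean-up half of this suggestion can be handled with a try/finally around the dump, so the temporary directory is removed even if reading the SSTables fails. A minimal sketch using only java.nio.file; TempWorkspace, withTempDirectory and deleteRecursively are hypothetical names. Inside the body, the raw config directories would be pointed at subdirectories of the temporary directory, as in the snippet above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper: run work inside a throwaway directory and always delete it afterwards.
final class TempWorkspace
{
    private TempWorkspace() {}

    interface Body
    {
        void run(Path dir) throws IOException;
    }

    static void withTempDirectory(String prefix, Body body) throws IOException
    {
        Path dir = Files.createTempDirectory(prefix);
        try
        {
            body.run(dir);
        }
        finally
        {
            deleteRecursively(dir);
        }
    }

    static void deleteRecursively(Path root) throws IOException
    {
        if (!Files.exists(root))
            return;
        // Reverse lexicographic order visits every descendant before its parent directory.
        try (Stream<Path> paths = Files.walk(root))
        {
            for (Path p : (Iterable<Path>) paths.sorted(Comparator.reverseOrder())::iterator)
                Files.delete(p);
        }
    }
}
```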

krummas requested changes Jan 26, 2026


Abhijeet Dubey added 3 commits January 28, 2026 11:55

Address review comments take 1

bfa32fb

Address review comments take 2

de5c70d

Add integration test to ensure gaps are reported correctly

d5c4db9
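Commit d5c4db9 mentions that gaps are reported. Assuming each committed log entry advances the epoch by exactly one, a gap is any jump of more than one between consecutive epochs read from the log. A minimal sketch; EpochGaps and findGaps are hypothetical names, and the "[from, to]" output is an assumed format, not the tool's actual output:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;

// Hypothetical sketch: report missing epoch ranges between the epochs present in the log.
final class EpochGaps
{
    private EpochGaps() {}

    /** Returns each missing range as "[from, to]", inclusive, in ascending order. */
    static List<String> findGaps(NavigableSet<Long> epochs)
    {
        List<String> gaps = new ArrayList<>();
        Long previous = null;
        for (long epoch : epochs)
        {
            // Consecutive epochs differ by exactly one; a larger jump means entries are missing.
            if (previous != null && epoch > previous + 1)
                gaps.add("[" + (previous + 1) + ", " + (epoch - 1) + "]");
            previous = epoch;
        }
        return gaps;
    }
}
```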

dracarys09 requested a review from krummas January 29, 2026 07:53

dracarys09 wants to merge 4 commits into apache:trunk from dracarys09:tcm-dump-tool

dracarys09 commented Jan 23, 2026


krummas left a comment


2 participants
