Skip to content

Add Kafka ingestion support for subset partitions#17587

Merged
xiangfu0 merged 1 commit intoapache:masterfrom
xiangfu0:kafka-subset-partitions
Mar 12, 2026
Merged

Add Kafka ingestion support for subset partitions#17587
xiangfu0 merged 1 commit intoapache:masterfrom
xiangfu0:kafka-subset-partitions

Conversation

@xiangfu0
Copy link
Copy Markdown
Contributor

@xiangfu0 xiangfu0 commented Jan 27, 2026

Summary

Add support for Kafka partition-subset realtime ingestion so Pinot can assign and consume only selected topic partitions for a table.

Changes

  • Add stream.kafka.partition.ids parser/validation utilities in pinot-kafka-base to interpret configured partition subsets.
  • Update controller assignment logic (segment and instance selectors) to support partition-group assignment across subset partitions.
  • Update Kafka metadata providers (pinot-kafka-3.0 and pinot-kafka-4.0) to:
    • honor configured partition subsets in partition counts/group metadata
    • validate configured IDs against topic metadata
    • support stable subset-based partition-group behavior
  • Add unit tests for subset parsing and Kafka metadata-provider partition selection.
  • Add quickstart examples for split-topic ingestion:
    • fineFoodReviews-part-0
    • fineFoodReviews-part-1

@xiangfu0 xiangfu0 requested a review from Copilot January 27, 2026 16:30
@xiangfu0 xiangfu0 added feature release-notes Referenced by PRs that need attention when compiling the next release notes kafka Related to Kafka stream connector ingestion Related to data ingestion pipeline labels Jan 27, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds support for configuring Kafka ingestion to consume only a subset of topic partitions via the stream.kafka.partition.ids configuration property. This enables multiple tables to share a single Kafka topic by consuming different partitions.

Changes:

  • Added stream.kafka.partition.ids configuration property and parsing utilities
  • Modified Kafka metadata providers (2.0 and 3.0) to validate and respect partition subsets
  • Updated instance assignment logic to support non-contiguous partition IDs
  • Added comprehensive unit tests and example configurations

Reviewed changes

Copilot reviewed 19 out of 19 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
pinot-kafka-base/KafkaStreamConfigProperties.java Defines new PARTITION_IDS constant for subset configuration
pinot-kafka-base/KafkaPartitionSubsetUtils.java Implements parsing, validation, and deduplication of partition ID lists
pinot-kafka-base/KafkaPartitionSubsetUtilsTest.java Comprehensive unit tests for partition ID parsing
pinot-kafka-base/KafkaPartitionLevelStreamConfig.java Exposes stream config map for partition subset utilities
pinot-kafka-2.0/KafkaStreamMetadataProvider.java Overrides partition methods to validate and return subset partitions
pinot-kafka-3.0/KafkaStreamMetadataProvider.java Mirrors 2.0 implementation for Kafka 3.0 compatibility
pinot-kafka-2.0/KafkaPartitionLevelConsumerTest.java Tests subset validation and partition count/ID fetching
pinot-kafka-2.0/README.md Documents the partition subset feature
InstanceReplicaGroupPartitionSelector.java Supports explicit partition IDs in instance assignment
ImplicitRealtimeTablePartitionSelector.java Fetches and uses stream partition IDs for instance assignment
RealtimeSegmentAssignment.java Updates segment assignment to handle non-contiguous partition IDs
InstanceAssignmentTest.java Tests single-partition subset with non-zero ID
QuickStartBase.java Adds fineFoodReviews-part-0 and fineFoodReviews-part-1 examples
examples/stream/subsetPartitions/* Example configuration and documentation
examples/stream/fineFoodReviews-part-/ Demo tables consuming single partitions

Comment thread pinot-plugins/pinot-stream-ingestion/pinot-kafka-2.0/README.md Outdated
@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented Jan 27, 2026

Codecov Report

❌ Patch coverage is 65.75342% with 25 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.27%. Comparing base (f641a9c) to head (0d38946).
⚠️ Report is 9 commits behind head on master.

Files with missing lines Patch % Lines
...in/stream/kafka30/KafkaStreamMetadataProvider.java 41.86% 24 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff            @@
##             master   #17587   +/-   ##
=========================================
  Coverage     63.26%   63.27%           
  Complexity     1466     1466           
=========================================
  Files          3190     3191    +1     
  Lines        192039   192102   +63     
  Branches      29421    29434   +13     
=========================================
+ Hits         121492   121545   +53     
- Misses        61026    61048   +22     
+ Partials       9521     9509   -12     
Flag Coverage Δ
custom-integration1 100.00% <ø> (ø)
integration 100.00% <ø> (ø)
integration1 100.00% <ø> (ø)
integration2 0.00% <ø> (ø)
java-11 63.24% <65.75%> (+0.02%) ⬆️
java-21 63.22% <65.75%> (-0.02%) ⬇️
temurin 63.27% <65.75%> (+<0.01%) ⬆️
unittests 63.26% <65.75%> (+<0.01%) ⬆️
unittests1 55.59% <ø> (+<0.01%) ⬆️
unittests2 34.27% <65.75%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@xiangfu0 xiangfu0 force-pushed the kafka-subset-partitions branch 3 times, most recently from ab92eca to 3760d06 Compare January 28, 2026 05:45
@xiangfu0 xiangfu0 requested a review from Copilot January 28, 2026 10:07
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 3 comments.

@xiangfu0 xiangfu0 force-pushed the kafka-subset-partitions branch from 3760d06 to 4235903 Compare January 28, 2026 13:01
@xiangfu0 xiangfu0 requested a review from Copilot January 28, 2026 13:01
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

@xiangfu0 xiangfu0 force-pushed the kafka-subset-partitions branch 3 times, most recently from 0379c91 to 0374b0b Compare January 28, 2026 17:18
@xiangfu0 xiangfu0 requested a review from Copilot January 29, 2026 13:34
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 4 comments.

@xiangfu0 xiangfu0 force-pushed the kafka-subset-partitions branch 2 times, most recently from 9220944 to d045fa9 Compare January 29, 2026 16:09
@xiangfu0 xiangfu0 requested a review from Copilot January 29, 2026 16:25
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 18 out of 18 changed files in this pull request and generated 2 comments.

@xiangfu0 xiangfu0 force-pushed the kafka-subset-partitions branch 2 times, most recently from 03fed47 to 3855de1 Compare February 3, 2026 16:36
@xiangfu0 xiangfu0 requested a review from Jackie-Jiang February 3, 2026 16:39
@xiangfu0 xiangfu0 force-pushed the kafka-subset-partitions branch from 3855de1 to 548b943 Compare February 5, 2026 11:29
@xiangfu0 xiangfu0 force-pushed the kafka-subset-partitions branch from 49b4b3e to 5f1d26f Compare February 28, 2026 08:24
@xiangfu0 xiangfu0 force-pushed the kafka-subset-partitions branch 5 times, most recently from 702cbdf to ad40be3 Compare March 2, 2026 21:21
@xiangfu0 xiangfu0 force-pushed the kafka-subset-partitions branch 2 times, most recently from a5e8de3 to c7bace2 Compare March 3, 2026 06:41
@xiangfu0 xiangfu0 force-pushed the kafka-subset-partitions branch 3 times, most recently from ae62d10 to 2caae5b Compare March 8, 2026 07:06
@xiangfu0 xiangfu0 force-pushed the kafka-subset-partitions branch 2 times, most recently from 364fa3b to 6d002b9 Compare March 12, 2026 05:02
@xiangfu0 xiangfu0 force-pushed the kafka-subset-partitions branch from 6d002b9 to 34642c5 Compare March 12, 2026 08:04
Allow a realtime table to consume only a subset of Kafka partitions by
configuring `stream.kafka.partition.ids` (e.g. "0,2,5"). This enables
splitting a single Kafka topic across multiple Pinot tables for
independent scaling and isolation.

Key changes:
- Add KafkaPartitionSubsetUtils to parse and validate partition ID
  configuration from StreamConfig
- Update KafkaStreamMetadataProvider (kafka30/kafka40) to filter
  partitions and offsets based on the configured subset while still
  returning the total Kafka partition count for instance assignment
- Update PinotLLCRealtimeSegmentManager to use total partition count
  from instance partitions for segment ZK metadata, ensuring correct
  broker query routing across subset tables
- Add QuickStart example with two subset tables splitting a 2-partition
  topic (fineFoodReviews_part_0 and fineFoodReviews_part_1)
- Add unit tests, integration tests, and a chaos integration test
  validating partition assignment, segment creation, and query routing
@xiangfu0 xiangfu0 force-pushed the kafka-subset-partitions branch from 34642c5 to 0d38946 Compare March 12, 2026 08:11
@xiangfu0 xiangfu0 merged commit 097a89f into apache:master Mar 12, 2026
31 of 32 checks passed
@xiangfu0 xiangfu0 deleted the kafka-subset-partitions branch March 12, 2026 18:48
xiangfu0 added a commit to xiangfu0/pinot-docs that referenced this pull request Mar 20, 2026
Add documentation for the stream.kafka.partition.ids setting introduced
in apache/pinot#17587, which allows a REALTIME table to consume only a
subset of a Kafka topic's partitions. This covers the configuration
reference entry, use cases, example table configs for split-topic
ingestion, and validation rules.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@xiangfu0 xiangfu0 added the real-time Related to realtime table ingestion and serving label Mar 20, 2026
xiangfu0 added a commit to pinot-contrib/pinot-docs that referenced this pull request Mar 21, 2026
xiangfu0 added a commit to pinot-contrib/pinot-docs that referenced this pull request Mar 21, 2026
xiangfu0 added a commit to pinot-contrib/pinot-docs that referenced this pull request Mar 21, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ingestion Related to data ingestion pipeline kafka Related to Kafka stream connector real-time Related to realtime table ingestion and serving release-notes Referenced by PRs that need attention when compiling the next release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants