Add Kafka ingestion support for subset partitions #17587
xiangfu0 merged 1 commit into apache:master
Pull request overview
This PR adds support for configuring Kafka ingestion to consume only a subset of topic partitions via the stream.kafka.partition.ids configuration property. This enables multiple tables to share a single Kafka topic by consuming different partitions.
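As a rough illustration of the property described above, a stream config fragment for a table that consumes only partition 0 might look like the following. The topic name and broker address are placeholders, and the other stream settings a real table needs (decoder, consumer factory, and so on) are omitted; only `stream.kafka.partition.ids` comes from this PR.

```json
{
  "streamType": "kafka",
  "stream.kafka.topic.name": "myTopic",
  "stream.kafka.broker.list": "localhost:9092",
  "stream.kafka.partition.ids": "0"
}
```

A second table sharing the same topic would presumably use the same fragment with a different partition list (for example `"1"`), so that each partition is consumed by exactly one table.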
Changes:
- Added `stream.kafka.partition.ids` configuration property and parsing utilities
- Modified Kafka metadata providers (2.0 and 3.0) to validate and respect partition subsets
- Updated instance assignment logic to support non-contiguous partition IDs
- Added comprehensive unit tests and example configurations
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| pinot-kafka-base/KafkaStreamConfigProperties.java | Defines new PARTITION_IDS constant for subset configuration |
| pinot-kafka-base/KafkaPartitionSubsetUtils.java | Implements parsing, validation, and deduplication of partition ID lists |
| pinot-kafka-base/KafkaPartitionSubsetUtilsTest.java | Comprehensive unit tests for partition ID parsing |
| pinot-kafka-base/KafkaPartitionLevelStreamConfig.java | Exposes stream config map for partition subset utilities |
| pinot-kafka-2.0/KafkaStreamMetadataProvider.java | Overrides partition methods to validate and return subset partitions |
| pinot-kafka-3.0/KafkaStreamMetadataProvider.java | Mirrors 2.0 implementation for Kafka 3.0 compatibility |
| pinot-kafka-2.0/KafkaPartitionLevelConsumerTest.java | Tests subset validation and partition count/ID fetching |
| pinot-kafka-2.0/README.md | Documents the partition subset feature |
| InstanceReplicaGroupPartitionSelector.java | Supports explicit partition IDs in instance assignment |
| ImplicitRealtimeTablePartitionSelector.java | Fetches and uses stream partition IDs for instance assignment |
| RealtimeSegmentAssignment.java | Updates segment assignment to handle non-contiguous partition IDs |
| InstanceAssignmentTest.java | Tests single-partition subset with non-zero ID |
| QuickStartBase.java | Adds fineFoodReviews-part-0 and fineFoodReviews-part-1 examples |
| examples/stream/subsetPartitions/* | Example configuration and documentation |
| examples/stream/fineFoodReviews-part-/ | Demo tables consuming single partitions |
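The parsing, validation, and deduplication attributed to `KafkaPartitionSubsetUtils` in the table above can be pictured with a small sketch. This is not the PR's code: the class name `PartitionIdParseSketch`, the method `parse`, and the choice to treat a blank value as "no subset configured" are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** Illustrative sketch only; not the PR's KafkaPartitionSubsetUtils. */
public final class PartitionIdParseSketch {
  private PartitionIdParseSketch() {
  }

  /**
   * Parses a comma-separated value such as "0,2,5" into sorted, de-duplicated IDs.
   * A null or blank value yields an empty list (assumed here to mean "no subset configured").
   */
  public static List<Integer> parse(String raw) {
    if (raw == null || raw.trim().isEmpty()) {
      return new ArrayList<>();
    }
    // TreeSet both sorts the IDs and drops duplicates.
    TreeSet<Integer> ids = new TreeSet<>();
    for (String token : raw.split(",")) {
      String trimmed = token.trim();
      if (trimmed.isEmpty()) {
        continue; // tolerate stray commas such as "0,,2"
      }
      int id;
      try {
        id = Integer.parseInt(trimmed);
      } catch (NumberFormatException e) {
        throw new IllegalArgumentException("Invalid partition ID: '" + trimmed + "'", e);
      }
      if (id < 0) {
        throw new IllegalArgumentException("Partition ID must be non-negative: " + id);
      }
      ids.add(id);
    }
    return new ArrayList<>(ids);
  }

  public static void main(String[] args) {
    System.out.println(parse("5, 0,2,2")); // prints [0, 2, 5]
  }
}
```

Sorting the result keeps downstream behavior independent of the order in which IDs were written in the table config, which is one plausible reason to normalize at parse time.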
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #17587 +/- ##
=========================================
Coverage 63.26% 63.27%
Complexity 1466 1466
=========================================
Files 3190 3191 +1
Lines 192039 192102 +63
Branches 29421 29434 +13
=========================================
+ Hits 121492 121545 +53
- Misses 61026 61048 +22
+ Partials 9521 9509 -12
Allow a realtime table to consume only a subset of Kafka partitions by configuring `stream.kafka.partition.ids` (e.g. "0,2,5"). This enables splitting a single Kafka topic across multiple Pinot tables for independent scaling and isolation.

Key changes:
- Add KafkaPartitionSubsetUtils to parse and validate partition ID configuration from StreamConfig
- Update KafkaStreamMetadataProvider (kafka30/kafka40) to filter partitions and offsets based on the configured subset while still returning the total Kafka partition count for instance assignment
- Update PinotLLCRealtimeSegmentManager to use total partition count from instance partitions for segment ZK metadata, ensuring correct broker query routing across subset tables
- Add QuickStart example with two subset tables splitting a 2-partition topic (fineFoodReviews_part_0 and fineFoodReviews_part_1)
- Add unit tests, integration tests, and a chaos integration test validating partition assignment, segment creation, and query routing
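The split described in this commit message, where the consumed partitions are filtered to the configured subset while the topic's total partition count is reported separately, can be sketched roughly as follows. The class `PartitionSubsetFilterSketch` and its method are hypothetical and do not reflect the actual `KafkaStreamMetadataProvider` code; the sketch only relies on Kafka numbering a topic's partitions from 0 to the partition count minus 1.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Illustrative sketch only; not the PR's KafkaStreamMetadataProvider logic. */
public final class PartitionSubsetFilterSketch {
  private PartitionSubsetFilterSketch() {
  }

  /**
   * Returns the partition IDs a table should consume.
   * An empty subset is taken to mean "all partitions" (an assumption of this sketch).
   * Any configured ID outside [0, totalPartitions) is rejected.
   */
  public static Set<Integer> partitionsToConsume(int totalPartitions, List<Integer> configuredSubset) {
    Set<Integer> result = new TreeSet<>();
    if (configuredSubset.isEmpty()) {
      for (int id = 0; id < totalPartitions; id++) {
        result.add(id);
      }
      return result;
    }
    for (int id : configuredSubset) {
      if (id < 0 || id >= totalPartitions) {
        throw new IllegalArgumentException(
            "Partition ID " + id + " does not exist in a topic with " + totalPartitions + " partitions");
      }
      result.add(id);
    }
    return result;
  }

  public static void main(String[] args) {
    int totalPartitions = 6; // the count reported for the whole topic stays 6
    System.out.println(partitionsToConsume(totalPartitions, List.of(0, 2, 5))); // prints [0, 2, 5]
  }
}
```

Keeping the total count separate from the consumed set matters because, per the commit message, instance assignment and segment metadata still use the topic-wide partition count even when a table consumes only some partitions.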
Add documentation for the `stream.kafka.partition.ids` setting introduced in apache/pinot#17587, which allows a REALTIME table to consume only a subset of a Kafka topic's partitions. This covers the configuration reference entry, use cases, example table configs for split-topic ingestion, and validation rules.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
Add support for Kafka partition-subset realtime ingestion so Pinot can assign and consume only selected topic partitions for a table.
Changes
- `stream.kafka.partition.ids` parser/validation utilities in `pinot-kafka-base` to interpret configured partition subsets.
- … (`pinot-kafka-3.0` and `pinot-kafka-4.0`) to: …
- `fineFoodReviews-part-0`
- `fineFoodReviews-part-1`