Skip to content

[SPARK-55450][SS][PYTHON][DOCS] Document admission control and Trigger.AvailableNow in PySpark streaming data sources#54807

Open
jiteshsoni wants to merge 3 commits intoapache:masterfrom
jiteshsoni:SPARK-55450-admission-control-docs
Open

[SPARK-55450][SS][PYTHON][DOCS] Document admission control and Trigger.AvailableNow in PySpark streaming data sources#54807
jiteshsoni wants to merge 3 commits intoapache:masterfrom
jiteshsoni:SPARK-55450-admission-control-docs

Conversation

@jiteshsoni
Copy link
Contributor

What changes were proposed in this pull request?

This PR adds documentation and examples for admission control and Trigger.AvailableNow support in PySpark custom streaming data sources (SPARK-55304).

Changes include:

  1. Documentation updates to python/docs/source/tutorial/sql/python_data_source.rst:

    • New section "Admission Control and Trigger.AvailableNow"
    • Explanation of ReadLimit classes (ReadAllAvailable, ReadMaxRows, ReadMinRows, ReadMaxFiles, ReadMaxBytes)
    • API documentation for DataSourceStreamReader methods (latestOffset with start/limit, getDefaultReadLimit, reportLatestOffset)
    • SupportsTriggerAvailableNow mixin documentation
  2. Example files:

    • python_datasource_admission_control.py: Basic admission control with ReadMaxRows
    • structured_blockchain_admission_control.py: Advanced example with SupportsTriggerAvailableNow

Why are the changes needed?

Users need documentation and practical examples to implement admission control in custom streaming sources (introduced in SPARK-55304).

Does this PR introduce any user-facing change?

No. Documentation and examples only.

How was this patch tested?

  • Examples pass Python syntax validation and flake8 linting
  • Python imports verified to work correctly
  • Documentation follows existing RST structure in python_data_source.rst
  • Code patterns match existing test patterns in test_python_streaming_datasource.py

Was this patch authored or co-authored using generative AI tooling?

Yes (Claude Code)

🤖 Generated with Claude Code

jiteshsoni and others added 3 commits March 14, 2026 10:30
…r.AvailableNow in PySpark streaming data sources

### What changes were proposed in this pull request?

This PR adds documentation and examples for admission control and Trigger.AvailableNow support in PySpark custom streaming data sources (SPARK-55304).

Changes include:

1. **Documentation updates** to `python/docs/source/tutorial/sql/python_data_source.rst`:
   - New section "Admission Control and Trigger.AvailableNow"
   - Explanation of ReadLimit classes (ReadAllAvailable, ReadMaxRows, ReadMinRows, ReadMaxFiles, ReadMaxBytes)
   - API documentation for DataSourceStreamReader methods (latestOffset with start/limit, getDefaultReadLimit, reportLatestOffset)
   - SupportsTriggerAvailableNow mixin documentation

2. **Example files**:
   - `python_datasource_admission_control.py`: Basic admission control with ReadMaxRows
   - `structured_blockchain_admission_control.py`: Advanced example with SupportsTriggerAvailableNow

### Why are the changes needed?

Users need documentation and practical examples to implement admission control in custom streaming sources (introduced in SPARK-55304).

### Does this PR introduce _any_ user-facing change?

No. Documentation and examples only.

### How was this patch tested?

- Examples pass Python syntax validation and flake8 linting
- Documentation follows existing RST structure in python_data_source.rst
- Code patterns match existing test patterns in test_python_streaming_datasource.py

### Was this patch authored or co-authored using generative AI tooling?

Yes (Claude Code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- Add return type annotations to __init__, main(), process_batch()
- Add type annotations for function parameters
- Add type hints for local variables (batch_stats, blocks_processed)
- Import additional types (DataFrame, StructType, Dict, List, Any)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
…ontrol

This commit addresses reviewer feedback by adding end-to-end demonstrations
of how admission control interacts with Trigger.AvailableNow.

Changes:
- python_datasource_admission_control.py: Now runs TWO queries showing
  default trigger (ReadMaxRows) vs Trigger.AvailableNow (ReadAllAvailable)
- structured_blockchain_admission_control.py: Demonstrates
  SupportsTriggerAvailableNow mixin with visible prepareForTriggerAvailableNow
  lifecycle and target offset enforcement
- python_data_source.rst: Updated documentation to explain both trigger
  modes and expected behavior

Previous version implemented the APIs but didn't show them in action.
New version demonstrates the complete interaction with actual streaming
queries using both trigger types.

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant