[feature](be) Add adaptive batch size for scan path (#62835) #63005

Open
mrhhsg wants to merge 1 commit into apache:branch-4.1 from mrhhsg:pick_abs

mrhhsg (Member) commented May 6, 2026

Pick PR: #62835

Problem Summary: Add adaptive block row prediction for SegmentIterator, OLAP scan, file scan, and format readers. The scan path now uses a row ceiling plus preferred output byte budget to reduce oversized blocks for wide rows while preserving row-limited behavior for narrow rows. This commit also introduces the shared session/config/thrift/runtime budget plumbing used by later operators.

Adds adaptive batch size controls for scan output blocks: preferred_block_size_bytes and preferred_max_column_in_block_size_bytes.
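
As a rough illustration of the "row ceiling plus byte budget" idea, the following is a minimal sketch and not the code in this PR. `BlockBudget` and `predict_block_rows` are hypothetical names, the estimate is a plain running average of bytes per row, and `preferred_max_column_in_block_size_bytes` is not modeled at all.

```cpp
#include <algorithm>
#include <cstddef>

// Hypothetical types and names for illustration only; they are not taken
// from the Doris code base.
struct BlockBudget {
    std::size_t row_ceiling;                 // upper bound on rows per block
    std::size_t preferred_block_size_bytes;  // target bytes per block; 0 disables the byte budget
};

// Predict how many rows the next block should hold, given how many rows and
// bytes earlier blocks produced. Narrow rows stay row-limited; wide rows are
// capped by the byte budget. Always returns at least 1 row.
inline std::size_t predict_block_rows(const BlockBudget& budget,
                                      std::size_t rows_seen,
                                      std::size_t bytes_seen) {
    // Without a byte budget or any history, fall back to the row ceiling.
    if (budget.preferred_block_size_bytes == 0 || rows_seen == 0 || bytes_seen == 0) {
        return std::max<std::size_t>(1, budget.row_ceiling);
    }
    // Average bytes per row, rounded up so the estimate is never zero.
    const std::size_t avg_row_bytes = (bytes_seen + rows_seen - 1) / rows_seen;
    const std::size_t rows_by_bytes = budget.preferred_block_size_bytes / avg_row_bytes;
    return std::max<std::size_t>(1, std::min(rows_by_bytes, budget.row_ceiling));
}
```

With an illustrative ceiling of 4096 rows and an 8 MiB budget, rows averaging 100 bytes remain limited by the row ceiling, while rows averaging 64 KiB are limited by the byte budget instead.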

  • Test: Unit Test
  • Unit Test: ./run-be-ut.sh --run --filter=BlockBudgetTest.*:RuntimeStateBatchSizeTest.*:RuntimeStateBlockSizeBytesTest.*:RuntimeStateMaxColBytesTest.*:MockRuntimeStateBlockBudgetTest.*:AdaptiveBlockSizePredictorTest.*:BlockReaderBatchMaxRowsTest.*:EstimateCollectedEnoughTest.*:CollectedEnoughWithColumnsTest.*:BlockReaderByteBudgetTest.*:SegmentColumnRawDataBytesTest.*:CsvReaderSetBatchSizeTest.*:NewJsonReaderSetBatchSizeTest.*:OrcReaderTest.*:TableFormatReaderTest.*:ProfileSpecTest.*:LocalExchangerTest.*
  • Behavior changed: Yes (scan output block sizing can now be byte-budget limited when adaptive batch size is enabled)
  • Does this need documentation: Yes

What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Problem Summary:

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@mrhhsg mrhhsg requested a review from yiguolei as a code owner May 6, 2026 06:43
hello-stephen (Contributor) commented

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

mrhhsg (Member, Author) commented May 6, 2026

run buildall

Issue Number: None

Related PR: None

Problem Summary: Add adaptive block row prediction for SegmentIterator,
OLAP scan, file scan, and format readers. The scan path now uses a row
ceiling plus preferred output byte budget to reduce oversized blocks for
wide rows while preserving row-limited behavior for narrow rows. This
commit also introduces the shared session/config/thrift/runtime budget
plumbing used by later operators.

Adds adaptive batch size controls for scan output blocks:
preferred_block_size_bytes and preferred_max_column_in_block_size_bytes.

- Test: Unit Test
- Unit Test: ./run-be-ut.sh --run
--filter=BlockBudgetTest.*:RuntimeStateBatchSizeTest.*:RuntimeStateBlockSizeBytesTest.*:RuntimeStateMaxColBytesTest.*:MockRuntimeStateBlockBudgetTest.*:AdaptiveBlockSizePredictorTest.*:BlockReaderBatchMaxRowsTest.*:EstimateCollectedEnoughTest.*:CollectedEnoughWithColumnsTest.*:BlockReaderByteBudgetTest.*:SegmentColumnRawDataBytesTest.*:CsvReaderSetBatchSizeTest.*:NewJsonReaderSetBatchSizeTest.*:OrcReaderTest.*:TableFormatReaderTest.*:ProfileSpecTest.*:LocalExchangerTest.*
- Behavior changed: Yes (scan output block sizing can now be byte-budget
limited when adaptive batch size is enabled)
- Does this need documentation: Yes

---------

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>