
[CELEBORN-2314] Optimize the performance of DataBatches.requireBatches #3671

Open

afterincomparableyum wants to merge 1 commit into apache:main from afterincomparableyum:celeborn-2314

Conversation

@afterincomparableyum
Contributor

afterincomparableyum commented Apr 23, 2026

What changes were proposed in this pull request?

requireBatches(int requestSize) currently calls batches.remove(0) per iteration, which shifts all remaining elements each time, making the loop O(k·n) overall. Replacing it with a two-pass approach (find the split point, then subList(0, count).clear()) reduces this to O(n).
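As a quick self-contained illustration of the pattern (a sketch, not the PR's actual code: a plain byte[] list stands in for the DataBatch list, and the class name is made up):

```java
import java.util.ArrayList;
import java.util.List;

// Demonstrates the two-pass prefix removal: scan for the split point,
// then detach the prefix with one bulk clear instead of repeated remove(0).
public class RequireBatchesDemo {
  public static void main(String[] args) {
    List<byte[]> batches = new ArrayList<>();
    for (int i = 0; i < 5; i++) batches.add(new byte[100]);

    int requestSize = 250;
    int count = 0;
    int currentSize = 0;
    // Pass 1: find the split point without mutating the list.
    while (currentSize < requestSize) {
      currentSize += batches.get(count).length;
      count++;
    }
    // Pass 2: copy the prefix, then clear it in one operation. The tail
    // shifts once (O(n)) instead of once per remove(0) call (O(k*n)).
    List<byte[]> taken = new ArrayList<>(batches.subList(0, count));
    batches.subList(0, count).clear();

    // Prints "3 taken, 2 left": the loop overshoots requestSize by at most
    // one batch, so 250 requested bytes consume three 100-byte batches.
    System.out.println(taken.size() + " taken, " + batches.size() + " left");
  }
}
```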

Why are the changes needed?

This is a minor performance optimization.

Does this PR resolve a correctness bug?

No.

Does this PR introduce any user-facing change?

No, this is just a performance improvement.

How was this patch tested?

CI Unit/Integration tests.


Copilot AI left a comment


Pull request overview

Optimizes DataBatches.requireBatches(int) to avoid repeated ArrayList.remove(0) calls (which cause repeated element shifting) by selecting a prefix via subList(0, count) and clearing it in one operation.

Changes:

  • Replace per-element remove(0) loop with a prefix-size scan (get(count)) to find the split point.
  • Return the selected prefix batches and remove them from the internal list via subList(0, count).clear().
  • Update totalSize in one subtraction using the accumulated byte size.


Comment thread on common/src/main/java/org/apache/celeborn/common/write/DataBatches.java (outdated)
@afterincomparableyum
Contributor Author

Ping @SteNicholas @FMX for review


Copilot AI left a comment


Pull request overview

Copilot reviewed 1 out of 1 changed files in this pull request and generated no new comments.



@SteNicholas
Member

SteNicholas commented Apr 30, 2026

@afterincomparableyum

PR Review: [CELEBORN-2314] Optimize the performance of DataBatches.requireBatches

Author: afterincomparableyum | File changed: 1 | +11 / -8

Overview

This PR optimizes DataBatches.requireBatches() by replacing an O(n²) ArrayList.remove(0) loop with an O(n) subList(0, count).clear() approach. The old code repeatedly removed the head element, causing all remaining elements to shift left on each removal. The new code counts elements first, copies the sublist, then clears it in a single bulk operation.

Analysis

What's good:

- The core optimization is correct and well motivated. ArrayList.remove(0) in a loop is a classic O(n²) anti-pattern; subList().clear() performs a single System.arraycopy internally, a solid improvement.
- The requestSize >= totalSize fast path now correctly resets batches to a new empty list instead of returning the mutable internal reference. This is a bugfix on top of the optimization: previously, the caller and DataBatches shared the same ArrayList instance after the fast path, so mutations from either side would corrupt the other.
- The TODO comment from the old code is resolved and removed.
Issues and suggestions:

1. totalSize accounting is moved outside the loop (medium risk)
The old code decremented totalSize per element: totalSize -= elem.body.length. The new code does totalSize -= currentSize once after the loop, which is arithmetically equivalent. Since the while condition is currentSize < requestSize, the loop will consume one extra batch whose body pushes currentSize past requestSize; the old code had the same behavior, so this is not a regression, but it is worth noting that the method returns at least requestSize bytes, not exactly.
2. Potential IndexOutOfBoundsException (bug)
The while loop increments count and reads batches.get(count) without checking count < batches.size(). If requestSize > totalSize but doesn't hit the >= guard (which shouldn't happen if totalSize is maintained correctly), or if totalSize drifts out of sync with the actual batch sizes, this will throw. The old code had the same latent issue (batches.remove(0) on an empty list), so this is pre-existing, but now would be a good time to add a bounds check (see the combined sketch after this list):
while (currentSize < requestSize && count < batches.size()) {
3. Return type is ArrayList<DataBatch> (minor, style)
The method signature returns ArrayList<DataBatch> rather than List<DataBatch>. Since the new code already introduces a List<DataBatch> import for the subList result, the return type could be widened to List<DataBatch>, though that would be a separate API change and is fine to defer.
4. The fast-path change is a behavioral fix, not just an optimization
The old code returned the same batches reference and set totalSize = 0 but never cleared or replaced batches, so any subsequent call to addDataBatch() would append to the list the caller is still holding. The new code (batches = new ArrayList<>()) properly detaches. This should be called out in the PR description, since it's a correctness fix, not just a perf change.
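Putting suggestions 2 and 4 together, a hypothetical defensive variant could look like the following. This is a sketch against assumed internals, not the PR's actual diff; the fields batches, totalSize, and DataBatch.body are taken from the snippets quoted above, everything else is guessed:

```java
public ArrayList<DataBatch> requireBatches(int requestSize) {
  if (requestSize >= totalSize) {
    // Fast path: hand over the whole list, then detach the internal
    // reference so later addDataBatch() calls cannot leak into the
    // list the caller now holds (suggestion 4).
    ArrayList<DataBatch> all = batches;
    batches = new ArrayList<>();
    totalSize = 0;
    return all;
  }
  int count = 0;
  int currentSize = 0;
  // The count < batches.size() guard protects against totalSize drifting
  // out of sync with the actual batch sizes (suggestion 2).
  while (currentSize < requestSize && count < batches.size()) {
    currentSize += batches.get(count).body.length;
    count++;
  }
  // Copy the prefix, then remove it with a single bulk shift of the tail.
  ArrayList<DataBatch> taken = new ArrayList<>(batches.subList(0, count));
  batches.subList(0, count).clear();
  totalSize -= currentSize;
  return taken;
}
```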

Performance

The optimization is sound:

| Operation                | Old    | New                             |
|--------------------------|--------|---------------------------------|
| Removing k of n elements | O(k·n) | O(n) (subList copy + arraycopy) |
| Fast path (return all)   | O(1)   | O(1)                            |

For large batch lists, this is a meaningful improvement on the write-side hot path.

Test Coverage

No new tests are added. I'd recommend (a sketch follows this list):

- A test for partial requireBatches(requestSize) verifying the correct elements are returned and batches retains the remainder.
- A test for the full-drain fast path ensuring the returned list is independent from the internal state (verifying the bugfix).
- An edge-case test where requestSize equals exactly one batch's size.
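A sketch of those tests, assuming JUnit 4; newBatch(int), DataBatches#add(DataBatch), and getTotalSize() are hypothetical stand-ins for the real constructor, adder, and accessor:

```java
import static org.junit.Assert.assertEquals;

import java.util.List;
import org.junit.Test;

public class DataBatchesRequireSuite {

  // newBatch(int) is a hypothetical helper that builds a DataBatch with a
  // body of the given size; db.add and db.getTotalSize are likewise assumed.

  @Test
  public void partialRequireKeepsRemainder() {
    DataBatches db = new DataBatches();
    db.add(newBatch(100));
    db.add(newBatch(100));
    db.add(newBatch(100));
    // 150 requested bytes consume two 100-byte batches (overshoot by one).
    assertEquals(2, db.requireBatches(150).size());
    assertEquals(100, db.getTotalSize()); // one batch remains internally
  }

  @Test
  public void fullDrainReturnsDetachedList() {
    DataBatches db = new DataBatches();
    db.add(newBatch(100));
    List<DataBatch> drained = db.requireBatches(1000); // hits the fast path
    db.add(newBatch(100)); // must not become visible through `drained`
    assertEquals(1, drained.size());
  }

  @Test
  public void exactSingleBatchSize() {
    DataBatches db = new DataBatches();
    db.add(newBatch(100));
    db.add(newBatch(100));
    // requestSize equal to exactly one batch's size takes only that batch.
    assertEquals(1, db.requireBatches(100).size());
    assertEquals(100, db.getTotalSize());
  }
}
```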
                                                                     
Verdict

Approve with minor comments. The core optimization is correct and addresses a real O(n²) bottleneck. The fast-path fix is a bonus correctness improvement. The only actionable suggestions are adding a bounds guard (count < batches.size()) in the while loop for defensive safety and noting the behavioral fix in the PR description.

@RexXiong
Contributor

RexXiong commented May 7, 2026

Overall LGTM. Could you add unit tests for requireBatches(int requestSize)?

