@hvanhovell
Contributor

What changes were proposed in this pull request?

This PR fixes a memory leak in Spark Connect LocalRelations.

... more details TBD ...

Why are the changes needed?

It fixes a stability issue.

Does this PR introduce any user-facing change?

No

How was this patch tested?

Existing tests.
A Connect Planner Test TBD
Longevity tests.

Was this patch authored or co-authored using generative AI tooling?

No.


} else {
if (batchStructType != structType) {
throw InvalidInputErrors.chunkedCachedLocalRelationChunksWithDifferentSchema()
Contributor Author

An error like this is thrown in the iterator. We may want to make this nicer though...

}
combinedRows = combinedRows ++ batchRows
}
val (rows, structType) = ArrowConverters.fromIPCStream(
Contributor Author

We can move this code into buildLocalRelationFromRows now...

val messages = ipcStreams.map { bytes =>
new MessageIterator(new ByteArrayInputStream(bytes), allocator)
}
new ConcatenatingArrowStreamReader(allocator, messages, destructive = true)
Contributor Author

This is reusing a component that was used in the Spark Connect Scala client. It allows us to concatenate multiple IPC streams.
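As a rough illustration of the idea only (not the actual ConcatenatingArrowStreamReader), here is a minimal stdlib-only sketch of lazily concatenating several streams of batches into one iterator. `IpcStreamStub`, `ConcatenatingBatchIterator` and `ConcatDemo` are hypothetical names, and the schema check on stream boundaries is an assumption made for the example.

```scala
// Hypothetical stand-in for one IPC stream: a schema tag plus its batches.
final case class IpcStreamStub(schema: String, batches: List[Seq[Int]])

// Lazily concatenates the batches of several streams.
// Assumption for this sketch: a schema change between streams is an error.
final class ConcatenatingBatchIterator(streams: Iterator[IpcStreamStub])
    extends Iterator[Seq[Int]] {
  private var schema: Option[String] = None
  private var current: Iterator[Seq[Int]] = Iterator.empty

  override def hasNext: Boolean = {
    // Move on to the next stream only once the current one is exhausted.
    while (!current.hasNext && streams.hasNext) {
      val stream = streams.next()
      if (schema.exists(_ != stream.schema)) {
        throw new IllegalStateException(
          s"Schema mismatch: ${schema.get} vs ${stream.schema}")
      }
      schema = Some(stream.schema)
      current = stream.batches.iterator
    }
    current.hasNext
  }

  override def next(): Seq[Int] = {
    if (!hasNext) throw new NoSuchElementException("no more batches")
    current.next()
  }
}

object ConcatDemo {
  // Drains all streams in order and returns their batches as one list.
  def readAll(streams: Seq[IpcStreamStub]): List[Seq[Int]] =
    new ConcatenatingBatchIterator(streams.iterator).toList
}
```

Because the streams are opened one at a time, a mismatch in a later stream only surfaces while iterating, not when the iterator is constructed.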

resources.append(reader)

private val root: VectorSchemaRoot = try {
reader.getVectorSchemaRoot
Contributor Author

The reader owns the vector schema root, so closing the reader also releases the root. We don't have to manage that separately.
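A minimal sketch of that ownership pattern, using only the Scala standard library. `RootStub`, `OwningReader` and `OwnershipDemo` are hypothetical stand-ins rather than Arrow classes; the point is just that only the owner needs to be registered for cleanup.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a vector schema root.
final class RootStub extends AutoCloseable {
  var closed = false
  override def close(): Unit = { closed = true }
}

// Hypothetical reader that creates its root lazily and closes it with itself.
final class OwningReader extends AutoCloseable {
  private var root: Option[RootStub] = None

  def getRoot: RootStub = {
    if (root.isEmpty) root = Some(new RootStub)
    root.get
  }

  // Closing the owner releases whatever it handed out.
  override def close(): Unit = root.foreach(_.close())
}

object OwnershipDemo {
  // Registers only the reader, then cleans up; returns the root for inspection.
  def run(): RootStub = {
    val resources = ArrayBuffer.empty[AutoCloseable]
    val reader = new OwningReader
    resources.append(reader)
    val root = try reader.getRoot catch {
      case e: Throwable =>
        resources.foreach(_.close())
        throw e
    }
    // Close in reverse registration order.
    resources.reverseIterator.foreach(_.close())
    root
  }
}
```

Registering the root as well would close it twice, which is harmless in this sketch but is exactly the kind of bookkeeping the single-owner approach avoids.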

@hvanhovell changed the title from "[WIP][CONNECT] Clean-up ArrowBuffers in Connect" to "[SPARK-54696][CONNECT] Clean-up ArrowBuffers in Connect" on Dec 12, 2025