Conversation
@pan3793 pan3793 commented Dec 12, 2025

What changes were proposed in this pull request?

Why are the changes needed?

For both security and performance.

Does this PR introduce any user-facing change?

No, except for performance.

How was this patch tested?

GHA for functionality, benchmark for performance.

TL;DR - my test results show that lz4-java 1.10.1 is about 10-15% slower than 1.8.0 on LZ4 compression, and about 5% slower on LZ4 decompression even after migrating to the suggested safeDecompressor.
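For context, a minimal sketch of what "migrating to safeDecompressor" means in lz4-java (this is an illustrative standalone example, not Spark's actual code; it assumes lz4-java is on the classpath, and the class name Lz4SafeExample is made up):

```java
import net.jpountz.lz4.LZ4Compressor;
import net.jpountz.lz4.LZ4Factory;
import net.jpountz.lz4.LZ4SafeDecompressor;

public class Lz4SafeExample {
    public static void main(String[] args) {
        LZ4Factory factory = LZ4Factory.fastestInstance();
        LZ4Compressor compressor = factory.fastCompressor();

        byte[] data = "hello hello hello hello".getBytes();
        byte[] compressed = new byte[compressor.maxCompressedLength(data.length)];
        int compressedLen = compressor.compress(data, 0, data.length, compressed, 0);

        // The safe decompressor takes the *compressed* length and bounds-checks
        // the destination buffer, so malformed input cannot overrun it. The
        // fast decompressor instead trusts a caller-supplied decompressed
        // length, which is the security concern motivating the migration.
        LZ4SafeDecompressor safe = factory.safeDecompressor();
        byte[] restored = new byte[data.length];
        int restoredLen = safe.decompress(compressed, 0, compressedLen, restored, 0);
        System.out.println(restoredLen == data.length);
    }
}
```

The bounds checking is also the plausible source of the ~5% decompression overhead mentioned above.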

Was this patch authored or co-authored using generative AI tooling?

No.

@github-actions github-actions bot added the CORE label Dec 12, 2025
@pan3793 pan3793 changed the title from "WIP [SPARK-54571] Use safeDecompressor" to "WIP [SPARK-54571] Use LZ4 safeDecompressor" Dec 12, 2025
@pan3793 pan3793 commented Dec 14, 2025

The test failure is caused by com.ibm.db2:jcc bundling unshaded old lz4 classes.

@pan3793 pan3793 commented Dec 15, 2025

cc @dbtsai @huaxingao, com.ibm.db2:jcc bundles unshaded old lz4 classes, which causes test failures in the sql/hive/docker-it modules after this patch:

java.lang.NoSuchMethodError: 'net.jpountz.lz4.LZ4BlockInputStream$Builder net.jpountz.lz4.LZ4BlockInputStream.newBuilder()'
 	at org.apache.spark.io.LZ4CompressionCodec.compressedInputStream(CompressionCodec.scala:156)
 	...

I checked all versions available on Maven Central; all of them have the same issue.
https://mvnrepository.com/artifact/com.ibm.db2/jcc

I can't find public contact info for the IBM DB2 JDBC driver team, so I'm not sure what the next step is. Should we temporarily purge the dependency and disable the DB2 tests? Or are there better ideas?
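One possible stopgap (an assumption on my part, not something decided in this thread): since the stale net.jpountz.lz4 classes live inside the jcc jar itself, Maven exclusions cannot remove them, but declaring lz4-java before jcc may let the genuine classes win via classpath ordering. A hypothetical pom.xml fragment (artifact versions are illustrative):

```xml
<!-- Hypothetical sketch: dependencies declared earlier appear earlier on the
     classpath, so the real lz4-java classes shadow the unshaded copies
     bundled inside the jcc jar. Versions below are illustrative only. -->
<dependencies>
  <dependency>
    <groupId>org.lz4</groupId>
    <artifactId>lz4-java</artifactId>
    <version>1.8.0</version>
  </dependency>
  <dependency>
    <groupId>com.ibm.db2</groupId>
    <artifactId>jcc</artifactId>
    <version>11.5.9.0</version>
    <scope>test</scope>
  </dependency>
</dependencies>
```

Classpath ordering is not contractually guaranteed by every build tool or test runner, so this would need verification against the docker-it setup before being relied on.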
