@dogakarakas dogakarakas commented Jan 27, 2026

This PR implements LibMatrixReorg.transposeInPlaceTensor() (src/main/java/org/apache/sysds/runtime/matrix/data/LibMatrixReorg.java), which permutes the dimensions of a tensor linearized in a MatrixBlock in place. The implementation is inspired by the EITHOT algorithm (Efficient In-place Transposition of High-Order Tensors) and keeps memory overhead low by avoiding a full copy of the data.

Implementation Details:
The implementation handles permutations by identifying and applying fundamental primitive patterns:

Primitive 21 (Tensor -> Matrix): Applies when the permutation admits a split index such that both sub-permutations keep their internal order. The dimensions within each sub-permutation are multiplied together to form the row and column dimensions of a matrix, which is then transposed with the existing, optimized LibMatrixReorg.transposeInPlaceDenseBrenner() method. Tensors that reduce to matrices therefore reuse the in-place matrix transpose instead of the general path.
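
The split-index check described above can be sketched as follows. This is a minimal illustration, not the code in this PR: the class and method names (`Primitive21Sketch`, `findSplit`, `matrixShape`) are hypothetical, and it assumes the convention that `perm[i]` names the input axis that becomes output axis `i`.

```java
class Primitive21Sketch {
    /**
     * Returns the split index k if perm == [k, k+1, ..., n-1, 0, ..., k-1],
     * i.e. both halves keep their internal order; returns -1 otherwise.
     * k == 0 means the identity permutation (no data movement needed).
     */
    static int findSplit(int[] perm) {
        int n = perm.length;
        int k = perm[0];
        for (int i = 0; i < n; i++) {
            if (perm[i] != (k + i) % n) {
                return -1;
            }
        }
        return k;
    }

    /**
     * Collapses the tensor shape into {rows, cols} of the equivalent matrix:
     * rows is the product of dims[0..k-1], cols the product of dims[k..n-1].
     */
    static long[] matrixShape(int[] dims, int k) {
        long rows = 1;
        long cols = 1;
        for (int i = 0; i < dims.length; i++) {
            if (i < k) {
                rows *= dims[i];
            } else {
                cols *= dims[i];
            }
        }
        return new long[] {rows, cols};
    }
}
```

Under these assumptions, a tensor of shape (2, 3, 4, 5) with permutation [2, 3, 0, 1] has split index 2 and can be treated as a 6 x 20 matrix that is handed to the in-place matrix transpose.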

Primitive 1324 (General Permutation): For more complex high-order permutations, the algorithm resolves the transposition by swapping neighboring data blocks, keeping the remaining dimensions fixed, and applying a cycle-following algorithm.

Cycle Tracking: The EITHOT paper suggests Catanzaro's algorithm for certain index calculations because it is more efficient in specific cases. To keep the implementation simple and applicable to arbitrary permutations, this implementation instead uses a simplified index conversion combined with general cycle tracking.
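
To illustrate the idea behind the 1324 primitive and cycle following, here is a minimal sketch. It is not the code in this PR: the class and method names are hypothetical, the tensor is assumed to be row-major with shape (d1, d2, d3, d4), and visited cycles are tracked with a plain `BitSet`. For each index of the outer dimension, the d2 x d3 grid of blocks (each of length d4) is transposed in place, using only two block-sized buffers.

```java
import java.util.BitSet;

class Primitive1324Sketch {
    /** Reorders a row-major (d1, d2, d3, d4) tensor to (d1, d3, d2, d4) in place. */
    static void transpose1324(double[] data, int d1, int d2, int d3, int d4) {
        if (d2 == 1 || d3 == 1) {
            return; // memory layout is already identical
        }
        int n = d2 * d3;                 // number of blocks per outer slice
        BitSet visited = new BitSet(n);  // blocks already moved in this slice
        double[] carry = new double[d4]; // block waiting to be written
        double[] tmp = new double[d4];   // block displaced by the write
        for (int o = 0; o < d1; o++) {
            int base = o * n * d4;
            visited.clear();
            // Blocks 0 and n-1 never move, so only 1..n-2 need checking.
            for (int start = 1; start < n - 1; start++) {
                if (visited.get(start)) {
                    continue;
                }
                System.arraycopy(data, base + start * d4, carry, 0, d4);
                int cur = start;
                do {
                    // Block p = j*d3 + k moves to k*d2 + j = (p * d2) mod (n - 1).
                    int next = (int) ((long) cur * d2 % (n - 1));
                    System.arraycopy(data, base + next * d4, tmp, 0, d4);
                    System.arraycopy(carry, 0, data, base + next * d4, d4);
                    double[] swap = carry;
                    carry = tmp;
                    tmp = swap;
                    visited.set(next);
                    cur = next;
                } while (cur != start);
            }
        }
    }
}
```

The `BitSet` costs one bit per block rather than one value per element, which keeps the extra memory small compared with an out-of-place copy; the closed-form destination index is the standard one for in-place matrix transposition and stands in here for whatever index conversion the actual implementation uses.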

Parallelization Potential: The block-based structure lends itself to future GPU acceleration. EITHOT's parallelization strategies are described in the article linked below.

Testing Framework:
Test Location: src/test/java/org/apache/sysds/test/component/matrix/libMatrixReorg/TransposeInPlaceBrennerTest.java.

Arbitrary Permutations: Verified the algorithm against a wide range of high-dimensional tensor shapes and arbitrary permutation vectors.

Memory Constraints in Validation: While transposeInPlaceTensor() itself is memory-efficient (its buffers are block-sized), the test suite uses an out-of-place reference implementation to verify correctness. Consequently, tests with extremely large tensor dimensions may fail with java.lang.OutOfMemoryError: Java heap space because of the memory required by the reference copy.

Scope: Validation currently focuses on verifying permutation logic accuracy across high-order dimensions and varying shapes within standard heap limits.

Validation Logic: A helper method, compareTensorValues(), was added to src/test/java/org/apache/sysds/test/TestUtils.java to compare the tensors cell by cell.
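
For readers who want to reproduce the validation idea, the following is a minimal sketch of an out-of-place reference permutation plus a cell-by-cell comparison. It is an assumption-laden illustration, not the test code in this PR: the class and method names are hypothetical, the tensor is assumed to be row-major, and `perm[i]` is assumed to name the input axis that becomes output axis `i`. The full-size `out` array is what makes this approach memory-hungry for large tensors.

```java
class ReferencePermuteSketch {
    /** Returns a permuted copy of a row-major tensor with the given dims. */
    static double[] permuteOutOfPlace(double[] src, int[] dims, int[] perm) {
        int n = dims.length;
        // Row-major strides of the output shape (outDims[k] = dims[perm[k]]).
        int[] outStride = new int[n];
        int stride = 1;
        for (int k = n - 1; k >= 0; k--) {
            outStride[k] = stride;
            stride *= dims[perm[k]];
        }
        // Output stride seen from each input axis.
        int[] strideForInputAxis = new int[n];
        for (int k = 0; k < n; k++) {
            strideForInputAxis[perm[k]] = outStride[k];
        }
        double[] out = new double[src.length]; // full copy: the memory cost
        int[] idx = new int[n];                // current input multi-index
        for (int lin = 0; lin < src.length; lin++) {
            int target = 0;
            for (int a = 0; a < n; a++) {
                target += idx[a] * strideForInputAxis[a];
            }
            out[target] = src[lin];
            // Advance the multi-index like an odometer, last axis fastest.
            for (int a = n - 1; a >= 0; a--) {
                if (++idx[a] < dims[a]) {
                    break;
                }
                idx[a] = 0;
            }
        }
        return out;
    }

    /** Compares two tensors cell by cell within an absolute tolerance. */
    static boolean sameValues(double[] expected, double[] actual, double eps) {
        if (expected.length != actual.length) {
            return false;
        }
        for (int i = 0; i < expected.length; i++) {
            if (Math.abs(expected[i] - actual[i]) > eps) {
                return false;
            }
        }
        return true;
    }
}
```

A test would run the in-place routine on one copy of the data, run the reference on another, and pass the two results to the comparison.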

Link to the article: https://www.semanticscholar.org/paper/EITHOT%3A-Efficient-In-place-Transposition-of-High-on-Wu-Tu/cf4c177a64e1e271ccf1b742b5cc2efdb77fda9b

