Tensor permutation in-place on linearized data #2412

dogakarakas · 2026-01-27T15:30:10Z

This PR implements LibMatrixReorg.transposeInPlaceTensor() (src/main/java/org/apache/sysds/runtime/matrix/data/LibMatrixReorg.java), which performs in-place permutations of tensors linearized within a MatrixBlock. The implementation is inspired by the EITHOT algorithm (Efficient In-place Transposition of High-Order Tensors) and focuses on minimizing memory overhead by avoiding full-copy allocations.

Implementation Details:
The implementation handles permutations by identifying and applying fundamental primitive patterns:

Primitive 21 (Tensor -> Matrix): Applies when the permutation admits a split index such that both sub-permutations keep their internal order. The dimensions within each sub-permutation are multiplied together to form the new row and column dimensions, and the logic then delegates to the optimized LibMatrixReorg.transposeInPlaceDenseBrenner() method. Tensors that reduce to matrices therefore reuse the existing in-place matrix transpose.
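The split-index check described above could be sketched as follows. This is a minimal illustration, not the code in this PR: the class and method names are hypothetical, and it assumes a row-major layout in which perm[i] names the input dimension that becomes output dimension i.

```java
// Hypothetical sketch of the "Primitive 21" detection; names are illustrative only.
class Primitive21Sketch {
    /**
     * Returns the split index k if perm == [k, k+1, ..., n-1, 0, 1, ..., k-1]
     * with 0 < k < n, i.e. both halves keep their internal order. Returns -1 otherwise.
     */
    static int findSplit(int[] perm) {
        int n = perm.length;
        if (n < 2) return -1;
        int k = perm[0];
        if (k <= 0 || k >= n) return -1;
        for (int i = 0; i < n; i++) {
            if (perm[i] != (k + i) % n) return -1;
        }
        return k;
    }

    /** Collapses dims[0..k-1] into rows and dims[k..n-1] into columns. */
    static long[] matrixShape(int[] dims, int k) {
        long rows = 1, cols = 1;
        for (int i = 0; i < k; i++) rows *= dims[i];
        for (int i = k; i < dims.length; i++) cols *= dims[i];
        return new long[] {rows, cols};
    }
}
```

Under these assumptions, a tensor of shape [2, 3, 4, 5] with permutation [2, 3, 0, 1] has split index 2 and can be treated as a 6 x 20 matrix that is then transposed in place.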

Primitive 1324 (General Permutation): For more complex high-order permutations, the algorithm resolves the transposition by swapping neighboring data blocks while keeping the remaining dimensions fixed, using a cycle-following algorithm.
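As a rough illustration of this primitive, the sketch below reorders a row-major tensor viewed as [A][B][C][D] into [A][C][B][D] in place. It is written under assumptions and is not the code in this PR: the class name, method name, and the per-slab BitSet are hypothetical choices, and only two block-sized buffers are allocated.

```java
import java.util.BitSet;

// Hypothetical sketch of a "1324" block swap; not the implementation in LibMatrixReorg.
class Primitive1324Sketch {
    /** In place, reorders row-major [A][B][C][D] into [A][C][B][D]. */
    static void swapMiddle(double[] data, int A, int B, int C, int D) {
        int n = B * C;                  // number of D-sized blocks per slab
        double[] buf = new double[D];   // block currently being carried
        double[] tmp = new double[D];   // block displaced at the destination
        BitSet done = new BitSet(n);
        for (int a = 0; a < A; a++) {
            int base = a * n * D;       // first dimension stays fixed
            done.clear();
            for (int start = 0; start < n; start++) {
                if (done.get(start)) continue;
                System.arraycopy(data, base + start * D, buf, 0, D);
                int p = start;
                do {
                    // block p = b*C + c moves to position c*B + b
                    int q = (p % C) * B + p / C;
                    System.arraycopy(data, base + q * D, tmp, 0, D);
                    System.arraycopy(buf, 0, data, base + q * D, D);
                    double[] t = buf; buf = tmp; tmp = t;
                    done.set(q);
                    p = q;
                } while (p != start);
            }
        }
    }
}
```

The innermost dimension D moves as a contiguous block, so each step is a pair of System.arraycopy calls rather than element-wise work.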

Cycle Tracking: To keep the implementation general and simple, a single cycle-tracking strategy is used. While the EITHOT paper suggests Catanzaro's algorithm for certain index calculations because it is more efficient in specific cases, this implementation uses a simplified index conversion that applies to all cases.
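One simple way to combine a plain index conversion with cycle tracking is sketched below. It is an element-level illustration under assumptions, not the strategy used in this PR: the names are hypothetical, the layout is assumed row-major with perm[i] naming the input dimension that becomes output dimension i, and visited positions are tracked with one bit per element.

```java
import java.util.BitSet;

// Hypothetical sketch of cycle following with a visited set; names are illustrative only.
class CycleTrackingSketch {
    /** Converts a row-major linear index in the source shape to its index in the permuted shape. */
    static int destIndex(int src, int[] dims, int[] perm) {
        int n = dims.length;
        int[] coord = new int[n];
        for (int i = n - 1; i >= 0; i--) {   // linear index -> coordinates
            coord[i] = src % dims[i];
            src /= dims[i];
        }
        int dst = 0;
        for (int i = 0; i < n; i++) {        // permuted coordinates -> linear index
            dst = dst * dims[perm[i]] + coord[perm[i]];
        }
        return dst;
    }

    /** Applies the permutation in place by following each cycle exactly once. */
    static void permuteInPlace(double[] data, int[] dims, int[] perm) {
        BitSet visited = new BitSet(data.length);
        for (int start = 0; start < data.length; start++) {
            if (visited.get(start)) continue;
            double carried = data[start];
            int p = start;
            do {
                int q = destIndex(p, dims, perm);
                double next = data[q];
                data[q] = carried;
                carried = next;
                visited.set(q);
                p = q;
            } while (p != start);
        }
    }
}
```

The visited set costs one bit per element, which is the price this sketch pays for avoiding a specialized index calculation.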

Parallelization Potential: The block-based structure lends itself to future GPU acceleration. EITHOT's parallelization strategies are described in the article linked below.

Testing Framework:
Test Location: src/test/java/org/apache/sysds/test/component/matrix/libMatrixReorg/TransposeInPlaceBrennerTest.java.

Arbitrary Permutations: Verified the algorithm against a wide range of high-dimensional tensor shapes and arbitrary permutation vectors.

Memory Constraints in Validation: While transposeInPlaceTensor() is memory-efficient (it buffers only block-sized chunks), the test suite uses an out-of-place reference implementation to verify correctness. Consequently, tests involving extremely large tensor dimensions may trigger a java.lang.OutOfMemoryError: Java heap space due to the memory requirements of the reference copy.

Scope: Validation currently focuses on verifying permutation logic accuracy across high-order dimensions and varying shapes within standard heap limits.

Validation Logic: A helper method, compareTensorValues(), was added to src/test/java/org/apache/sysds/test/TestUtils.java to compare each cell value.
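The validation approach described above (an out-of-place reference plus a cell-by-cell comparison) could look roughly like the sketch below. The class and method names are hypothetical and do not reflect the signature of compareTensorValues(). The layout is assumed row-major, with perm[i] naming the input dimension that becomes output dimension i.

```java
// Hypothetical sketch of an out-of-place reference and a cell-wise check; names are illustrative only.
class ReferenceCheckSketch {
    /** Out-of-place permutation; allocates a full copy, which is why very large shapes can exhaust the heap. */
    static double[] permuteOutOfPlace(double[] in, int[] dims, int[] perm) {
        int n = dims.length;
        double[] out = new double[in.length];
        int[] coord = new int[n];
        for (int src = 0; src < in.length; src++) {
            int rem = src;
            for (int i = n - 1; i >= 0; i--) {   // linear index -> coordinates
                coord[i] = rem % dims[i];
                rem /= dims[i];
            }
            int dst = 0;
            for (int i = 0; i < n; i++) {        // permuted coordinates -> linear index
                dst = dst * dims[perm[i]] + coord[perm[i]];
            }
            out[dst] = in[src];
        }
        return out;
    }

    /** Returns the index of the first cell differing by more than eps, or -1 if all cells match. */
    static int firstMismatch(double[] expected, double[] actual, double eps) {
        if (expected.length != actual.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        for (int i = 0; i < expected.length; i++) {
            if (Math.abs(expected[i] - actual[i]) > eps) return i;
        }
        return -1;
    }
}
```

A test would run the in-place routine on one copy of the input, run the reference on another, and fail if firstMismatch returns anything other than -1.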

Link for the article: https://www.semanticscholar.org/paper/EITHOT%3A-Efficient-In-place-Transposition-of-High-on-Wu-Tu/cf4c177a64e1e271ccf1b742b5cc2efdb77fda9b

[MINOR] Implement in-place transpose for tensors

72760f8

github-project-automation bot added this to SystemDS PR Queue Jan 27, 2026

github-project-automation bot moved this to In Progress in SystemDS PR Queue Jan 27, 2026
