Add stream-dse fused SwiGLU-prefill operator#122

Open
asyms wants to merge 8 commits into amd:devel from KULeuven-MICAS:stream-dse-fused-swiglu

asyms commented Jun 18, 2026

Adds SwiGLUPrefillStream, a fused SwiGLU-prefill operator whose single MLIR design (gate/up GEMMs + SiLU + elementwise-mul + down GEMM) is generated by stream-dse and compiled to one xclbin, instead of chaining separately-compiled sub-operators.
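For orientation, the computation that the fused design covers can be written as a short NumPy reference. This is only a numerical sketch of the standard SwiGLU formula, not the operator's implementation. The function name `swiglu_reference` and the weight orientation (`w_gate` and `w_up` of shape embedding × hidden, `w_down` of shape hidden × embedding) are assumptions made for illustration.

```python
import numpy as np


def silu(z):
    """SiLU activation: z * sigmoid(z)."""
    return z / (1.0 + np.exp(-z))


def swiglu_reference(x, w_gate, w_up, w_down):
    """Unfused reference for the stages named in the PR description.

    x:      (seq_len, embedding_dim)
    w_gate: (embedding_dim, hidden_dim)   # assumed orientation
    w_up:   (embedding_dim, hidden_dim)   # assumed orientation
    w_down: (hidden_dim, embedding_dim)   # assumed orientation
    """
    gate = x @ w_gate            # gate GEMM
    up = x @ w_up                # up GEMM
    hidden = silu(gate) * up     # SiLU + elementwise multiply
    return hidden @ w_down       # down GEMM


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 8))
    out = swiglu_reference(
        x,
        rng.standard_normal((8, 16)),
        rng.standard_normal((8, 16)),
        rng.standard_normal((16, 8)),
    )
    print(out.shape)
```

A reference like this is mainly useful for checking the NPU output within a tolerance; it says nothing about how the fused design schedules the stages.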

Its per-kernel operand layouts (the tiled-strided DMA tiling) are authored on the IRON side and injected into stream-dse code generation via optimize_allocation_co(kernels=...). IRON therefore owns the layouts, while stream-dse keeps kernel construction and the MLIR rewrite, rather than the layouts being hand-copied on both sides.
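To make "tiled-strided" concrete, here is a minimal sketch of how such a layout can map a logical (row, column) index to a memory offset. It is not the `TiledStridedLayout` type from this PR: the names `Stride`, `TiledDim`, `tiled_layout` and `offset` are hypothetical, and the tile ordering (tiles stored contiguously, row-major across tiles and within a tile) is an assumption chosen for illustration.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Stride:
    step: int   # distance in elements between consecutive indices at this level
    bound: int  # number of indices at this level


@dataclass(frozen=True)
class TiledDim:
    # Levels are ordered outermost first, e.g. (tile index, index within tile).
    levels: Tuple[Stride, ...]

    def offset(self, index: int) -> int:
        """Split `index` into per-level digits and sum digit * step."""
        total = 0
        for level in reversed(self.levels):  # innermost level first
            index, digit = divmod(index, level.bound)
            total += digit * level.step
        return total


def tiled_layout(rows: int, cols: int, tile_r: int, tile_c: int):
    """Hypothetical layout: each tile_r x tile_c tile is contiguous in memory."""
    tile_elems = tile_r * tile_c
    tiles_per_row = cols // tile_c
    row = TiledDim((Stride(tile_elems * tiles_per_row, rows // tile_r),
                    Stride(tile_c, tile_r)))
    col = TiledDim((Stride(tile_elems, cols // tile_c),
                    Stride(1, tile_c)))
    return row, col


def offset(layout, r: int, c: int) -> int:
    row, col = layout
    return row.offset(r) + col.offset(c)


if __name__ == "__main__":
    layout = tiled_layout(4, 4, 2, 2)
    for r in range(4):
        print([offset(layout, r, c) for c in range(4)])
```

Describing a layout as a list of (step, bound) pairs per dimension is one common way to express DMA access patterns; the actual representation handed to stream-dse may differ.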

Added

  • SwiGLUPrefillStream (iron/operators/swiglu_prefill_stream/): fused stream-dse design → one xclbin; MLIR generated at build time by stream_design.py.
  • iron.common.layout: a TiledStridedLayout type (with to_snaxc()) for handing IRON-authored operand layouts to stream-dse.
  • stream_kernels.py: injects IRON's operand layouts into codegen through the kernels= override, replacing only operand_layouts() on stream's own kernels (requires stream-dse ≥ 1.13.4).
  • requirements_stream.txt (optional dependency stream-dse>=1.13.4); the operator's test skips when stream-dse is absent.
  • Minimal demo under demos/swiglu_prefill_stream/.
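The optional-dependency behaviour described above (skip when stream-dse is absent, require at least 1.13.4) could be checked along these lines. This is a standard-library sketch, not the code in this PR: `parse_version` and `meets_minimum` are hypothetical helpers, and the version parser only compares the leading numeric components, so `packaging.version` would be the more robust choice where it is available.

```python
from importlib import metadata


def parse_version(text):
    """Return the leading numeric release components, e.g. '1.13.4' -> (1, 13, 4)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(dist_name, minimum):
    """True if the distribution is installed and at least `minimum`."""
    try:
        installed = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return False
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    # Distribution name and minimum version taken from requirements_stream.txt.
    print(meets_minimum("stream-dse", "1.13.4"))
```

In a test module, the result could feed a `pytest.mark.skipif(...)` condition so that collection succeeds on hosts without stream-dse.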

Changed

  • Importing iron.operators no longer requires an NPU runtime: lazy XRT/pyxrt import and PEP 562 lazy operator exports, so the package loads (and tests collect) on hosts without XRT/pyxrt.
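The PEP 562 mechanism mentioned here works by defining a module-level `__getattr__` that imports the real object only on first access. The sketch below shows the pattern in isolation; it is not the code in `iron/operators/__init__.py`. The helper `make_lazy_module` and the export table are hypothetical, and standard-library modules stand in for the operator modules so the example runs anywhere.

```python
import importlib
import types


def make_lazy_module(name, exports):
    """Build a module whose attributes are imported on first access.

    `exports` maps an exported name to (module path, attribute name).
    In a real package, `__getattr__` would simply be defined at the top
    level of `__init__.py` instead of being attached to a new module.
    """
    module = types.ModuleType(name)

    def __getattr__(attr):
        try:
            target_module, target_attr = exports[attr]
        except KeyError:
            raise AttributeError(
                f"module {name!r} has no attribute {attr!r}"
            ) from None
        value = getattr(importlib.import_module(target_module), target_attr)
        setattr(module, attr, value)  # cache so later lookups skip __getattr__
        return value

    module.__getattr__ = __getattr__
    module.__all__ = sorted(exports)
    return module


if __name__ == "__main__":
    # Stand-in exports; an operator package would list its operator classes here.
    lazy = make_lazy_module("demo_ops", {"sqrt": ("math", "sqrt")})
    print("sqrt" in vars(lazy))  # not imported yet
    print(lazy.sqrt(9.0))        # triggers the import
    print("sqrt" in vars(lazy))  # now cached on the module
```

Because nothing is imported until an attribute is requested, a heavy or hardware-specific dependency such as pyxrt is only needed by code paths that actually use it.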

Removed

  • None.

Running the demo

Prerequisites: the XDNA driver + XRT installed (/opt/xilinx/xrt) and an npu2 device. From a fresh clone of this branch:

python3 -m venv .venv && source .venv/bin/activate
source /opt/xilinx/xrt/setup.sh            # provides pyxrt
pip install --upgrade pip
pip install -r requirements.txt            # IRON + mlir_aie/llvm-aie toolchain + torch
pip install -r requirements_stream.txt     # stream-dse>=1.13.4 (PyPI)
stream-setup-aie                           # required: installs snaxc / xdsl-aie / aie-python-extras
python demos/swiglu_prefill_stream/demo.py

This generates the fused design with stream-dse, compiles it to an xclbin, and runs it once on the NPU (≈2 ms for the 256×512×2048 shape). stream-setup-aie is required: it installs the AIE codegen packages stream-dse needs that cannot be plain PyPI dependencies.
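As a rough sanity check on the ≈2 ms figure, the matmul work for this shape can be counted directly. The sketch assumes that 256×512×2048 means seq_len × embedding_dim × hidden_dim (matching the test IDs used for this operator), counts only the three GEMMs at 2 FLOPs per multiply-accumulate, and ignores the SiLU and elementwise multiply. The function name `swiglu_matmul_flops` is made up for this example.

```python
def swiglu_matmul_flops(seq_len, embedding_dim, hidden_dim):
    """FLOPs of the gate, up and down GEMMs (2 FLOPs per multiply-accumulate)."""
    one_gemm = 2 * seq_len * embedding_dim * hidden_dim
    return 3 * one_gemm  # gate + up + down all have the same cost


if __name__ == "__main__":
    flops = swiglu_matmul_flops(256, 512, 2048)
    latency_s = 2e-3  # approximate figure quoted in the PR description
    print(f"{flops / 1e9:.2f} GFLOP")
    print(f"{flops / latency_s / 1e12:.2f} TFLOP/s")
```

Under these assumptions the shape amounts to about 1.61 GFLOP, so a 2 ms run corresponds to roughly 0.8 TFLOP/s of effective matmul throughput. Since the latency is only approximate, the derived rate is too.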

Licensing note

The new IRON-side files — iron/common/layout.py, iron/operators/swiglu_prefill_stream/stream_kernels.py, and demos/swiglu_prefill_stream/demo.py — carry a KU Leuven (MICAS) copyright header (Apache-2.0), as they were authored by MICAS; all other touched files keep their existing AMD headers. We can discuss this further.

PR Merge Checklist

  1. The PR is rebased on the latest devel commit and pointing to devel.
  2. Your PR has been reviewed and approved.
  3. All checks are passing.

andrej commented Jun 22, 2026

Collaborator

Hi Arne, sorry for the CI failures. If you rebase on #125 once it's merged, these should hopefully pass.

asyms force-pushed the stream-dse-fused-swiglu branch from 7f54de3 to 5103ce5 on June 22, 2026 19:55
github-actions Bot commented Jun 23, 2026

Contributor

CI Test Results

21c9a18 (2026_06_30_16_58_21)

IRON - CI Summary

Examples

iron/applications/llama_3.2_1b
Test Krackan Status Krackan Phoenix Status Phoenix
test_llama_3_2_1b[llama_3.2_1b_prompt_1024_tokens_1] - - -
test_llama_3_2_1b[llama_3.2_1b_prompt_1024_tokens_40] - - -
test_llama_3_2_1b[llama_3.2_1b_prompt_13_tokens_1] - - -
test_llama_3_2_1b[llama_3.2_1b_prompt_13_tokens_40] - - -

Small

iron/operators/axpy
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_axpy[input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0] 172.24 381.78
test_axpy[input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0] 178.68 444.24
test_axpy[input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0] 200.44 368.60
test_axpy[input_length_2048-num_aie_columns_8-tile_size_256-scalar_factor_3.0] 276.54 - -
iron/operators/dequant
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_dequant[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32] 165.32 356.38
test_dequant[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32] 185.50 405.20
test_dequant[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32] 171.90 811.08
test_dequant[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32] 212.80 368.10
test_dequant[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32] 214.22 455.92
test_dequant[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32] 196.02 678.18
test_dequant[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-group_size_32] 216.64 - -
test_dequant[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-group_size_32] 264.96 - -
iron/operators/elementwise_add
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_elementwise_add[input_length_2048-num_aie_columns_1-tile_size_2048] 185.42 386.06
test_elementwise_add[input_length_2048-num_aie_columns_2-tile_size_1024] 176.52 429.64
test_elementwise_add[input_length_2048-num_aie_columns_4-tile_size_512] 155.40 535.10
test_elementwise_add[input_length_2048-num_aie_columns_8-tile_size_256] 192.18 - -
iron/operators/elementwise_mul
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_elementwise_mul[input_length_2048-num_aie_columns_1-tile_size_2048] 159.42 319.34
test_elementwise_mul[input_length_2048-num_aie_columns_2-tile_size_1024] 170.32 457.84
test_elementwise_mul[input_length_2048-num_aie_columns_4-tile_size_512] 180.94 399.80
test_elementwise_mul[input_length_2048-num_aie_columns_8-tile_size_256] 205.66 - -
iron/operators/gelu
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_gelu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] 165.00 368.38
test_gelu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] 171.68 398.12
test_gelu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] 204.40 391.66
test_gelu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] 180.32 649.90
test_gelu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] 173.38 519.74
test_gelu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] 174.46 538.64
test_gelu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] 173.78 - -
test_gelu[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] 243.36 - -
iron/operators/gemm
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_gemm[M_1792-K_896-N_1152-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_64-k_32-n_48-trace_size_0-partition_N_1] 2306.92 - -
test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1] 273.48 548.58
test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1] 234.32 550.34
test_gemm[M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1] 48838.44 82800.26
test_gemm[M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1] 28507.46 25278.32
test_gemm[M_2048-K_2048-N_2048-num_aie_columns_8-b_col_maj_True-c_col_maj_True-m_64-k_64-n_64-trace_size_0-partition_N_1] 7874.90 - -
test_gemm[M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1] 2105.44 3482.64
test_gemm[M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4] 3722.06 6435.88
test_gemm[M_896-K_1792-N_640-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_32-k_64-n_80-trace_size_0-partition_N_1] 1460.80 - -
iron/operators/gemv
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_gemv[M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128] 0.21 0.10
test_gemv[M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048] 12.13 3.65
test_gemv[M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024] 23.96 6.05
test_gemv[M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512] 38.88 10.55
test_gemv[M_2048-K_8192-num_aie_columns_8-tile_size_input_1-tile_size_output_256] 41.81 - -
test_gemv[M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024] 12.56 3.69
test_gemv[M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024] 22.73 6.87
test_gemv[M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024] 38.45 8.68
test_gemv[M_8192-K_2048-num_aie_columns_8-tile_size_input_4-tile_size_output_1024] 41.48 - -
iron/operators/layer_norm
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] 203.70 434.44
test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] 160.64 752.66
test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] 192.26 452.04
test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] 196.60 372.20
test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] 152.44 377.32
test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] 196.60 411.00
test_layer_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] 170.60 - -
test_layer_norm[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] 310.42 - -
iron/operators/mem_copy
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_mem_copy[input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048] 164.78 353.78
test_mem_copy[input_length_2048-num_cores_16-num_channels_2-bypass_False-tile_size_128] 200.40 - -
test_mem_copy[input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024] 179.12 665.14
test_mem_copy[input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024] 190.54 442.38
test_mem_copy[input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512] 210.66 565.54
test_mem_copy[input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512] 182.68 351.26
test_mem_copy[input_length_2048-num_cores_8-num_channels_1-bypass_False-tile_size_256] 174.16 - -
test_mem_copy[input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256] 155.98 466.34
iron/operators/mha
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_mha[seq_len_16384-dim_64-num_heads_1-num_pipelines_8-num_kv_heads_0] 40651.46 - -
iron/operators/relu
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_relu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] 162.56 335.24
test_relu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] 156.54 334.36
test_relu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] 152.80 841.76
test_relu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] 162.06 373.84
test_relu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] 204.72 326.88
test_relu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] 173.38 457.82
test_relu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] 163.56 - -
test_relu[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] 202.06 - -
iron/operators/rms_norm
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False] 165.52 322.24
test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True] 162.68 365.18
test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False] 147.48 368.48
test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True] 157.28 445.82
test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False] 177.52 366.30
test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True] 163.56 356.82
test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False] 209.44 443.98
test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True] 175.92 394.84
test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False] 181.82 430.58
test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True] 181.66 410.86
test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False] 191.74 461.22
test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_True] 195.80 - -
test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_False] 184.50 - -
test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_True] 197.38 - -
test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-weighted_False] 271.34 - -
iron/operators/rope
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_rope[rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0] 140.92 434.44
test_rope[rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0] 154.50 344.80
test_rope[rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0] 183.64 494.56
test_rope[rows_32-cols_512-angle_rows_32-aie_columns_8-method_type_0] 203.44 - -
test_rope[rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0] 172.10 398.62
test_rope[rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0] 142.64 367.50
test_rope[rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0] 197.16 447.44
test_rope[rows_32-cols_512-angle_rows_8-aie_columns_8-method_type_0] 230.50 - -
iron/operators/sigmoid
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_sigmoid[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] 158.98 389.74
test_sigmoid[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] 146.28 330.48
test_sigmoid[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] 175.12 740.86
test_sigmoid[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] 160.90 562.48
test_sigmoid[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] 176.40 450.74
test_sigmoid[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] 200.08 429.16
test_sigmoid[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] 248.54 - -
test_sigmoid[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] 191.34 - -
iron/operators/silu
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_silu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] 155.66 523.64
test_silu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] 161.84 309.46
test_silu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] 173.44 468.46
test_silu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] 172.88 - -
iron/operators/softmax
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024] 189.90 374.92
test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048] 190.16 401.16
test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512] 169.44 831.32
iron/operators/swiglu_decode
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_swiglu_decode[embedding_dim_1024-hidden_dim_3584] 3815.22 12094.96
test_swiglu_decode[embedding_dim_2048-hidden_dim_2048] 3887.89 11397.11
iron/operators/swiglu_prefill
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_swiglu_prefill[seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False] 9136.39 21335.95
iron/operators/swiglu_prefill_stream
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_swiglu_prefill_stream[seq_len_256-embedding_dim_512-hidden_dim_2048-seq_tile_32-embedding_tile_32-hidden_tile_64] - - -
test_swiglu_prefill_stream_k2[seq_len_256-embedding_dim_512-hidden_dim_2048-seq_tile_32-embedding_tile_32-hidden_tile_64] - - -
iron/operators/tanh
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_tanh[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] 144.30 309.88
test_tanh[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] 160.02 555.90
test_tanh[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] 167.40 452.92
test_tanh[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] 163.30 397.06
test_tanh[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] 170.34 342.82
test_tanh[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] 187.76 481.90
test_tanh[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] 234.62 - -
test_tanh[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] 264.20 - -
iron/operators/transpose
Test Krackan Status Krackan Latency (mean) Phoenix Status Phoenix Latency (mean)
test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8-num_batches_1] 188.68 1004.28
test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8-num_batches_2] 229.66 1455.52
test_transpose[M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8-num_batches_1] 197.04 448.52
Krackan - Small

IRON

Tested on 2026_06_30_16_58_21 at commit 21c9a18.

iron/operators/axpy
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_axpy[input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0] | ✅ 5/5 | 172.24 | 0.07 | n/a |
| test_axpy[input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0] | ✅ 5/5 | 178.68 | 0.07 | n/a |
| test_axpy[input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0] | ✅ 5/5 | 200.44 | 0.06 | n/a |
| test_axpy[input_length_2048-num_aie_columns_8-tile_size_256-scalar_factor_3.0] | ✅ 5/5 | 276.54 | 0.05 | n/a |
iron/operators/dequant
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_dequant[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32] | ✅ 5/5 | 165.32 | 0.03 | n/a |
| test_dequant[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32] | ✅ 5/5 | 185.50 | 0.03 | n/a |
| test_dequant[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32] | ✅ 5/5 | 171.90 | 0.03 | n/a |
| test_dequant[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32] | ✅ 5/5 | 212.80 | 0.03 | n/a |
| test_dequant[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32] | ✅ 5/5 | 214.22 | 0.02 | n/a |
| test_dequant[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32] | ✅ 5/5 | 196.02 | 0.03 | n/a |
| test_dequant[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-group_size_32] | ✅ 5/5 | 216.64 | 0.02 | n/a |
| test_dequant[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-group_size_32] | ✅ 5/5 | 264.96 | 0.02 | n/a |
iron/operators/elementwise_add
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_elementwise_add[input_length_2048-num_aie_columns_1-tile_size_2048] | ✅ 5/5 | 185.42 | 0.07 | n/a |
| test_elementwise_add[input_length_2048-num_aie_columns_2-tile_size_1024] | ✅ 5/5 | 176.52 | 0.07 | n/a |
| test_elementwise_add[input_length_2048-num_aie_columns_4-tile_size_512] | ✅ 5/5 | 155.40 | 0.08 | n/a |
| test_elementwise_add[input_length_2048-num_aie_columns_8-tile_size_256] | ✅ 5/5 | 192.18 | 0.06 | n/a |
iron/operators/elementwise_mul
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_elementwise_mul[input_length_2048-num_aie_columns_1-tile_size_2048] | ✅ 5/5 | 159.42 | 0.08 | n/a |
| test_elementwise_mul[input_length_2048-num_aie_columns_2-tile_size_1024] | ✅ 5/5 | 170.32 | 0.07 | n/a |
| test_elementwise_mul[input_length_2048-num_aie_columns_4-tile_size_512] | ✅ 5/5 | 180.94 | 0.07 | n/a |
| test_elementwise_mul[input_length_2048-num_aie_columns_8-tile_size_256] | ✅ 5/5 | 205.66 | 0.06 | n/a |
iron/operators/gelu
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_gelu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 165.00 | 0.05 | n/a |
| test_gelu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 171.68 | 0.05 | n/a |
| test_gelu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 204.40 | 0.04 | n/a |
| test_gelu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 180.32 | 0.05 | n/a |
| test_gelu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 173.38 | 0.05 | n/a |
| test_gelu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 174.46 | 0.05 | n/a |
| test_gelu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | ✅ 5/5 | 173.78 | 0.05 | n/a |
| test_gelu[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] | ✅ 5/5 | 243.36 | 0.04 | n/a |
iron/operators/gemm
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_gemm[M_1792-K_896-N_1152-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_64-k_32-n_48-trace_size_0-partition_N_1] | ✅ 5/5 | 2306.92 | 4.12 | 1619.51 |
| test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1] | ✅ 5/5 | 273.48 | 0.89 | 37.78 |
| test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1] | ✅ 5/5 | 234.32 | 1.03 | 43.95 |
| test_gemm[M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 48838.44 | 0.52 | 351.77 |
| test_gemm[M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 28507.46 | 0.88 | 602.67 |
| test_gemm[M_2048-K_2048-N_2048-num_aie_columns_8-b_col_maj_True-c_col_maj_True-m_64-k_64-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 7874.90 | 3.20 | 2182.11 |
| test_gemm[M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 2105.44 | 3.84 | 1006.92 |
| test_gemm[M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4] | ✅ 5/5 | 3722.06 | 0.35 | 18.71 |
| test_gemm[M_896-K_1792-N_640-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_32-k_64-n_80-trace_size_0-partition_N_1] | ✅ 5/5 | 1460.80 | 4.62 | 1426.76 |
iron/operators/gemv
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_gemv[M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128] | ✅ 5/5 | n/a | 0.21 | 0.21 |
| test_gemv[M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048] | ✅ 5/5 | n/a | 12.13 | 12.12 |
| test_gemv[M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024] | ✅ 5/5 | n/a | 23.96 | 23.94 |
| test_gemv[M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512] | ✅ 5/5 | n/a | 38.88 | 38.85 |
| test_gemv[M_2048-K_8192-num_aie_columns_8-tile_size_input_1-tile_size_output_256] | ✅ 5/5 | n/a | 41.81 | 41.78 |
| test_gemv[M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 12.56 | 12.55 |
| test_gemv[M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 22.73 | 22.72 |
| test_gemv[M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 38.45 | 38.43 |
| test_gemv[M_8192-K_2048-num_aie_columns_8-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 41.48 | 41.46 |
iron/operators/layer_norm
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 203.70 | 0.04 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 160.64 | 0.05 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 192.26 | 0.04 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 196.60 | 0.05 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 152.44 | 0.05 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 196.60 | 0.05 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | ✅ 5/5 | 170.60 | 0.05 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] | ✅ 5/5 | 310.42 | 0.03 | n/a |
iron/operators/mem_copy
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_mem_copy[input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048] | ✅ 5/5 | 164.78 | 0.05 | n/a |
| test_mem_copy[input_length_2048-num_cores_16-num_channels_2-bypass_False-tile_size_128] | ✅ 5/5 | 200.40 | 0.04 | n/a |
| test_mem_copy[input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024] | ✅ 5/5 | 179.12 | 0.05 | n/a |
| test_mem_copy[input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024] | ✅ 5/5 | 190.54 | 0.05 | n/a |
| test_mem_copy[input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512] | ✅ 5/5 | 210.66 | 0.04 | n/a |
| test_mem_copy[input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512] | ✅ 5/5 | 182.68 | 0.05 | n/a |
| test_mem_copy[input_length_2048-num_cores_8-num_channels_1-bypass_False-tile_size_256] | ✅ 5/5 | 174.16 | 0.05 | n/a |
| test_mem_copy[input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256] | ✅ 5/5 | 155.98 | 0.05 | n/a |
iron/operators/mha
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_mha[seq_len_16384-dim_64-num_heads_1-num_pipelines_8-num_kv_heads_0] | ✅ 5/5 | 40651.46 | 0.21 | n/a |
iron/operators/relu
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_relu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 162.56 | 0.05 | n/a |
| test_relu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 156.54 | 0.05 | n/a |
| test_relu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 152.80 | 0.06 | n/a |
| test_relu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 162.06 | 0.05 | n/a |
| test_relu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 204.72 | 0.05 | n/a |
| test_relu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 173.38 | 0.05 | n/a |
| test_relu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | ✅ 5/5 | 163.56 | 0.05 | n/a |
| test_relu[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] | ✅ 5/5 | 202.06 | 0.05 | n/a |
iron/operators/rms_norm
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False] | ✅ 5/5 | 165.52 | 0.05 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True] | ✅ 5/5 | 162.68 | 0.08 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False] | ✅ 5/5 | 147.48 | 0.06 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True] | ✅ 5/5 | 157.28 | 0.07 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False] | ✅ 5/5 | 177.52 | 0.05 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True] | ✅ 5/5 | 163.56 | 0.06 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False] | ✅ 5/5 | 209.44 | 0.04 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True] | ✅ 5/5 | 175.92 | 0.05 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False] | ✅ 5/5 | 181.82 | 0.05 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True] | ✅ 5/5 | 181.66 | 0.05 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False] | ✅ 5/5 | 191.74 | 0.04 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_True] | ✅ 5/5 | 195.80 | 0.05 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_False] | ✅ 5/5 | 184.50 | 0.05 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_True] | ✅ 5/5 | 197.38 | 0.04 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-weighted_False] | ✅ 5/5 | 271.34 | 0.03 | n/a |
iron/operators/rope
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_rope[rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0] | ✅ 5/5 | 140.92 | 0.71 | n/a |
| test_rope[rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0] | ✅ 5/5 | 154.50 | 0.64 | n/a |
| test_rope[rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0] | ✅ 5/5 | 183.64 | 0.56 | n/a |
| test_rope[rows_32-cols_512-angle_rows_32-aie_columns_8-method_type_0] | ✅ 5/5 | 203.44 | 0.49 | n/a |
| test_rope[rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0] | ✅ 5/5 | 172.10 | 0.44 | n/a |
| test_rope[rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0] | ✅ 5/5 | 142.64 | 0.53 | n/a |
| test_rope[rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0] | ✅ 5/5 | 197.16 | 0.39 | n/a |
| test_rope[rows_32-cols_512-angle_rows_8-aie_columns_8-method_type_0] | ✅ 5/5 | 230.50 | 0.37 | n/a |
iron/operators/sigmoid
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_sigmoid[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 158.98 | 0.05 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 146.28 | 0.06 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 175.12 | 0.05 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 160.90 | 0.05 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 176.40 | 0.05 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 200.08 | 0.04 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | ✅ 5/5 | 248.54 | 0.04 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] | ✅ 5/5 | 191.34 | 0.04 | n/a |
iron/operators/silu
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_silu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 155.66 | 0.05 | n/a |
| test_silu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 161.84 | 0.05 | n/a |
| test_silu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 173.44 | 0.05 | n/a |
| test_silu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | ✅ 5/5 | 172.88 | 0.05 | n/a |
iron/operators/softmax
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024] | ✅ 5/5 | 189.90 | 0.69 | n/a |
| test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048] | ✅ 5/5 | 190.16 | 0.69 | n/a |
| test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 169.44 | 0.78 | n/a |
iron/operators/swiglu_decode
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_swiglu_decode[embedding_dim_1024-hidden_dim_3584] | ✅ 5/5 | 3815.22 | 0.00 | n/a |
| test_swiglu_decode[embedding_dim_2048-hidden_dim_2048] | ✅ 5/5 | 3887.89 | 0.00 | n/a |
iron/operators/swiglu_prefill
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_swiglu_prefill[seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False] | ✅ 5/5 | 9136.39 | 0.24 | n/a |
iron/operators/swiglu_prefill_stream
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_swiglu_prefill_stream[seq_len_256-embedding_dim_512-hidden_dim_2048-seq_tile_32-embedding_tile_32-hidden_tile_64] | ❌ 0/5 | n/a | n/a | n/a |
| test_swiglu_prefill_stream_k2[seq_len_256-embedding_dim_512-hidden_dim_2048-seq_tile_32-embedding_tile_32-hidden_tile_64] | ❌ 0/5 | n/a | n/a | n/a |
iron/operators/tanh
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_tanh[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 144.30 | 0.06 | n/a |
| test_tanh[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 160.02 | 0.05 | n/a |
| test_tanh[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 167.40 | 0.05 | n/a |
| test_tanh[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 163.30 | 0.05 | n/a |
| test_tanh[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 170.34 | 0.05 | n/a |
| test_tanh[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 187.76 | 0.04 | n/a |
| test_tanh[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256] | ✅ 5/5 | 234.62 | 0.04 | n/a |
| test_tanh[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128] | ✅ 5/5 | 264.20 | 0.03 | n/a |
iron/operators/transpose
TestChecksLatency (mean)Bandwidth (mean)Throughput (mean)
test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8-num_batches_1]✅ 5/5188.682.84n/a
test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8-num_batches_2]✅ 5/5229.664.63n/a
test_transpose[M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8-num_batches_1]✅ 5/5197.042.72n/a

Trends:

IRON Trends

iron/operators/axpy

test_axpy[input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.09 (-11.03%) | 0.07 (+10.17%) | 0.07 (-4.24%) | 0.06 (+78.26%) | 0.01 (-48.14%) | 212.30 (-43.91%) | 172.24 (-20.33%) | 181.10 (+4.44%) | 135.80 (+12.42%) | 32.13 (-68.88%) |
| 573678d / 2026-06-29 18:46:46 | 0.10 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 378.50 (n/a) | 216.20 (n/a) | 173.40 (n/a) | 120.80 (n/a) | 103.24 (n/a) |
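
The percentages in parentheses appear to be the relative change of each metric against the same metric on the previous commit (573678d); the latency columns of the table above are consistent with that reading. A minimal Python sketch of the calculation follows. `percent_change` is a hypothetical helper written for illustration, not the CI report's actual code, and the bandwidth percentages cannot be reproduced the same way from the two-decimal values shown, presumably because the report works from unrounded measurements.

```python
def percent_change(current: float, previous: float) -> float:
    """Relative change of `current` versus `previous`, in percent."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous * 100.0


# Latency cells copied from the first test_axpy trend table:
# (statistic, value at 21c9a18, value at 573678d)
latency_rows = [
    ("max", 212.30, 378.50),
    ("mean", 172.24, 216.20),
    ("median", 181.10, 173.40),
    ("min", 135.80, 120.80),
    ("stddev", 32.13, 103.24),
]

for name, current, previous in latency_rows:
    print(f"Latency ({name}): {current:.2f} ({percent_change(current, previous):+.2f}%)")
```

Since lower latency is better, a negative latency percentage is an improvement, while a negative bandwidth or throughput percentage is a regression.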

test_axpy[input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.08 (-28.15%) | 0.07 (-9.90%) | 0.07 (-4.96%) | 0.06 (+5.02%) | 0.01 (-66.31%) | 191.80 (-4.77%) | 178.68 (+7.10%) | 184.70 (+5.24%) | 153.10 (+39.18%) | 16.08 (-52.79%) |
| 573678d / 2026-06-29 18:46:46 | 0.11 (n/a) | 0.08 (n/a) | 0.07 (n/a) | 0.06 (n/a) | 0.02 (n/a) | 201.40 (n/a) | 166.84 (n/a) | 175.50 (n/a) | 110.00 (n/a) | 34.07 (n/a) |

test_axpy[input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.08 (-20.82%) | 0.06 (-12.30%) | 0.07 (-9.65%) | 0.05 (+24.61%) | 0.01 (-43.21%) | 264.80 (-19.76%) | 200.44 (+5.05%) | 185.60 (+10.67%) | 150.40 (+26.28%) | 47.80 (-43.09%) |
| 573678d / 2026-06-29 18:46:46 | 0.10 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 330.00 (n/a) | 190.80 (n/a) | 167.70 (n/a) | 119.10 (n/a) | 83.99 (n/a) |

test_axpy[input_length_2048-num_aie_columns_8-tile_size_256-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.06 (-19.63%) | 0.05 (-21.63%) | 0.05 (-15.14%) | 0.03 (-8.34%) | 0.01 (-23.81%) | 352.20 (+9.11%) | 276.54 (+26.39%) | 247.60 (+17.85%) | 215.80 (+24.45%) | 63.84 (+4.67%) |
| 573678d / 2026-06-29 18:46:46 | 0.07 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 322.80 (n/a) | 218.80 (n/a) | 210.10 (n/a) | 173.40 (n/a) | 60.99 (n/a) |

iron/operators/dequant

test_dequant[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.04 (+2.50%) | 0.03 (+1.67%) | 0.03 (+3.42%) | 0.03 (-8.39%) | 0.01 (+20.94%) | 209.80 (+9.16%) | 165.32 (-0.88%) | 155.90 (-3.35%) | 136.50 (-2.43%) | 28.74 (+27.95%) |
| 573678d / 2026-06-29 18:46:46 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.00 (n/a) | 192.20 (n/a) | 166.78 (n/a) | 161.30 (n/a) | 139.90 (n/a) | 22.46 (n/a) |

test_dequant[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.04 (-13.00%) | 0.03 (-25.32%) | 0.03 (-25.69%) | 0.02 (-33.25%) | 0.01 (+29.96%) | 227.20 (+49.77%) | 185.50 (+36.94%) | 194.10 (+34.60%) | 131.60 (+14.93%) | 36.14 (+119.15%) |
| 573678d / 2026-06-29 18:46:46 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 151.70 (n/a) | 135.46 (n/a) | 144.20 (n/a) | 114.50 (n/a) | 16.49 (n/a) |

test_dequant[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.03 (+2.52%) | 0.03 (+1.33%) | 0.03 (+5.35%) | 0.03 (+1.36%) | 0.00 (+21.97%) | 206.20 (-1.29%) | 171.90 (-0.85%) | 157.80 (-5.05%) | 150.90 (-2.46%) | 24.90 (+15.50%) |
| 573678d / 2026-06-29 18:46:46 | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.00 (n/a) | 208.90 (n/a) | 173.38 (n/a) | 166.20 (n/a) | 154.70 (n/a) | 21.55 (n/a) |

test_dequant[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.05 (+22.57%) | 0.03 (-25.31%) | 0.02 (-39.50%) | 0.02 (-38.18%) | 0.01 (+198.19%) | 269.30 (+61.74%) | 212.80 (+49.40%) | 233.40 (+65.30%) | 102.70 (-18.43%) | 65.29 (+276.74%) |
| 573678d / 2026-06-29 18:46:46 | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.00 (n/a) | 166.50 (n/a) | 142.44 (n/a) | 141.20 (n/a) | 125.90 (n/a) | 17.33 (n/a) |

test_dequant[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.03 (-29.04%) | 0.02 (-30.09%) | 0.02 (-34.24%) | 0.02 (-21.80%) | 0.00 (-36.05%) | 243.00 (+27.89%) | 214.22 (+42.09%) | 226.50 (+52.01%) | 166.10 (+40.88%) | 29.48 (+12.10%) |
| 573678d / 2026-06-29 18:46:46 | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 190.00 (n/a) | 150.76 (n/a) | 149.00 (n/a) | 117.90 (n/a) | 26.30 (n/a) |

test_dequant[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.03 (-26.18%) | 0.03 (-14.60%) | 0.03 (-6.92%) | 0.02 (-25.37%) | 0.01 (-27.20%) | 269.10 (+34.01%) | 196.02 (+16.90%) | 191.10 (+7.42%) | 160.20 (+35.42%) | 44.35 (+31.88%) |
| 573678d / 2026-06-29 18:46:46 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 200.80 (n/a) | 167.68 (n/a) | 177.90 (n/a) | 118.30 (n/a) | 33.63 (n/a) |

test_dequant[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.03 (-13.90%) | 0.02 (-14.39%) | 0.02 (-12.50%) | 0.02 (-11.31%) | 0.00 (-27.75%) | 248.90 (+12.78%) | 216.64 (+15.85%) | 215.80 (+14.30%) | 174.50 (+16.18%) | 32.75 (-3.70%) |
| 573678d / 2026-06-29 18:46:46 | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 220.70 (n/a) | 187.00 (n/a) | 188.80 (n/a) | 150.20 (n/a) | 34.01 (n/a) |

test_dequant[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.03 (-13.11%) | 0.02 (-23.01%) | 0.02 (-25.79%) | 0.01 (-38.76%) | 0.00 (+46.75%) | 363.70 (+63.24%) | 264.96 (+34.46%) | 257.60 (+34.73%) | 189.40 (+15.14%) | 65.12 (+174.10%) |
| 573678d / 2026-06-29 18:46:46 | 0.03 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 222.80 (n/a) | 197.06 (n/a) | 191.20 (n/a) | 164.50 (n/a) | 23.76 (n/a) |

iron/operators/elementwise_add

test_elementwise_add[input_length_2048-num_aie_columns_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.08 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.06 (n/a) | 0.01 (n/a) | 222.40 (n/a) | 185.42 (n/a) | 182.80 (n/a) | 158.10 (n/a) | 25.47 (n/a) |

test_elementwise_add[input_length_2048-num_aie_columns_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.09 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.06 (n/a) | 0.01 (n/a) | 218.30 (n/a) | 176.52 (n/a) | 183.80 (n/a) | 139.10 (n/a) | 31.58 (n/a) |

test_elementwise_add[input_length_2048-num_aie_columns_4-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.11 (n/a) | 0.08 (n/a) | 0.08 (n/a) | 0.07 (n/a) | 0.02 (n/a) | 181.20 (n/a) | 155.40 (n/a) | 157.40 (n/a) | 116.00 (n/a) | 27.23 (n/a) |

test_elementwise_add[input_length_2048-num_aie_columns_8-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.08 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.01 (n/a) | 220.60 (n/a) | 192.18 (n/a) | 197.20 (n/a) | 158.90 (n/a) | 23.48 (n/a) |

iron/operators/elementwise_mul

test_elementwise_mul[input_length_2048-num_aie_columns_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.10 (n/a) | 0.08 (n/a) | 0.07 (n/a) | 0.06 (n/a) | 0.02 (n/a) | 192.00 (n/a) | 159.42 (n/a) | 165.50 (n/a) | 118.30 (n/a) | 31.65 (n/a) |

test_elementwise_mul[input_length_2048-num_aie_columns_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.09 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.06 (n/a) | 0.01 (n/a) | 193.90 (n/a) | 170.32 (n/a) | 168.30 (n/a) | 134.60 (n/a) | 23.53 (n/a) |

test_elementwise_mul[input_length_2048-num_aie_columns_4-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.10 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.05 (n/a) | 0.02 (n/a) | 236.30 (n/a) | 180.94 (n/a) | 182.10 (n/a) | 122.60 (n/a) | 50.66 (n/a) |

test_elementwise_mul[input_length_2048-num_aie_columns_8-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.09 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 279.50 (n/a) | 205.66 (n/a) | 203.70 (n/a) | 133.40 (n/a) | 51.69 (n/a) |

iron/operators/gelu

test_gelu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 190.00 (n/a) | 165.00 (n/a) | 161.60 (n/a) | 127.40 (n/a) | 25.93 (n/a) |

test_gelu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 220.80 (n/a) | 171.68 (n/a) | 159.00 (n/a) | 142.80 (n/a) | 29.99 (n/a) |

test_gelu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 243.00 (n/a) | 204.40 (n/a) | 191.70 (n/a) | 169.90 (n/a) | 35.32 (n/a) |

test_gelu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 219.50 (n/a) | 180.32 (n/a) | 176.20 (n/a) | 135.60 (n/a) | 30.67 (n/a) |

test_gelu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.05 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.00 (n/a) | 188.30 (n/a) | 173.38 (n/a) | 169.70 (n/a) | 162.40 (n/a) | 9.91 (n/a) |

test_gelu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.05 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 207.90 (n/a) | 174.46 (n/a) | 164.80 (n/a) | 153.40 (n/a) | 24.17 (n/a) |

test_gelu[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.05 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.00 (n/a) | 191.40 (n/a) | 173.78 (n/a) | 174.30 (n/a) | 155.40 (n/a) | 13.08 (n/a) |

test_gelu[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 381.90 (n/a) | 243.36 (n/a) | 216.00 (n/a) | 172.60 (n/a) | 80.91 (n/a) |

iron/operators/gemm

test_gemm[M_1792-K_896-N_1152-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_64-k_32-n_48-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 4.71 (+8.18%) | 4.12 (+2.21%) | 4.10 (+1.87%) | 3.47 (-5.01%) | 0.45 (+77.21%) | 2713.50 (+5.27%) | 2306.92 (-1.52%) | 2293.60 (-1.83%) | 1996.30 (-7.56%) | 261.26 (+72.99%) | 1853.14 (+8.18%) | 1619.51 (+2.21%) | 1612.94 (+1.87%) | 1363.33 (-5.01%) | 176.53 (+77.21%) |
| 573678d / 2026-06-29 18:46:46 | 4.35 (n/a) | 4.03 (n/a) | 4.03 (n/a) | 3.65 (n/a) | 0.25 (n/a) | 2577.60 (n/a) | 2342.44 (n/a) | 2336.40 (n/a) | 2159.60 (n/a) | 151.02 (n/a) | 1712.99 (n/a) | 1584.41 (n/a) | 1583.36 (n/a) | 1435.20 (n/a) | 99.61 (n/a) |

test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 1.28 (+26.03%) | 0.89 (-7.59%) | 0.79 (-18.48%) | 0.57 (-37.69%) | 0.30 (+590.56%) | 389.30 (+60.47%) | 273.48 (+18.28%) | 281.40 (+22.67%) | 172.10 (-20.65%) | 89.19 (+756.51%) | 54.83 (+26.03%) | 37.78 (-7.59%) | 33.53 (-18.48%) | 24.24 (-37.69%) | 12.82 (+590.56%) |
| 573678d / 2026-06-29 18:46:46 | 1.02 (n/a) | 0.96 (n/a) | 0.96 (n/a) | 0.91 (n/a) | 0.04 (n/a) | 242.60 (n/a) | 231.22 (n/a) | 229.40 (n/a) | 216.90 (n/a) | 10.41 (n/a) | 43.50 (n/a) | 40.88 (n/a) | 41.13 (n/a) | 38.91 (n/a) | 1.86 (n/a) |

test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 1.26 (+9.56%) | 1.03 (+14.75%) | 1.11 (+30.69%) | 0.57 (-11.87%) | 0.29 (+43.07%) | 390.10 (+13.47%) | 234.32 (-8.75%) | 199.40 (-23.48%) | 176.00 (-8.76%) | 89.83 (+51.59%) | 53.61 (+9.56%) | 43.95 (+14.75%) | 47.33 (+30.69%) | 24.19 (-11.87%) | 12.20 (+43.07%) |
| 573678d / 2026-06-29 18:46:46 | 1.15 (n/a) | 0.90 (n/a) | 0.85 (n/a) | 0.64 (n/a) | 0.20 (n/a) | 343.80 (n/a) | 256.80 (n/a) | 260.60 (n/a) | 192.90 (n/a) | 59.25 (n/a) | 48.93 (n/a) | 38.30 (n/a) | 36.22 (n/a) | 27.45 (n/a) | 8.53 (n/a) |

test_gemm[M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.52 (-0.38%) | 0.52 (-0.38%) | 0.52 (-0.40%) | 0.51 (-0.38%) | 0.00 (+3.26%) | 48886.20 (+0.38%) | 48838.44 (+0.38%) | 48839.00 (+0.40%) | 48809.70 (+0.38%) | 30.60 (+4.16%) | 351.98 (-0.38%) | 351.77 (-0.38%) | 351.77 (-0.40%) | 351.43 (-0.38%) | 0.22 (+3.26%) |
| 573678d / 2026-06-29 18:46:46 | 0.52 (n/a) | 0.52 (n/a) | 0.52 (n/a) | 0.52 (n/a) | 0.00 (n/a) | 48698.80 (n/a) | 48651.26 (n/a) | 48644.30 (n/a) | 48625.70 (n/a) | 29.38 (n/a) | 353.31 (n/a) | 353.12 (n/a) | 353.17 (n/a) | 352.78 (n/a) | 0.21 (n/a) |

test_gemm[M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.89 (+0.57%) | 0.88 (+0.24%) | 0.88 (+0.08%) | 0.88 (+0.16%) | 0.01 (+28.53%) | 28736.60 (-0.16%) | 28507.46 (-0.23%) | 28545.70 (-0.08%) | 28183.20 (-0.56%) | 211.39 (+27.51%) | 609.58 (+0.57%) | 602.67 (+0.24%) | 601.84 (+0.08%) | 597.84 (+0.16%) | 4.49 (+28.53%) |
| 573678d / 2026-06-29 18:46:46 | 0.89 (n/a) | 0.88 (n/a) | 0.88 (n/a) | 0.87 (n/a) | 0.01 (n/a) | 28783.60 (n/a) | 28574.08 (n/a) | 28568.40 (n/a) | 28342.90 (n/a) | 165.78 (n/a) | 606.14 (n/a) | 601.26 (n/a) | 601.36 (n/a) | 596.86 (n/a) | 3.49 (n/a) |

test_gemm[M_2048-K_2048-N_2048-num_aie_columns_8-b_col_maj_True-c_col_maj_True-m_64-k_64-n_64-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 3.29 (-4.56%) | 3.20 (-2.36%) | 3.18 (-2.17%) | 3.15 (-1.38%) | 0.06 (-44.52%) | 7991.20 (+1.40%) | 7874.90 (+2.36%) | 7921.70 (+2.22%) | 7651.50 (+4.77%) | 133.65 (-40.81%) | 2245.28 (-4.56%) | 2182.11 (-2.36%) | 2168.70 (-2.17%) | 2149.86 (-1.38%) | 37.64 (-44.52%) |
| 573678d / 2026-06-29 18:46:46 | 3.45 (n/a) | 3.27 (n/a) | 3.25 (n/a) | 3.19 (n/a) | 0.10 (n/a) | 7880.60 (n/a) | 7693.12 (n/a) | 7749.80 (n/a) | 7302.80 (n/a) | 225.78 (n/a) | 2352.51 (n/a) | 2234.74 (n/a) | 2216.80 (n/a) | 2180.03 (n/a) | 67.84 (n/a) |

test_gemm[M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 4.11 (-3.28%) | 3.84 (+11.98%) | 3.71 (+12.88%) | 3.64 (+32.35%) | 0.23 (-62.72%) | 2212.30 (-24.44%) | 2105.44 (-12.73%) | 2172.10 (-11.41%) | 1961.80 (+3.39%) | 124.62 (-70.79%) | 1077.55 (-3.28%) | 1006.92 (+11.98%) | 973.24 (+12.88%) | 955.54 (+32.35%) | 60.81 (-62.72%) |
| 573678d / 2026-06-29 18:46:46 | 4.25 (n/a) | 3.43 (n/a) | 3.29 (n/a) | 2.75 (n/a) | 0.62 (n/a) | 2928.00 (n/a) | 2412.46 (n/a) | 2451.90 (n/a) | 1897.40 (n/a) | 426.57 (n/a) | 1114.14 (n/a) | 899.19 (n/a) | 862.16 (n/a) | 721.97 (n/a) | 163.11 (n/a) |

test_gemm[M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.49 (-5.61%) | 0.35 (+1.62%) | 0.32 (+4.23%) | 0.28 (+7.13%) | 0.08 (-21.47%) | 4524.20 (-6.65%) | 3722.06 (-3.96%) | 3860.90 (-4.06%) | 2541.00 (+5.95%) | 722.90 (-24.48%) | 26.41 (-5.61%) | 18.71 (+1.62%) | 17.38 (+4.23%) | 14.83 (+7.13%) | 4.45 (-21.47%) |
| 573678d / 2026-06-29 18:46:46 | 0.52 (n/a) | 0.34 (n/a) | 0.31 (n/a) | 0.26 (n/a) | 0.11 (n/a) | 4846.70 (n/a) | 3875.42 (n/a) | 4024.20 (n/a) | 2398.40 (n/a) | 957.27 (n/a) | 27.98 (n/a) | 18.42 (n/a) | 16.68 (n/a) | 13.85 (n/a) | 5.67 (n/a) |

test_gemm[M_896-K_1792-N_640-num_aie_columns_8-b_col_maj_False-c_col_maj_True-m_32-k_64-n_80-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 5.17 (+4.20%) | 4.62 (+2.85%) | 4.75 (+1.80%) | 3.68 (+4.67%) | 0.57 (+2.59%) | 1807.60 (-4.46%) | 1460.80 (-2.84%) | 1401.60 (-1.77%) | 1286.80 (-4.03%) | 205.07 (-6.98%) | 1597.12 (+4.20%) | 1426.76 (+2.85%) | 1466.31 (+1.80%) | 1136.98 (+4.67%) | 177.60 (+2.59%) |
| 573678d / 2026-06-29 18:46:46 | 4.96 (n/a) | 4.49 (n/a) | 4.66 (n/a) | 3.52 (n/a) | 0.56 (n/a) | 1892.00 (n/a) | 1503.56 (n/a) | 1426.90 (n/a) | 1340.90 (n/a) | 220.47 (n/a) | 1532.72 (n/a) | 1387.16 (n/a) | 1440.38 (n/a) | 1086.24 (n/a) | 173.12 (n/a) |

iron/operators/gemv

test_gemv[M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.25 (+24.26%) | 0.21 (+18.05%) | 0.20 (+6.16%) | 0.16 (+33.13%) | 0.04 (+16.94%) | 0.25 (+24.26%) | 0.21 (+18.05%) | 0.20 (+6.16%) | 0.16 (+33.13%) | 0.04 (+16.94%) |
| 573678d / 2026-06-29 18:46:46 | 0.20 (n/a) | 0.18 (n/a) | 0.19 (n/a) | 0.12 (n/a) | 0.03 (n/a) | 0.20 (n/a) | 0.17 (n/a) | 0.19 (n/a) | 0.12 (n/a) | 0.03 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 13.27 (+0.04%) | 12.13 (-3.05%) | 12.78 (+1.92%) | 10.21 (-14.29%) | 1.35 (+155.07%) | 13.26 (+0.04%) | 12.12 (-3.05%) | 12.77 (+1.92%) | 10.20 (-14.29%) | 1.35 (+155.07%) |
| 573678d / 2026-06-29 18:46:46 | 13.26 (n/a) | 12.51 (n/a) | 12.54 (n/a) | 11.91 (n/a) | 0.53 (n/a) | 13.25 (n/a) | 12.50 (n/a) | 12.53 (n/a) | 11.90 (n/a) | 0.53 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 24.50 (+2.04%) | 23.96 (+1.68%) | 24.20 (+1.41%) | 23.35 (+4.69%) | 0.50 (-30.09%) | 24.49 (+2.04%) | 23.94 (+1.68%) | 24.18 (+1.41%) | 23.33 (+4.69%) | 0.50 (-30.09%) |
| 573678d / 2026-06-29 18:46:46 | 24.01 (n/a) | 23.56 (n/a) | 23.86 (n/a) | 22.30 (n/a) | 0.71 (n/a) | 24.00 (n/a) | 23.55 (n/a) | 23.85 (n/a) | 22.29 (n/a) | 0.71 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 40.48 (+2.13%) | 38.88 (+3.91%) | 39.17 (-0.52%) | 36.29 (+18.39%) | 1.69 (-55.97%) | 40.45 (+2.13%) | 38.85 (+3.91%) | 39.15 (-0.52%) | 36.27 (+18.39%) | 1.69 (-55.97%) |
| 573678d / 2026-06-29 18:46:46 | 39.63 (n/a) | 37.41 (n/a) | 39.38 (n/a) | 30.65 (n/a) | 3.85 (n/a) | 39.61 (n/a) | 37.39 (n/a) | 39.35 (n/a) | 30.63 (n/a) | 3.84 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_8-tile_size_input_1-tile_size_output_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 44.87 (+0.47%) | 41.81 (-1.80%) | 41.70 (-3.74%) | 39.30 (+0.72%) | 2.42 (+9.57%) | 44.85 (+0.47%) | 41.78 (-1.80%) | 41.67 (-3.74%) | 39.28 (+0.72%) | 2.42 (+9.57%) |
| 573678d / 2026-06-29 18:46:46 | 44.67 (n/a) | 42.57 (n/a) | 43.32 (n/a) | 39.02 (n/a) | 2.21 (n/a) | 44.64 (n/a) | 42.55 (n/a) | 43.29 (n/a) | 39.00 (n/a) | 2.21 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 13.30 (+0.69%) | 12.56 (+0.82%) | 12.48 (+1.01%) | 12.13 (+1.36%) | 0.46 (+1.88%) | 13.30 (+0.69%) | 12.55 (+0.82%) | 12.48 (+1.01%) | 12.13 (+1.36%) | 0.46 (+1.88%) |
| 573678d / 2026-06-29 18:46:46 | 13.21 (n/a) | 12.46 (n/a) | 12.36 (n/a) | 11.97 (n/a) | 0.46 (n/a) | 13.20 (n/a) | 12.45 (n/a) | 12.35 (n/a) | 11.96 (n/a) | 0.46 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 24.54 (+0.85%) | 22.73 (-4.04%) | 24.06 (+0.86%) | 19.20 (-14.58%) | 2.25 (+218.34%) | 24.52 (+0.85%) | 22.72 (-4.04%) | 24.05 (+0.86%) | 19.19 (-14.58%) | 2.25 (+218.34%) |
| 573678d / 2026-06-29 18:46:46 | 24.33 (n/a) | 23.69 (n/a) | 23.86 (n/a) | 22.48 (n/a) | 0.71 (n/a) | 24.32 (n/a) | 23.67 (n/a) | 23.84 (n/a) | 22.46 (n/a) | 0.71 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 41.22 (+1.04%) | 38.45 (+9.22%) | 38.88 (+3.15%) | 36.37 (+47.34%) | 1.97 (-70.03%) | 41.20 (+1.04%) | 38.43 (+9.22%) | 38.86 (+3.15%) | 36.35 (+47.34%) | 1.97 (-70.03%) |
| 573678d / 2026-06-29 18:46:46 | 40.80 (n/a) | 35.21 (n/a) | 37.70 (n/a) | 24.68 (n/a) | 6.59 (n/a) | 40.77 (n/a) | 35.18 (n/a) | 37.67 (n/a) | 24.67 (n/a) | 6.58 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_8-tile_size_input_4-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 43.12 (-5.26%) | 41.48 (-1.34%) | 41.93 (-0.67%) | 39.56 (+1.72%) | 1.36 (-46.46%) | 43.09 (-5.26%) | 41.46 (-1.34%) | 41.90 (-0.67%) | 39.53 (+1.72%) | 1.36 (-46.46%) |
| 573678d / 2026-06-29 18:46:46 | 45.51 (n/a) | 42.04 (n/a) | 42.21 (n/a) | 38.89 (n/a) | 2.55 (n/a) | 45.48 (n/a) | 42.02 (n/a) | 42.18 (n/a) | 38.86 (n/a) | 2.54 (n/a) |

iron/operators/layer_norm

test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 238.30 (n/a) | 203.70 (n/a) | 214.30 (n/a) | 164.20 (n/a) | 32.89 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 192.90 (n/a) | 160.64 (n/a) | 175.20 (n/a) | 127.00 (n/a) | 30.40 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 226.40 (n/a) | 192.26 (n/a) | 189.70 (n/a) | 161.30 (n/a) | 26.98 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.06 (n/a) | 0.05 (n/a) | 0.06 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 394.20 (n/a) | 196.60 (n/a) | 148.40 (n/a) | 142.90 (n/a) | 110.53 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.00 (n/a) | 174.50 (n/a) | 152.44 (n/a) | 149.30 (n/a) | 139.90 (n/a) | 13.56 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 287.90 (n/a) | 196.60 (n/a) | 167.40 (n/a) | 115.00 (n/a) | 72.71 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 184.60 (n/a) | 170.60 (n/a) | 175.70 (n/a) | 143.60 (n/a) | 17.09 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.05 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 405.90 (n/a) | 310.42 (n/a) | 345.30 (n/a) | 173.50 (n/a) | 100.55 (n/a) |

iron/operators/mem_copy

test_mem_copy[input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.07 (+0.27%) | 0.05 (+1.38%) | 0.05 (+2.39%) | 0.04 (-14.04%) | 0.01 (+33.31%) | 226.70 (+16.32%) | 164.78 (+1.52%) | 166.60 (-2.34%) | 120.90 (-0.25%) | 45.31 (+46.27%) |
| 573678d / 2026-06-29 18:46:46 | 0.07 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 194.90 (n/a) | 162.32 (n/a) | 170.60 (n/a) | 121.20 (n/a) | 30.98 (n/a) |

test_mem_copy[input_length_2048-num_cores_16-num_channels_2-bypass_False-tile_size_128]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.05 (-3.08%) | 0.04 (+14.59%) | 0.04 (+20.06%) | 0.04 (+51.65%) | 0.01 (-53.19%) | 224.60 (-34.08%) | 200.40 (-18.40%) | 206.20 (-16.69%) | 163.70 (+3.22%) | 23.95 (-68.19%) |
| 573678d / 2026-06-29 18:46:46 | 0.05 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 340.70 (n/a) | 245.58 (n/a) | 247.50 (n/a) | 158.60 (n/a) | 75.27 (n/a) |

test_mem_copy[input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.06 (+11.23%) | 0.05 (-1.42%) | 0.04 (-8.26%) | 0.04 (+6.73%) | 0.01 (+30.93%) | 203.60 (-6.30%) | 179.12 (+2.07%) | 188.20 (+9.04%) | 138.80 (-10.10%) | 27.22 (+9.36%) |
| 573678d / 2026-06-29 18:46:46 | 0.05 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 217.30 (n/a) | 175.48 (n/a) | 172.60 (n/a) | 154.40 (n/a) | 24.89 (n/a) |

test_mem_copy[input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 / 2026-06-30 16:52:45 | 0.08 (+24.35%) | 0.05 (-11.50%) | 0.05 (-11.38%) | 0.03 (-45.21%) | 0.02 (+199.10%) | 321.10 (+82.55%) | 190.54 (+27.45%) | 165.20 (+12.84%) | 103.80 (-19.53%) | 80.60 (+345.83%) |
| 573678d / 2026-06-29 18:46:46 | 0.06 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 175.90 (n/a) | 149.50 (n/a) | 146.40 (n/a) | 129.00 (n/a) | 18.08 (n/a) |

test_mem_copy[input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.06 (-2.62%) | 0.04 (-16.26%) | 0.05 (-8.53%) | 0.02 (-51.77%) | 0.01 (+141.83%) | 378.80 (+107.33%) | 210.66 (+33.08%) | 168.20 (+9.29%) | 145.30 (+2.69%) | 96.32 (+444.72%) |
| 573678d - 2026-06-29 18:46:46 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 182.70 (n/a) | 158.30 (n/a) | 153.90 (n/a) | 141.50 (n/a) | 17.68 (n/a) |

test_mem_copy[input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.07 (+19.58%) | 0.05 (-9.79%) | 0.04 (-18.81%) | 0.04 (-21.46%) | 0.01 (+190.25%) | 225.50 (+27.33%) | 182.68 (+15.85%) | 197.30 (+23.16%) | 120.00 (-16.38%) | 41.52 (+205.40%) |
| 573678d - 2026-06-29 18:46:46 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.00 (n/a) | 177.10 (n/a) | 157.68 (n/a) | 160.20 (n/a) | 143.50 (n/a) | 13.60 (n/a) |

test_mem_copy[input_length_2048-num_cores_8-num_channels_1-bypass_False-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.07 (+28.93%) | 0.05 (-1.95%) | 0.04 (-10.44%) | 0.04 (-15.05%) | 0.01 (+276.05%) | 205.30 (+17.72%) | 174.16 (+5.98%) | 183.60 (+11.68%) | 115.30 (-22.46%) | 35.08 (+228.50%) |
| 573678d - 2026-06-29 18:46:46 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.00 (n/a) | 174.40 (n/a) | 164.34 (n/a) | 164.40 (n/a) | 148.70 (n/a) | 10.68 (n/a) |

test_mem_copy[input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.08 (+36.58%) | 0.05 (+14.19%) | 0.05 (-2.91%) | 0.05 (+39.90%) | 0.01 (+50.13%) | 172.40 (-28.52%) | 155.98 (-12.07%) | 168.50 (+3.00%) | 106.30 (-26.79%) | 28.15 (-25.10%) |
| 573678d - 2026-06-29 18:46:46 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 241.20 (n/a) | 177.40 (n/a) | 163.60 (n/a) | 145.20 (n/a) | 37.58 (n/a) |

iron/operators/mha

test_mha[seq_len_16384-dim_64-num_heads_1-num_pipelines_8-num_kv_heads_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.21 (+0.63%) | 0.21 (+0.67%) | 0.21 (+0.78%) | 0.21 (+0.57%) | 0.00 (+32.60%) | 40730.90 (-0.56%) | 40651.46 (-0.67%) | 40624.20 (-0.78%) | 40604.50 (-0.62%) | 54.55 (+30.92%) |
| 573678d - 2026-06-29 18:46:46 | 0.21 (n/a) | 0.20 (n/a) | 0.20 (n/a) | 0.20 (n/a) | 0.00 (n/a) | 40961.20 (n/a) | 40924.92 (n/a) | 40941.50 (n/a) | 40859.60 (n/a) | 41.66 (n/a) |

iron/operators/rms_norm

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.07 (+35.05%) | 0.05 (+10.17%) | 0.05 (+5.84%) | 0.04 (-0.43%) | 0.01 (+203.71%) | 198.80 (+0.45%) | 165.52 (-6.94%) | 164.80 (-5.50%) | 119.90 (-25.99%) | 29.75 (+119.85%) |
| 573678d - 2026-06-29 18:46:46 | 0.05 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.00 (n/a) | 197.90 (n/a) | 177.86 (n/a) | 174.40 (n/a) | 162.00 (n/a) | 13.53 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.09 (-7.78%) | 0.08 (-12.44%) | 0.07 (-15.37%) | 0.07 (-8.62%) | 0.01 (+1.47%) | 185.70 (+9.43%) | 162.68 (+14.47%) | 166.70 (+18.23%) | 132.10 (+8.46%) | 20.65 (+17.61%) |
| 573678d - 2026-06-29 18:46:46 | 0.10 (n/a) | 0.09 (n/a) | 0.09 (n/a) | 0.07 (n/a) | 0.01 (n/a) | 169.70 (n/a) | 142.12 (n/a) | 141.00 (n/a) | 121.80 (n/a) | 17.56 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.07 (+3.61%) | 0.06 (+0.94%) | 0.05 (-2.29%) | 0.05 (-0.17%) | 0.01 (+29.55%) | 167.30 (+0.18%) | 147.48 (-0.11%) | 160.30 (+2.36%) | 119.50 (-3.47%) | 23.12 (+26.53%) |
| 573678d - 2026-06-29 18:46:46 | 0.07 (n/a) | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 167.00 (n/a) | 147.64 (n/a) | 156.60 (n/a) | 123.80 (n/a) | 18.28 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.08 (+16.62%) | 0.07 (+11.42%) | 0.07 (+14.91%) | 0.05 (+3.06%) | 0.01 (+57.58%) | 187.70 (-3.00%) | 157.28 (-9.59%) | 155.00 (-12.97%) | 130.40 (-14.27%) | 21.10 (+32.03%) |
| 573678d - 2026-06-29 18:46:46 | 0.07 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 193.50 (n/a) | 173.96 (n/a) | 178.10 (n/a) | 152.10 (n/a) | 15.98 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.07 (+1.76%) | 0.05 (-1.23%) | 0.04 (-1.06%) | 0.04 (-0.97%) | 0.01 (+0.86%) | 211.00 (+0.96%) | 177.52 (+1.19%) | 184.20 (+1.10%) | 125.50 (-1.72%) | 34.27 (-2.39%) |
| 573678d - 2026-06-29 18:46:46 | 0.06 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 209.00 (n/a) | 175.44 (n/a) | 182.20 (n/a) | 127.70 (n/a) | 35.11 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.08 (+1.52%) | 0.06 (-3.16%) | 0.06 (-9.47%) | 0.05 (+5.02%) | 0.01 (-20.12%) | 194.80 (-4.79%) | 163.56 (+2.12%) | 162.90 (+10.44%) | 131.00 (-1.50%) | 22.71 (-25.68%) |
| 573678d - 2026-06-29 18:46:46 | 0.08 (n/a) | 0.07 (n/a) | 0.07 (n/a) | 0.05 (n/a) | 0.01 (n/a) | 204.60 (n/a) | 160.16 (n/a) | 147.50 (n/a) | 133.00 (n/a) | 30.56 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.06 (+5.39%) | 0.04 (-11.68%) | 0.04 (-2.26%) | 0.03 (-32.30%) | 0.01 (+67.04%) | 307.40 (+47.72%) | 209.44 (+21.71%) | 182.20 (+2.30%) | 129.70 (-5.12%) | 73.43 (+143.64%) |
| 573678d - 2026-06-29 18:46:46 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 208.10 (n/a) | 172.08 (n/a) | 178.10 (n/a) | 136.70 (n/a) | 30.14 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.07 (-19.10%) | 0.05 (-16.77%) | 0.05 (-22.33%) | 0.04 (-22.59%) | 0.01 (-2.16%) | 211.40 (+29.22%) | 175.92 (+21.12%) | 185.40 (+28.75%) | 141.60 (+23.56%) | 30.06 (+53.82%) |
| 573678d - 2026-06-29 18:46:46 | 0.08 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.06 (n/a) | 0.01 (n/a) | 163.60 (n/a) | 145.24 (n/a) | 144.00 (n/a) | 114.60 (n/a) | 19.54 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.06 (-4.70%) | 0.05 (+0.57%) | 0.04 (-1.28%) | 0.04 (+14.39%) | 0.01 (-18.92%) | 215.50 (-12.58%) | 181.82 (-1.98%) | 184.80 (+1.32%) | 142.10 (+4.95%) | 30.30 (-25.52%) |
| 573678d - 2026-06-29 18:46:46 | 0.06 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 246.50 (n/a) | 185.50 (n/a) | 182.40 (n/a) | 135.40 (n/a) | 40.68 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.06 (+5.33%) | 0.05 (+4.53%) | 0.05 (-3.59%) | 0.04 (+41.00%) | 0.01 (-33.23%) | 204.90 (-29.08%) | 181.66 (-7.46%) | 184.90 (+3.70%) | 145.80 (-5.08%) | 21.90 (-58.74%) |
| 573678d - 2026-06-29 18:46:46 | 0.06 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 288.90 (n/a) | 196.30 (n/a) | 178.30 (n/a) | 153.60 (n/a) | 53.08 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.05 (+4.92%) | 0.04 (-2.29%) | 0.05 (-5.50%) | 0.03 (-5.05%) | 0.01 (+36.96%) | 238.40 (+5.30%) | 191.74 (+3.89%) | 180.10 (+5.82%) | 154.10 (-4.64%) | 38.63 (+38.94%) |
| 573678d - 2026-06-29 18:46:46 | 0.05 (n/a) | 0.05 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 226.40 (n/a) | 184.56 (n/a) | 170.20 (n/a) | 161.60 (n/a) | 27.81 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.06 (+10.32%) | 0.05 (+0.97%) | 0.04 (-3.27%) | 0.04 (-0.12%) | 0.01 (+44.31%) | 224.80 (+0.13%) | 195.80 (-0.08%) | 203.80 (+3.40%) | 150.50 (-9.34%) | 28.26 (+28.36%) |
| 573678d - 2026-06-29 18:46:46 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 224.50 (n/a) | 195.96 (n/a) | 197.10 (n/a) | 166.00 (n/a) | 22.01 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.05 (-0.23%) | 0.05 (+2.05%) | 0.05 (+7.97%) | 0.04 (-5.49%) | 0.01 (+24.58%) | 220.40 (+5.81%) | 184.50 (-1.28%) | 167.80 (-7.40%) | 156.70 (+0.26%) | 28.77 (+31.45%) |
| 573678d - 2026-06-29 18:46:46 | 0.05 (n/a) | 0.04 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 208.30 (n/a) | 186.90 (n/a) | 181.20 (n/a) | 156.30 (n/a) | 21.89 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_1-tile_size_256-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.06 (-12.71%) | 0.04 (-3.64%) | 0.04 (-2.96%) | 0.04 (-0.14%) | 0.01 (-25.19%) | 223.10 (+0.18%) | 197.38 (+2.81%) | 211.40 (+3.02%) | 156.90 (+14.61%) | 29.83 (-10.22%) |
| 573678d - 2026-06-29 18:46:46 | 0.06 (n/a) | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.01 (n/a) | 222.70 (n/a) | 191.98 (n/a) | 205.20 (n/a) | 136.90 (n/a) | 33.23 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_8-num_channels_2-tile_size_128-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.04 (-14.86%) | 0.03 (+6.14%) | 0.04 (+24.01%) | 0.02 (+13.43%) | 0.01 (-32.57%) | 350.40 (-11.85%) | 271.34 (-9.70%) | 231.00 (-19.37%) | 221.10 (+17.48%) | 60.66 (-33.42%) |
| 573678d - 2026-06-29 18:46:46 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 397.50 (n/a) | 300.50 (n/a) | 286.50 (n/a) | 188.20 (n/a) | 91.11 (n/a) |

iron/operators/rope

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.85 (+7.81%) | 0.71 (+23.21%) | 0.66 (+13.96%) | 0.63 (+43.84%) | 0.09 (-35.18%) | 156.40 (-30.49%) | 140.92 (-21.31%) | 149.00 (-12.25%) | 115.70 (-7.22%) | 16.35 (-58.75%) |
| 573678d - 2026-06-29 18:46:46 | 0.79 (n/a) | 0.57 (n/a) | 0.58 (n/a) | 0.44 (n/a) | 0.14 (n/a) | 225.00 (n/a) | 179.08 (n/a) | 169.80 (n/a) | 124.70 (n/a) | 39.63 (n/a) |

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.74 (-10.24%) | 0.64 (-4.61%) | 0.66 (-0.47%) | 0.57 (-1.24%) | 0.07 (-23.43%) | 171.30 (+1.24%) | 154.50 (+4.42%) | 149.30 (+0.47%) | 133.30 (+11.45%) | 16.46 (-9.92%) |
| 573678d - 2026-06-29 18:46:46 | 0.82 (n/a) | 0.67 (n/a) | 0.66 (n/a) | 0.58 (n/a) | 0.09 (n/a) | 169.20 (n/a) | 147.96 (n/a) | 148.60 (n/a) | 119.60 (n/a) | 18.27 (n/a) |

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.78 (+32.78%) | 0.56 (+11.74%) | 0.48 (-4.63%) | 0.45 (+14.21%) | 0.15 (+99.98%) | 218.90 (-12.44%) | 183.64 (-7.78%) | 206.90 (+4.87%) | 126.10 (-24.67%) | 41.90 (+32.77%) |
| 573678d - 2026-06-29 18:46:46 | 0.59 (n/a) | 0.50 (n/a) | 0.50 (n/a) | 0.39 (n/a) | 0.07 (n/a) | 250.00 (n/a) | 199.14 (n/a) | 197.30 (n/a) | 167.40 (n/a) | 31.56 (n/a) |

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_8-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.55 (-8.87%) | 0.49 (-5.32%) | 0.46 (-8.89%) | 0.44 (+1.88%) | 0.05 (-21.17%) | 222.20 (-1.81%) | 203.44 (+5.22%) | 214.30 (+9.78%) | 177.80 (+9.75%) | 20.19 (-14.73%) |
| 573678d - 2026-06-29 18:46:46 | 0.61 (n/a) | 0.51 (n/a) | 0.50 (n/a) | 0.43 (n/a) | 0.06 (n/a) | 226.30 (n/a) | 193.34 (n/a) | 195.20 (n/a) | 162.00 (n/a) | 23.68 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.55 (-17.43%) | 0.44 (-3.97%) | 0.41 (-10.45%) | 0.38 (+28.71%) | 0.07 (-51.92%) | 193.80 (-22.29%) | 172.10 (-1.69%) | 178.50 (+11.70%) | 133.10 (+21.11%) | 22.95 (-56.18%) |
| 573678d - 2026-06-29 18:46:46 | 0.67 (n/a) | 0.45 (n/a) | 0.46 (n/a) | 0.30 (n/a) | 0.14 (n/a) | 249.40 (n/a) | 175.06 (n/a) | 159.80 (n/a) | 109.90 (n/a) | 52.38 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.70 (+34.77%) | 0.53 (+34.56%) | 0.49 (+33.95%) | 0.43 (+54.25%) | 0.11 (+1.76%) | 172.80 (-35.18%) | 142.64 (-27.70%) | 149.50 (-25.36%) | 105.00 (-25.80%) | 25.19 (-51.15%) |
| 573678d - 2026-06-29 18:46:46 | 0.52 (n/a) | 0.39 (n/a) | 0.37 (n/a) | 0.28 (n/a) | 0.10 (n/a) | 266.60 (n/a) | 197.30 (n/a) | 200.30 (n/a) | 141.50 (n/a) | 51.56 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.47 (-4.83%) | 0.39 (-6.28%) | 0.43 (+2.81%) | 0.25 (-24.36%) | 0.09 (+37.66%) | 291.90 (+32.20%) | 197.16 (+10.02%) | 170.10 (-2.74%) | 158.20 (+5.05%) | 54.96 (+96.58%) |
| 573678d - 2026-06-29 18:46:46 | 0.49 (n/a) | 0.42 (n/a) | 0.42 (n/a) | 0.33 (n/a) | 0.06 (n/a) | 220.80 (n/a) | 179.20 (n/a) | 174.90 (n/a) | 150.60 (n/a) | 27.96 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_8-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.68 (+11.34%) | 0.37 (-15.33%) | 0.30 (-29.29%) | 0.23 (-29.93%) | 0.18 (+67.22%) | 325.10 (+42.71%) | 230.50 (+30.54%) | 244.00 (+41.45%) | 107.90 (-10.23%) | 82.50 (+107.88%) |
| 573678d - 2026-06-29 18:46:46 | 0.61 (n/a) | 0.44 (n/a) | 0.43 (n/a) | 0.32 (n/a) | 0.11 (n/a) | 227.80 (n/a) | 176.58 (n/a) | 172.50 (n/a) | 120.20 (n/a) | 39.69 (n/a) |

iron/operators/softmax

test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.74 (-16.47%) | 0.69 (-3.05%) | 0.70 (+3.50%) | 0.63 (+0.95%) | 0.04 (-58.84%) | 209.60 (-0.95%) | 189.90 (+1.87%) | 187.90 (-3.39%) | 177.00 (+19.68%) | 12.35 (-50.08%) |
| 573678d - 2026-06-29 18:46:46 | 0.89 (n/a) | 0.71 (n/a) | 0.67 (n/a) | 0.62 (n/a) | 0.11 (n/a) | 211.60 (n/a) | 186.42 (n/a) | 194.50 (n/a) | 147.90 (n/a) | 24.75 (n/a) |

test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.75 (-21.53%) | 0.69 (-2.21%) | 0.68 (+2.56%) | 0.65 (+13.31%) | 0.04 (-72.22%) | 202.10 (-11.75%) | 190.16 (-0.37%) | 192.80 (-2.48%) | 175.60 (+27.43%) | 10.83 (-67.59%) |
| 573678d - 2026-06-29 18:46:46 | 0.95 (n/a) | 0.71 (n/a) | 0.66 (n/a) | 0.57 (n/a) | 0.14 (n/a) | 229.00 (n/a) | 190.86 (n/a) | 197.70 (n/a) | 137.80 (n/a) | 33.43 (n/a) |

test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.90 (+13.89%) | 0.78 (+13.27%) | 0.76 (+8.15%) | 0.67 (+24.64%) | 0.09 (-5.25%) | 195.10 (-19.78%) | 169.44 (-12.26%) | 172.30 (-7.51%) | 145.20 (-12.16%) | 18.95 (-35.74%) |
| 573678d - 2026-06-29 18:46:46 | 0.79 (n/a) | 0.69 (n/a) | 0.70 (n/a) | 0.54 (n/a) | 0.09 (n/a) | 243.20 (n/a) | 193.12 (n/a) | 186.30 (n/a) | 165.30 (n/a) | 29.49 (n/a) |

iron/operators/swiglu_decode

test_swiglu_decode[embedding_dim_1024-hidden_dim_3584]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.00 (+9.09%) | 0.00 (+10.00%) | 0.00 (+10.00%) | 0.00 (+0.00%) | 0.00 (+22.47%) | 4413.60 (-4.40%) | 3815.22 (-8.23%) | 3721.97 (-10.73%) | 3517.39 (-6.60%) | 369.53 (+1.27%) |
| 573678d - 2026-06-29 18:46:46 | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 4616.74 (n/a) | 4157.53 (n/a) | 4169.13 (n/a) | 3765.84 (n/a) | 364.88 (n/a) |

test_swiglu_decode[embedding_dim_2048-hidden_dim_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.00 (+0.00%) | 0.00 (+12.63%) | 0.00 (+22.22%) | 0.00 (+20.00%) | 0.00 (-38.85%) | 4579.79 (-17.49%) | 3887.89 (-12.28%) | 3777.13 (-16.80%) | 3574.67 (+0.20%) | 403.33 (-48.66%) |
| 573678d - 2026-06-29 18:46:46 | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 5550.60 (n/a) | 4432.38 (n/a) | 4539.79 (n/a) | 3567.47 (n/a) | 785.64 (n/a) |

iron/operators/swiglu_prefill

test_swiglu_prefill[seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 0.28 (+0.07%) | 0.24 (-2.37%) | 0.28 (+0.55%) | 0.15 (-12.18%) | 0.06 (+20.68%) | 14192.93 (+13.89%) | 9136.39 (+4.77%) | 7621.66 (-0.52%) | 7545.25 (-0.10%) | 2874.51 (+36.10%) |
| 573678d - 2026-06-29 18:46:46 | 0.28 (n/a) | 0.25 (n/a) | 0.27 (n/a) | 0.17 (n/a) | 0.05 (n/a) | 12461.92 (n/a) | 8720.16 (n/a) | 7661.84 (n/a) | 7552.69 (n/a) | 2112.03 (n/a) |

iron/operators/swiglu_prefill_stream

test_swiglu_prefill_stream[seq_len_256-embedding_dim_512-hidden_dim_2048-seq_tile_32-embedding_tile_32-hidden_tile_64]

No metrics available.

test_swiglu_prefill_stream_k2[seq_len_256-embedding_dim_512-hidden_dim_2048-seq_tile_32-embedding_tile_32-hidden_tile_64]

No metrics available.

iron/operators/transpose

test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8-num_batches_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 3.37 (+14.98%) | 2.84 (+17.18%) | 2.82 (+15.00%) | 2.18 (+17.48%) | 0.44 (+7.99%) | 241.00 (-14.90%) | 188.68 (-14.92%) | 185.80 (-13.06%) | 155.60 (-13.02%) | 32.06 (-19.29%) |
| 573678d - 2026-06-29 18:46:46 | 2.93 (n/a) | 2.42 (n/a) | 2.45 (n/a) | 1.85 (n/a) | 0.41 (n/a) | 283.20 (n/a) | 221.78 (n/a) | 213.70 (n/a) | 178.90 (n/a) | 39.73 (n/a) |

test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8-num_batches_2]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 5.47 (-6.81%) | 4.63 (+2.39%) | 4.51 (-3.01%) | 3.86 (+12.42%) | 0.61 (-40.39%) | 271.80 (-11.06%) | 229.66 (-5.00%) | 232.60 (+3.10%) | 191.70 (+7.27%) | 30.18 (-44.92%) |
| 573678d - 2026-06-29 18:46:46 | 5.87 (n/a) | 4.52 (n/a) | 4.65 (n/a) | 3.43 (n/a) | 1.02 (n/a) | 305.60 (n/a) | 241.74 (n/a) | 225.60 (n/a) | 178.70 (n/a) | 54.79 (n/a) |

test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 4bb8427 - 2026-06-23 23:08:22 | 3.72 (+20.32%) | 3.12 (+16.02%) | 3.04 (+6.32%) | 2.73 (+32.06%) | 0.43 (+8.15%) | 192.20 (-24.30%) | 170.32 (-14.26%) | 172.50 (-5.94%) | 141.00 (-16.86%) | 22.57 (-32.31%) |
| 4d4b803 - 2026-06-22 17:54:57 | 3.09 (n/a) | 2.69 (n/a) | 2.86 (n/a) | 2.07 (n/a) | 0.40 (n/a) | 253.90 (n/a) | 198.64 (n/a) | 183.40 (n/a) | 169.60 (n/a) | 33.34 (n/a) |

test_transpose[M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8-num_batches_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 16:52:45 | 3.45 (+8.97%) | 2.72 (-0.05%) | 2.54 (-7.62%) | 2.33 (+7.61%) | 0.46 (+1.82%) | 225.20 (-7.06%) | 197.04 (-0.16%) | 206.80 (+8.27%) | 152.00 (-8.21%) | 30.30 (-11.18%) |
| 573678d - 2026-06-29 18:46:46 | 3.17 (n/a) | 2.72 (n/a) | 2.74 (n/a) | 2.16 (n/a) | 0.46 (n/a) | 242.30 (n/a) | 197.36 (n/a) | 191.00 (n/a) | 165.60 (n/a) | 34.11 (n/a) |

test_transpose[M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 4bb8427 - 2026-06-23 23:08:22 | 4.36 (+7.95%) | 3.26 (+13.04%) | 3.33 (+22.37%) | 2.38 (+18.97%) | 0.79 (-14.92%) | 220.70 (-15.96%) | 168.40 (-14.68%) | 157.40 (-18.28%) | 120.30 (-7.39%) | 40.78 (-34.38%) |
| 4d4b803 - 2026-06-22 17:54:57 | 4.04 (n/a) | 2.89 (n/a) | 2.72 (n/a) | 2.00 (n/a) | 0.93 (n/a) | 262.60 (n/a) | 197.38 (n/a) | 192.60 (n/a) | 129.90 (n/a) | 62.14 (n/a) |

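
The parenthesized percentages in the trend tables appear to be the relative change of each metric against the previous commit's row, which shows `n/a` because it has no baseline. A minimal sketch of that arithmetic follows, assuming the usual `(new - old) / old` definition; the helper names `pct_change` and `format_delta` are illustrative and are not taken from the CI tooling. Because the tables show rounded values, recomputing from them can differ from the reported figure in the last digit.

```python
def pct_change(new: float, old: float) -> float:
    """Relative change of `new` against the baseline `old`, in percent."""
    if old == 0:
        raise ValueError("baseline is zero; relative change is undefined")
    return (new - old) / old * 100.0


def format_delta(new: float, old: float) -> str:
    """Render the change the way the tables do, e.g. '-15.96%'."""
    return f"{pct_change(new, old):+.2f}%"


# Latency (max) and Latency (mean) for the last test_transpose table:
# commit 4bb8427 compared against 4d4b803.
print(format_delta(220.70, 262.60))
print(format_delta(168.40, 197.38))
```

For latency a negative change is an improvement, while for bandwidth a positive change is. With only five runs per test, large swings in the max and stddev columns are likely to be run-to-run noise.
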
Krackan - Examples

IRON

Tested on 2026_06_30_17_07_53 at commit 21c9a18.

iron/applications/llama_3.2_1b

| Test | Checks | TTFT (mean) | TPS (mean) |
| --- | --- | --- | --- |
| test_llama_3_2_1b[llama_3.2_1b_prompt_1024_tokens_1] | ✅ 5/5 | 2.13 | n/a |
| test_llama_3_2_1b[llama_3.2_1b_prompt_1024_tokens_40] | ✅ 5/5 | 2.16 | 4.17 |
| test_llama_3_2_1b[llama_3.2_1b_prompt_13_tokens_1] | ✅ 5/5 | 2.09 | n/a |
| test_llama_3_2_1b[llama_3.2_1b_prompt_13_tokens_40] | ✅ 5/5 | 2.09 | 4.16 |

Trends:

IRON Trends

iron/applications/llama_3.2_1b

test_llama_3_2_1b[llama_3.2_1b_prompt_1024_tokens_1]

| Commit/Date | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
| --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 17:02:04 | 2.14 (+0.28%) | 2.13 (+0.80%) | 2.14 (+1.23%) | 2.12 (+0.76%) | 0.01 (-28.81%) |
| 573678d - 2026-06-29 18:14:41 | 2.14 (n/a) | 2.12 (n/a) | 2.11 (n/a) | 2.10 (n/a) | 0.01 (n/a) |

test_llama_3_2_1b[llama_3.2_1b_prompt_1024_tokens_40]

| Commit/Date | TPS (max) | TPS (mean) | TPS (median) | TPS (min) | TPS (stddev) | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 17:02:04 | 4.19 (-0.07%) | 4.17 (-0.13%) | 4.17 (-0.24%) | 4.15 (-0.14%) | 0.02 (+3.24%) | 2.27 (-0.48%) | 2.16 (+0.17%) | 2.13 (+0.23%) | 2.13 (+0.71%) | 0.06 (-13.22%) |
| 573678d - 2026-06-29 18:14:41 | 4.20 (n/a) | 4.17 (n/a) | 4.17 (n/a) | 4.16 (n/a) | 0.02 (n/a) | 2.28 (n/a) | 2.16 (n/a) | 2.13 (n/a) | 2.12 (n/a) | 0.07 (n/a) |

test_llama_3_2_1b[llama_3.2_1b_prompt_13_tokens_1]

| Commit/Date | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
| --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 17:02:04 | 2.10 (-0.19%) | 2.09 (+0.50%) | 2.09 (+0.10%) | 2.09 (+1.75%) | 0.00 (-78.58%) |
| 573678d - 2026-06-29 18:14:41 | 2.10 (n/a) | 2.08 (n/a) | 2.09 (n/a) | 2.05 (n/a) | 0.02 (n/a) |

test_llama_3_2_1b[llama_3.2_1b_prompt_13_tokens_40]

| Commit/Date | TPS (max) | TPS (mean) | TPS (median) | TPS (min) | TPS (stddev) | TTFT (max) | TTFT (mean) | TTFT (median) | TTFT (min) | TTFT (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 - 2026-06-30 17:02:04 | 4.17 (+0.00%) | 4.16 (-0.10%) | 4.16 (-0.17%) | 4.14 (-0.22%) | 0.01 (+36.26%) | 2.11 (+0.96%) | 2.09 (+0.75%) | 2.08 (+0.82%) | 2.07 (+1.07%) | 0.02 (-8.17%) |
| 573678d - 2026-06-29 18:14:41 | 4.17 (n/a) | 4.16 (n/a) | 4.16 (n/a) | 4.15 (n/a) | 0.01 (n/a) | 2.09 (n/a) | 2.07 (n/a) | 2.07 (n/a) | 2.05 (n/a) | 0.02 (n/a) |

Phoenix - Small

IRON

Tested on 2026_06_30_17_01_11 at commit 21c9a18.

iron/operators/axpy

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_axpy[input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0] | ✅ 5/5 | 381.78 | 0.03 | n/a |
| test_axpy[input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0] | ✅ 5/5 | 444.24 | 0.03 | n/a |
| test_axpy[input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0] | ✅ 5/5 | 368.60 | 0.04 | n/a |

iron/operators/dequant

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_dequant[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32] | ✅ 5/5 | 356.38 | 0.02 | n/a |
| test_dequant[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32] | ✅ 5/5 | 405.20 | 0.01 | n/a |
| test_dequant[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32] | ✅ 5/5 | 811.08 | 0.01 | n/a |
| test_dequant[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32] | ✅ 5/5 | 368.10 | 0.02 | n/a |
| test_dequant[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32] | ✅ 5/5 | 455.92 | 0.01 | n/a |
| test_dequant[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32] | ✅ 5/5 | 678.18 | 0.01 | n/a |

iron/operators/elementwise_add

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_elementwise_add[input_length_2048-num_aie_columns_1-tile_size_2048] | ✅ 5/5 | 386.06 | 0.04 | n/a |
| test_elementwise_add[input_length_2048-num_aie_columns_2-tile_size_1024] | ✅ 5/5 | 429.64 | 0.03 | n/a |
| test_elementwise_add[input_length_2048-num_aie_columns_4-tile_size_512] | ✅ 5/5 | 535.10 | 0.02 | n/a |

iron/operators/elementwise_mul

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_elementwise_mul[input_length_2048-num_aie_columns_1-tile_size_2048] | ✅ 5/5 | 319.34 | 0.04 | n/a |
| test_elementwise_mul[input_length_2048-num_aie_columns_2-tile_size_1024] | ✅ 5/5 | 457.84 | 0.03 | n/a |
| test_elementwise_mul[input_length_2048-num_aie_columns_4-tile_size_512] | ✅ 5/5 | 399.80 | 0.03 | n/a |

iron/operators/gelu

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_gelu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 368.38 | 0.03 | n/a |
| test_gelu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 398.12 | 0.02 | n/a |
| test_gelu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 391.66 | 0.02 | n/a |
| test_gelu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 649.90 | 0.02 | n/a |
| test_gelu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 519.74 | 0.02 | n/a |
| test_gelu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 538.64 | 0.02 | n/a |

iron/operators/gemm

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1] | ✅ 5/5 | 548.58 | 0.42 | 17.84 |
| test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1] | ✅ 5/5 | 550.34 | 0.46 | 19.64 |
| test_gemm[M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 82800.26 | 0.30 | 207.52 |
| test_gemm[M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 25278.32 | 1.00 | 679.72 |
| test_gemm[M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1] | ✅ 5/5 | 3482.64 | 2.48 | 650.64 |
| test_gemm[M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4] | ✅ 5/5 | 6435.88 | 0.21 | 11.43 |

iron/operators/gemv

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_gemv[M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128] | ✅ 5/5 | n/a | 0.10 | 0.10 |
| test_gemv[M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048] | ✅ 5/5 | n/a | 3.65 | 3.65 |
| test_gemv[M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024] | ✅ 5/5 | n/a | 6.05 | 6.05 |
| test_gemv[M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512] | ✅ 5/5 | n/a | 10.55 | 10.55 |
| test_gemv[M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 3.69 | 3.69 |
| test_gemv[M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 6.87 | 6.87 |
| test_gemv[M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024] | ✅ 5/5 | n/a | 8.68 | 8.67 |

iron/operators/layer_norm

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 434.44 | 0.02 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 752.66 | 0.02 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 452.04 | 0.02 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 372.20 | 0.03 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 377.32 | 0.02 | n/a |
| test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 411.00 | 0.02 | n/a |

iron/operators/mem_copy

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_mem_copy[input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048] | ✅ 5/5 | 353.78 | 0.03 | n/a |
| test_mem_copy[input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024] | ✅ 5/5 | 665.14 | 0.02 | n/a |
| test_mem_copy[input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024] | ✅ 5/5 | 442.38 | 0.02 | n/a |
| test_mem_copy[input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512] | ✅ 5/5 | 565.54 | 0.02 | n/a |
| test_mem_copy[input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512] | ✅ 5/5 | 351.26 | 0.02 | n/a |
| test_mem_copy[input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256] | ✅ 5/5 | 466.34 | 0.02 | n/a |

iron/operators/relu

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_relu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 335.24 | 0.03 | n/a |
| test_relu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 334.36 | 0.03 | n/a |
| test_relu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 841.76 | 0.02 | n/a |
| test_relu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 373.84 | 0.03 | n/a |
| test_relu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 326.88 | 0.03 | n/a |
| test_relu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 457.82 | 0.02 | n/a |

iron/operators/rms_norm

| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False] | ✅ 5/5 | 322.24 | 0.03 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True] | ✅ 5/5 | 365.18 | 0.04 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False] | ✅ 5/5 | 368.48 | 0.03 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True] | ✅ 5/5 | 445.82 | 0.02 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False] | ✅ 5/5 | 366.30 | 0.02 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True] | ✅ 5/5 | 356.82 | 0.03 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False] | ✅ 5/5 | 443.98 | 0.02 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True] | ✅ 5/5 | 394.84 | 0.03 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False] | ✅ 5/5 | 430.58 | 0.02 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True] | ✅ 5/5 | 410.86 | 0.03 | n/a |
| test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False] | ✅ 5/5 | 461.22 | 0.02 | n/a |

iron/operators/rope
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_rope[rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0] | ✅ 5/5 | 434.44 | 0.24 | n/a |
| test_rope[rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0] | ✅ 5/5 | 344.80 | 0.30 | n/a |
| test_rope[rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0] | ✅ 5/5 | 494.56 | 0.23 | n/a |
| test_rope[rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0] | ✅ 5/5 | 398.62 | 0.21 | n/a |
| test_rope[rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0] | ✅ 5/5 | 367.50 | 0.22 | n/a |
| test_rope[rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0] | ✅ 5/5 | 447.44 | 0.17 | n/a |
iron/operators/sigmoid
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_sigmoid[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 389.74 | 0.02 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 330.48 | 0.03 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 740.86 | 0.02 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 562.48 | 0.01 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 450.74 | 0.02 | n/a |
| test_sigmoid[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 429.16 | 0.02 | n/a |
iron/operators/silu
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_silu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 523.64 | 0.02 | n/a |
| test_silu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 309.46 | 0.03 | n/a |
| test_silu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 468.46 | 0.02 | n/a |
iron/operators/softmax
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024] | ✅ 5/5 | 374.92 | 0.37 | n/a |
| test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048] | ✅ 5/5 | 401.16 | 0.38 | n/a |
| test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 831.32 | 0.21 | n/a |
iron/operators/swiglu_decode
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_swiglu_decode[embedding_dim_1024-hidden_dim_3584] | ✅ 5/5 | 12094.96 | 0.00 | n/a |
| test_swiglu_decode[embedding_dim_2048-hidden_dim_2048] | ✅ 5/5 | 11397.11 | 0.00 | n/a |
iron/operators/swiglu_prefill
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_swiglu_prefill[seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False] | ✅ 5/5 | 21335.95 | 0.10 | n/a |
iron/operators/tanh
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_tanh[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048] | ✅ 5/5 | 309.88 | 0.03 | n/a |
| test_tanh[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024] | ✅ 5/5 | 555.90 | 0.02 | n/a |
| test_tanh[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024] | ✅ 5/5 | 452.92 | 0.02 | n/a |
| test_tanh[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512] | ✅ 5/5 | 397.06 | 0.02 | n/a |
| test_tanh[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512] | ✅ 5/5 | 342.82 | 0.03 | n/a |
| test_tanh[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256] | ✅ 5/5 | 481.90 | 0.02 | n/a |
iron/operators/transpose
| Test | Checks | Latency (mean) | Bandwidth (mean) | Throughput (mean) |
| --- | --- | --- | --- | --- |
| test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8-num_batches_1] | ✅ 5/5 | 1004.28 | 1.11 | n/a |
| test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8-num_batches_2] | ✅ 5/5 | 1455.52 | 1.34 | n/a |
| test_transpose[M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8-num_batches_1] | ✅ 5/5 | 448.52 | 1.34 | n/a |

Trends:

IRON Trends

iron/operators/axpy

test_axpy[input_length_2048-num_aie_columns_1-tile_size_2048-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.04 (-24.68%) | 0.03 (-7.35%) | 0.04 (+4.86%) | 0.02 (+5.03%) | 0.01 (-29.37%) | 543.70 (-4.80%) | 381.78 (+3.30%) | 298.50 (-4.63%) | 277.40 (+32.73%) | 127.72 (-12.99%) |
| 573678d — 2026-06-29 18:24:15 | 0.06 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 571.10 (n/a) | 369.60 (n/a) | 313.00 (n/a) | 209.00 (n/a) | 146.79 (n/a) |
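
The parenthesised percentages in these trend tables appear to be the relative change of each statistic against the previous recorded commit, with `n/a` where no earlier run exists. A minimal sketch of that calculation follows; `format_delta` is a hypothetical helper written for illustration and is not taken from the benchmark tooling. It reproduces the two latency deltas checked below. The bandwidth columns are rounded to two decimals in the report, so their deltas cannot be recomputed from the displayed values.

```python
from typing import Optional


def format_delta(current: float, baseline: Optional[float]) -> str:
    """Return the signed relative change of `current` versus `baseline`.

    Hypothetical helper: assumes the report uses (current - baseline) / baseline
    and prints "n/a" when there is no baseline to compare against.
    """
    if baseline is None or baseline == 0:
        return "n/a"
    change = (current - baseline) / baseline * 100.0
    return f"{change:+.2f}%"


# Latency (mean) for the first axpy test: 381.78 at 21c9a18 vs 369.60 at 573678d.
print(format_delta(381.78, 369.60))  # +3.30%
# Latency (max) for the same test: 543.70 vs 571.10.
print(format_delta(543.70, 571.10))  # -4.80%
# Oldest recorded commit has no baseline.
print(format_delta(369.60, None))    # n/a
```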

test_axpy[input_length_2048-num_aie_columns_2-tile_size_1024-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.05 (-15.48%) | 0.03 (-6.51%) | 0.03 (+15.47%) | 0.02 (-9.07%) | 0.01 (-19.08%) | 606.10 (+9.96%) | 444.24 (+4.75%) | 451.80 (-13.40%) | 264.10 (+18.32%) | 158.39 (+3.60%) |
| 573678d — 2026-06-29 18:24:15 | 0.06 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 551.20 (n/a) | 424.08 (n/a) | 521.70 (n/a) | 223.20 (n/a) | 152.88 (n/a) |

test_axpy[input_length_2048-num_aie_columns_4-tile_size_512-scalar_factor_3.0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.06 (+25.13%) | 0.04 (+2.45%) | 0.04 (-21.84%) | 0.02 (-0.60%) | 0.02 (+25.20%) | 547.00 (+0.61%) | 368.60 (+1.34%) | 323.70 (+27.94%) | 190.60 (-20.05%) | 168.85 (+6.19%) |
| 573678d — 2026-06-29 18:24:15 | 0.05 (n/a) | 0.04 (n/a) | 0.05 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 543.70 (n/a) | 363.74 (n/a) | 253.00 (n/a) | 238.40 (n/a) | 159.01 (n/a) |
iron/operators/dequant

test_dequant[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.02 (+1.66%) | 0.02 (-2.10%) | 0.02 (+19.95%) | 0.01 (-10.54%) | 0.01 (+34.10%) | 554.10 (+11.78%) | 356.38 (+8.59%) | 255.20 (-16.63%) | 242.50 (-1.62%) | 149.10 (+45.74%) |
| 573678d — 2026-06-29 18:24:15 | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 495.70 (n/a) | 328.18 (n/a) | 306.10 (n/a) | 246.50 (n/a) | 102.30 (n/a) |

test_dequant[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.02 (-5.70%) | 0.01 (-3.22%) | 0.01 (+6.46%) | 0.01 (+27.70%) | 0.00 (-40.52%) | 498.70 (-21.70%) | 405.20 (-5.27%) | 391.10 (-6.08%) | 279.50 (+6.03%) | 86.73 (-48.11%) |
| 573678d — 2026-06-29 18:24:15 | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 636.90 (n/a) | 427.76 (n/a) | 416.40 (n/a) | 263.60 (n/a) | 167.14 (n/a) |

test_dequant[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.02 (-4.45%) | 0.01 (-38.04%) | 0.01 (-40.52%) | 0.00 (-82.64%) | 0.01 (+122.35%) | 2131.20 (+476.16%) | 811.08 (+149.46%) | 569.10 (+68.12%) | 263.40 (+4.65%) | 750.00 (+1371.83%) |
| 573678d — 2026-06-29 18:24:15 | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 369.90 (n/a) | 325.14 (n/a) | 338.50 (n/a) | 251.70 (n/a) | 50.96 (n/a) |

test_dequant[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.02 (+12.96%) | 0.02 (+14.47%) | 0.02 (+13.34%) | 0.01 (-2.20%) | 0.01 (+23.76%) | 556.60 (+2.26%) | 368.10 (-10.55%) | 332.90 (-11.77%) | 232.70 (-11.45%) | 125.51 (+12.77%) |
| 573678d — 2026-06-29 18:24:15 | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 544.30 (n/a) | 411.50 (n/a) | 377.30 (n/a) | 262.80 (n/a) | 111.30 (n/a) |

test_dequant[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.02 (-18.40%) | 0.01 (-16.71%) | 0.01 (-10.91%) | 0.01 (-27.71%) | 0.00 (-15.90%) | 617.00 (+38.34%) | 455.92 (+21.20%) | 448.80 (+12.26%) | 289.80 (+22.54%) | 117.36 (+45.51%) |
| 573678d — 2026-06-29 18:24:15 | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 446.00 (n/a) | 376.18 (n/a) | 399.80 (n/a) | 236.50 (n/a) | 80.66 (n/a) |

test_dequant[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-group_size_32]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.02 (-2.86%) | 0.01 (-31.56%) | 0.01 (-36.85%) | 0.00 (-64.52%) | 0.01 (+29.34%) | 1351.80 (+181.86%) | 678.18 (+82.43%) | 658.00 (+58.36%) | 237.90 (+2.94%) | 414.46 (+263.39%) |
| 573678d — 2026-06-29 18:24:15 | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 479.60 (n/a) | 371.74 (n/a) | 415.50 (n/a) | 231.10 (n/a) | 114.05 (n/a) |
iron/operators/elementwise_add

test_elementwise_add[input_length_2048-num_aie_columns_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 564.00 (n/a) | 386.06 (n/a) | 292.00 (n/a) | 267.10 (n/a) | 141.70 (n/a) |

test_elementwise_add[input_length_2048-num_aie_columns_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.06 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 665.60 (n/a) | 429.64 (n/a) | 446.50 (n/a) | 211.10 (n/a) | 180.68 (n/a) |

test_elementwise_add[input_length_2048-num_aie_columns_4-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 746.80 (n/a) | 535.10 (n/a) | 534.50 (n/a) | 381.60 (n/a) | 139.29 (n/a) |
iron/operators/elementwise_mul

test_elementwise_mul[input_length_2048-num_aie_columns_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.06 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 560.60 (n/a) | 319.34 (n/a) | 286.00 (n/a) | 190.10 (n/a) | 141.24 (n/a) |

test_elementwise_mul[input_length_2048-num_aie_columns_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.04 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 597.50 (n/a) | 457.84 (n/a) | 521.70 (n/a) | 284.90 (n/a) | 156.65 (n/a) |

test_elementwise_mul[input_length_2048-num_aie_columns_4-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.05 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 540.90 (n/a) | 399.80 (n/a) | 430.40 (n/a) | 240.00 (n/a) | 134.01 (n/a) |
iron/operators/gelu

test_gelu[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.05 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 534.10 (n/a) | 368.38 (n/a) | 341.70 (n/a) | 168.00 (n/a) | 154.51 (n/a) |

test_gelu[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.03 (n/a) | 0.02 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 619.40 (n/a) | 398.12 (n/a) | 287.40 (n/a) | 262.40 (n/a) | 169.18 (n/a) |

test_gelu[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.03 (n/a) | 0.02 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 576.50 (n/a) | 391.66 (n/a) | 318.80 (n/a) | 258.90 (n/a) | 135.32 (n/a) |

test_gelu[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 1353.90 (n/a) | 649.90 (n/a) | 546.90 (n/a) | 294.50 (n/a) | 407.67 (n/a) |

test_gelu[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 1060.30 (n/a) | 519.74 (n/a) | 352.20 (n/a) | 288.30 (n/a) | 319.53 (n/a) |

test_gelu[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 677.80 (n/a) | 538.64 (n/a) | 603.30 (n/a) | 331.30 (n/a) | 137.92 (n/a) |
iron/operators/gemm

test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_False-c_col_maj_False-m_48-k_96-n_16-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.56 (-9.28%) | 0.42 (+0.59%) | 0.38 (-4.61%) | 0.33 (+84.23%) | 0.09 (-43.09%) | 676.00 (-45.72%) | 548.58 (-13.73%) | 582.20 (+4.83%) | 395.10 (+10.24%) | 111.01 (-68.51%) | 23.89 (-9.28%) | 17.84 (+0.59%) | 16.21 (-4.61%) | 13.96 (+84.23%) | 3.97 (-43.09%) |
| 573678d — 2026-06-29 18:24:15 | 0.62 (n/a) | 0.42 (n/a) | 0.40 (n/a) | 0.18 (n/a) | 0.16 (n/a) | 1245.30 (n/a) | 635.90 (n/a) | 555.40 (n/a) | 358.40 (n/a) | 352.52 (n/a) | 26.33 (n/a) | 17.73 (n/a) | 16.99 (n/a) | 7.58 (n/a) | 6.98 (n/a) |

test_gemm[M_192-K_384-N_64-num_aie_columns_4-b_col_maj_True-c_col_maj_True-m_48-k_96-n_16-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.63 (+11.86%) | 0.46 (+15.76%) | 0.46 (+30.32%) | 0.23 (+16.34%) | 0.17 (+8.72%) | 973.40 (-14.04%) | 550.34 (-14.39%) | 485.90 (-23.26%) | 353.20 (-10.60%) | 253.87 (-14.88%) | 26.72 (+11.86%) | 19.64 (+15.76%) | 19.42 (+30.32%) | 9.70 (+16.34%) | 7.07 (+8.72%) |
| 573678d — 2026-06-29 18:24:15 | 0.56 (n/a) | 0.40 (n/a) | 0.35 (n/a) | 0.20 (n/a) | 0.15 (n/a) | 1132.40 (n/a) | 642.84 (n/a) | 633.20 (n/a) | 395.10 (n/a) | 298.25 (n/a) | 23.88 (n/a) | 16.97 (n/a) | 14.90 (n/a) | 8.33 (n/a) | 6.51 (n/a) |

test_gemm[M_2048-K_2048-N_2048-num_aie_columns_1-b_col_maj_False-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.31 (+1.14%) | 0.30 (+2.00%) | 0.30 (+2.92%) | 0.30 (+1.74%) | 0.00 (-22.94%) | 84441.80 (-1.71%) | 82800.26 (-1.97%) | 82742.30 (-2.83%) | 81268.90 (-1.13%) | 1258.20 (-25.24%) | 211.40 (+1.14%) | 207.52 (+2.00%) | 207.63 (+2.92%) | 203.45 (+1.74%) | 3.15 (-22.94%) |
| 573678d — 2026-06-29 18:24:15 | 0.31 (n/a) | 0.30 (n/a) | 0.30 (n/a) | 0.29 (n/a) | 0.01 (n/a) | 85909.50 (n/a) | 84466.00 (n/a) | 85155.20 (n/a) | 82194.40 (n/a) | 1682.98 (n/a) | 209.02 (n/a) | 203.46 (n/a) | 201.75 (n/a) | 199.98 (n/a) | 4.09 (n/a) |

test_gemm[M_2048-K_2048-N_2048-num_aie_columns_2-b_col_maj_True-c_col_maj_False-m_64-k_64-n_64-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 1.01 (-3.11%) | 1.00 (-1.88%) | 1.00 (-3.58%) | 0.97 (+3.78%) | 0.01 (-69.57%) | 25821.40 (-3.65%) | 25278.32 (+1.78%) | 25257.20 (+3.72%) | 25006.50 (+3.21%) | 332.52 (-69.84%) | 687.02 (-3.11%) | 679.72 (-1.88%) | 680.20 (-3.58%) | 665.34 (+3.78%) | 8.85 (-69.57%) |
| 573678d — 2026-06-29 18:24:15 | 1.04 (n/a) | 1.01 (n/a) | 1.03 (n/a) | 0.94 (n/a) | 0.04 (n/a) | 26798.50 (n/a) | 24836.36 (n/a) | 24352.00 (n/a) | 24229.80 (n/a) | 1102.58 (n/a) | 709.04 (n/a) | 692.75 (n/a) | 705.48 (n/a) | 641.08 (n/a) | 29.07 (n/a) |

test_gemm[M_384-K_1536-N_1792-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_32-k_48-n_64-trace_size_0-partition_N_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 3.42 (-13.59%) | 2.48 (-19.14%) | 2.44 (-28.42%) | 1.57 (-10.07%) | 0.70 (-17.93%) | 5136.60 (+11.20%) | 3482.64 (+22.06%) | 3299.10 (+39.70%) | 2360.30 (+15.73%) | 1066.53 (+2.69%) | 895.62 (-13.59%) | 650.64 (-19.14%) | 640.75 (-28.42%) | 411.54 (-10.07%) | 183.85 (-17.93%) |
| 573678d — 2026-06-29 18:24:15 | 3.95 (n/a) | 3.07 (n/a) | 3.41 (n/a) | 1.75 (n/a) | 0.85 (n/a) | 4619.20 (n/a) | 2853.28 (n/a) | 2361.60 (n/a) | 2039.50 (n/a) | 1038.58 (n/a) | 1036.52 (n/a) | 804.63 (n/a) | 895.11 (n/a) | 457.64 (n/a) | 224.01 (n/a) |

test_gemm[M_64-K_512-N_256-num_aie_columns_4-b_col_maj_True-c_col_maj_False-m_16-k_64-n_64-trace_size_0-partition_N_4]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.34 (+7.05%) | 0.21 (-16.32%) | 0.20 (-30.84%) | 0.15 (-18.76%) | 0.08 (+24.35%) | 8537.60 (+23.09%) | 6435.88 (+24.08%) | 6242.90 (+44.59%) | 3679.60 (-6.58%) | 1993.91 (+42.72%) | 18.24 (+7.05%) | 11.43 (-16.32%) | 10.75 (-30.84%) | 7.86 (-18.76%) | 4.20 (+24.35%) |
| 573678d — 2026-06-29 18:24:15 | 0.32 (n/a) | 0.25 (n/a) | 0.29 (n/a) | 0.18 (n/a) | 0.06 (n/a) | 6936.30 (n/a) | 5186.80 (n/a) | 4317.60 (n/a) | 3938.80 (n/a) | 1397.04 (n/a) | 17.04 (n/a) | 13.66 (n/a) | 15.54 (n/a) | 9.68 (n/a) | 3.37 (n/a) |
iron/operators/gemv

test_gemv[M_128-K_128-num_aie_columns_1-tile_size_input_32-tile_size_output_128]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.13 (-4.24%) | 0.10 (-4.37%) | 0.11 (-3.50%) | 0.06 (-13.80%) | 0.03 (+12.04%) | 0.13 (-4.24%) | 0.10 (-4.37%) | 0.11 (-3.50%) | 0.05 (-13.80%) | 0.03 (+12.04%) |
| 573678d — 2026-06-29 18:24:15 | 0.14 (n/a) | 0.11 (n/a) | 0.11 (n/a) | 0.06 (n/a) | 0.03 (n/a) | 0.13 (n/a) | 0.10 (n/a) | 0.11 (n/a) | 0.06 (n/a) | 0.03 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_1-tile_size_input_1-tile_size_output_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 3.92 (-1.23%) | 3.65 (-1.01%) | 3.77 (+2.60%) | 3.36 (-2.60%) | 0.24 (+32.74%) | 3.92 (-1.23%) | 3.65 (-1.01%) | 3.77 (+2.60%) | 3.36 (-2.60%) | 0.24 (+32.74%) |
| 573678d — 2026-06-29 18:24:15 | 3.97 (n/a) | 3.69 (n/a) | 3.67 (n/a) | 3.45 (n/a) | 0.18 (n/a) | 3.97 (n/a) | 3.69 (n/a) | 3.67 (n/a) | 3.45 (n/a) | 0.18 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_2-tile_size_input_1-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 6.83 (-11.54%) | 6.05 (-2.26%) | 5.93 (+5.39%) | 5.61 (+12.94%) | 0.48 (-58.30%) | 6.82 (-11.54%) | 6.05 (-2.26%) | 5.93 (+5.39%) | 5.60 (+12.94%) | 0.48 (-58.30%) |
| 573678d — 2026-06-29 18:24:15 | 7.72 (n/a) | 6.19 (n/a) | 5.63 (n/a) | 4.96 (n/a) | 1.14 (n/a) | 7.71 (n/a) | 6.19 (n/a) | 5.62 (n/a) | 4.96 (n/a) | 1.14 (n/a) |

test_gemv[M_2048-K_8192-num_aie_columns_4-tile_size_input_1-tile_size_output_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 13.38 (+37.81%) | 10.55 (+32.24%) | 9.83 (+32.34%) | 8.50 (+17.16%) | 2.29 (+121.03%) | 13.37 (+37.81%) | 10.55 (+32.24%) | 9.82 (+32.34%) | 8.49 (+17.16%) | 2.29 (+121.03%) |
| 573678d — 2026-06-29 18:24:15 | 9.71 (n/a) | 7.98 (n/a) | 7.43 (n/a) | 7.25 (n/a) | 1.04 (n/a) | 9.70 (n/a) | 7.98 (n/a) | 7.42 (n/a) | 7.25 (n/a) | 1.04 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_1-tile_size_input_4-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 3.96 (+0.13%) | 3.69 (-1.88%) | 3.85 (+0.77%) | 3.35 (-3.99%) | 0.31 (+71.06%) | 3.95 (+0.13%) | 3.69 (-1.88%) | 3.84 (+0.77%) | 3.35 (-3.99%) | 0.31 (+71.06%) |
| 573678d — 2026-06-29 18:24:15 | 3.95 (n/a) | 3.77 (n/a) | 3.82 (n/a) | 3.49 (n/a) | 0.18 (n/a) | 3.95 (n/a) | 3.76 (n/a) | 3.81 (n/a) | 3.49 (n/a) | 0.18 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_2-tile_size_input_4-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 7.61 (+0.50%) | 6.87 (+3.67%) | 7.04 (-3.53%) | 6.03 (+31.53%) | 0.59 (-53.95%) | 7.61 (+0.50%) | 6.87 (+3.67%) | 7.03 (-3.53%) | 6.02 (+31.53%) | 0.59 (-53.95%) |
| 573678d — 2026-06-29 18:24:15 | 7.57 (n/a) | 6.63 (n/a) | 7.29 (n/a) | 4.58 (n/a) | 1.29 (n/a) | 7.57 (n/a) | 6.62 (n/a) | 7.29 (n/a) | 4.58 (n/a) | 1.28 (n/a) |

test_gemv[M_8192-K_2048-num_aie_columns_4-tile_size_input_4-tile_size_output_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Throughput (max) | Throughput (mean) | Throughput (median) | Throughput (min) | Throughput (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 10.23 (-25.45%) | 8.68 (-14.17%) | 8.28 (-10.23%) | 7.99 (-6.92%) | 0.90 (-57.42%) | 10.22 (-25.45%) | 8.67 (-14.17%) | 8.28 (-10.23%) | 7.98 (-6.92%) | 0.90 (-57.42%) |
| 573678d — 2026-06-29 18:24:15 | 13.72 (n/a) | 10.11 (n/a) | 9.23 (n/a) | 8.58 (n/a) | 2.10 (n/a) | 13.71 (n/a) | 10.10 (n/a) | 9.22 (n/a) | 8.58 (n/a) | 2.10 (n/a) |
iron/operators/layer_norm

test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 651.80 (n/a) | 434.44 (n/a) | 419.60 (n/a) | 244.20 (n/a) | 171.91 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.04 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 0.01 (n/a) | 1978.00 (n/a) | 752.66 (n/a) | 562.80 (n/a) | 226.60 (n/a) | 700.12 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 526.50 (n/a) | 452.04 (n/a) | 459.60 (n/a) | 356.60 (n/a) | 64.77 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.04 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 597.70 (n/a) | 372.20 (n/a) | 351.00 (n/a) | 229.30 (n/a) | 152.20 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.03 (n/a) | 0.02 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 586.80 (n/a) | 377.32 (n/a) | 291.40 (n/a) | 287.70 (n/a) | 131.88 (n/a) |

test_layer_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.03 (n/a) | 0.02 (n/a) | 0.03 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 637.70 (n/a) | 411.00 (n/a) | 324.80 (n/a) | 273.80 (n/a) | 161.45 (n/a) |
iron/operators/mem_copy

test_mem_copy[input_length_2048-num_cores_1-num_channels_1-bypass_False-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.03 (-39.40%) | 0.03 (-10.43%) | 0.03 (+32.19%) | 0.02 (-15.30%) | 0.01 (-47.60%) | 489.60 (+18.06%) | 353.78 (+4.89%) | 296.90 (-24.36%) | 238.90 (+64.99%) | 124.68 (+11.26%) |
| 573678d — 2026-06-29 18:24:15 | 0.06 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 414.70 (n/a) | 337.28 (n/a) | 392.50 (n/a) | 144.80 (n/a) | 112.07 (n/a) |

test_mem_copy[input_length_2048-num_cores_2-num_channels_1-bypass_False-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.04 (+3.03%) | 0.02 (-27.99%) | 0.02 (-31.27%) | 0.00 (-77.50%) | 0.01 (+77.79%) | 1923.70 (+344.38%) | 665.14 (+136.10%) | 391.20 (+45.54%) | 193.60 (-2.91%) | 714.96 (+700.95%) |
| 573678d — 2026-06-29 18:24:15 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 432.90 (n/a) | 281.72 (n/a) | 268.80 (n/a) | 199.40 (n/a) | 89.26 (n/a) |

test_mem_copy[input_length_2048-num_cores_2-num_channels_2-bypass_False-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.03 (-2.60%) | 0.02 (-0.91%) | 0.02 (-21.34%) | 0.01 (-19.76%) | 0.01 (+33.71%) | 621.80 (+24.63%) | 442.38 (+11.03%) | 535.00 (+27.11%) | 234.20 (+2.67%) | 181.84 (+77.94%) |
| 573678d — 2026-06-29 18:24:15 | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 498.90 (n/a) | 398.42 (n/a) | 420.90 (n/a) | 228.10 (n/a) | 102.19 (n/a) |

test_mem_copy[input_length_2048-num_cores_4-num_channels_1-bypass_False-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.03 (-22.31%) | 0.02 (-17.12%) | 0.02 (-7.88%) | 0.01 (+32.99%) | 0.01 (-45.37%) | 1027.10 (-24.80%) | 565.54 (-4.72%) | 513.10 (+8.55%) | 315.40 (+28.68%) | 270.56 (-41.04%) |
| 573678d — 2026-06-29 18:24:15 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 1365.90 (n/a) | 593.54 (n/a) | 472.70 (n/a) | 245.10 (n/a) | 458.86 (n/a) |

test_mem_copy[input_length_2048-num_cores_4-num_channels_2-bypass_False-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.04 (+17.24%) | 0.02 (+9.76%) | 0.02 (+10.23%) | 0.02 (+29.86%) | 0.01 (+2.27%) | 449.10 (-22.99%) | 351.26 (-10.97%) | 333.00 (-9.26%) | 228.30 (-14.72%) | 91.47 (-30.05%) |
| 573678d — 2026-06-29 18:24:15 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 583.20 (n/a) | 394.52 (n/a) | 367.00 (n/a) | 267.70 (n/a) | 130.76 (n/a) |

test_mem_copy[input_length_2048-num_cores_8-num_channels_2-bypass_False-tile_size_256]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.03 (+12.69%) | 0.02 (+7.37%) | 0.02 (-5.63%) | 0.02 (+20.29%) | 0.01 (+9.88%) | 544.70 (-16.87%) | 466.34 (-7.90%) | 506.20 (+5.97%) | 274.10 (-11.27%) | 109.77 (-24.58%) |
| 573678d — 2026-06-29 18:24:15 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 655.20 (n/a) | 506.34 (n/a) | 477.70 (n/a) | 308.90 (n/a) | 145.54 (n/a) |
iron/operators/rms_norm

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.03 (+11.27%) | 0.03 (+37.56%) | 0.03 (+64.67%) | 0.02 (+37.30%) | 0.01 (-8.46%) | 490.30 (-27.16%) | 322.24 (-30.36%) | 299.50 (-39.26%) | 247.90 (-10.12%) | 98.94 (-37.97%) |
| 573678d — 2026-06-29 18:24:15 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 673.10 (n/a) | 462.70 (n/a) | 493.10 (n/a) | 275.80 (n/a) | 159.50 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_1-tile_size_2048-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.07 (+28.47%) | 0.04 (+23.78%) | 0.05 (+39.49%) | 0.02 (-4.75%) | 0.02 (+73.21%) | 622.30 (+4.99%) | 365.18 (-5.40%) | 240.30 (-28.31%) | 182.20 (-22.17%) | 217.78 (+46.07%) |
| 573678d — 2026-06-29 18:24:15 | 0.05 (n/a) | 0.04 (n/a) | 0.04 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 592.70 (n/a) | 386.04 (n/a) | 335.20 (n/a) | 234.10 (n/a) | 149.09 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.04 (+13.71%) | 0.03 (+3.13%) | 0.02 (-27.81%) | 0.02 (-1.49%) | 0.01 (+43.18%) | 520.20 (+1.50%) | 368.48 (+2.09%) | 409.30 (+38.51%) | 217.20 (-12.06%) | 138.02 (+18.43%) |
| 573678d — 2026-06-29 18:24:15 | 0.03 (n/a) | 0.02 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 512.50 (n/a) | 360.92 (n/a) | 295.50 (n/a) | 247.00 (n/a) | 116.55 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_1-num_channels_2-tile_size_1024-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.04 (+1.91%) | 0.02 (+7.04%) | 0.02 (+16.82%) | 0.02 (+43.12%) | 0.01 (-14.57%) | 570.50 (-30.13%) | 445.82 (-12.44%) | 445.70 (-14.40%) | 262.60 (-1.87%) | 122.88 (-41.00%) |
| 573678d — 2026-06-29 18:24:15 | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 816.50 (n/a) | 509.18 (n/a) | 520.70 (n/a) | 267.60 (n/a) | 208.27 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.04 (+9.37%) | 0.02 (+3.85%) | 0.02 (-28.14%) | 0.02 (+475.96%) | 0.01 (-36.34%) | 437.40 (-82.64%) | 366.30 (-50.92%) | 409.40 (+39.16%) | 222.30 (-8.56%) | 89.62 (-90.97%) |
| 573678d — 2026-06-29 18:24:15 | 0.03 (n/a) | 0.02 (n/a) | 0.03 (n/a) | 0.00 (n/a) | 0.01 (n/a) | 2519.00 (n/a) | 746.32 (n/a) | 294.20 (n/a) | 243.10 (n/a) | 992.55 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_1-tile_size_1024-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.05 (+29.71%) | 0.03 (+12.14%) | 0.02 (-27.97%) | 0.02 (+47.09%) | 0.01 (+14.83%) | 458.40 (-32.02%) | 356.82 (-14.34%) | 422.10 (+38.85%) | 201.90 (-22.91%) | 112.01 (-39.03%) |
| 573678d — 2026-06-29 18:24:15 | 0.04 (n/a) | 0.03 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 674.30 (n/a) | 416.56 (n/a) | 304.00 (n/a) | 261.90 (n/a) | 183.72 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.02 (-36.28%) | 0.02 (-15.42%) | 0.02 (-3.69%) | 0.01 (+1.82%) | 0.00 (-62.82%) | 573.40 (-1.80%) | 443.98 (+9.82%) | 431.60 (+3.85%) | 363.80 (+56.95%) | 79.51 (-40.01%) |
| 573678d — 2026-06-29 18:24:15 | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.01 (n/a) | 583.90 (n/a) | 404.28 (n/a) | 415.60 (n/a) | 231.80 (n/a) | 132.53 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_2-num_channels_2-tile_size_512-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.03 (+32.63%) | 0.03 (+27.40%) | 0.03 (+30.10%) | 0.02 (+17.02%) | 0.01 (+80.36%) | 594.10 (-14.54%) | 394.84 (-17.31%) | 336.40 (-23.14%) | 263.70 (-24.61%) | 147.91 (+11.11%) |
| 573678d — 2026-06-29 18:24:15 | 0.03 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 0.00 (n/a) | 695.20 (n/a) | 477.50 (n/a) | 437.70 (n/a) | 349.80 (n/a) | 133.12 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.03 (-10.01%) | 0.02 (-18.95%) | 0.02 (-33.62%) | 0.02 (+0.31%) | 0.01 (-32.35%) | 520.50 (-0.31%) | 430.58 (+18.13%) | 441.30 (+50.67%) | 287.60 (+11.13%) | 87.26 (-29.15%) |
| 573678d — 2026-06-29 18:24:15 | 0.03 (n/a) | 0.02 (n/a) | 0.03 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 522.10 (n/a) | 364.50 (n/a) | 292.90 (n/a) | 258.80 (n/a) | 123.16 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_1-tile_size_512-weighted_True]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.04 (+19.05%) | 0.03 (+14.19%) | 0.02 (+18.39%) | 0.02 (-2.33%) | 0.01 (+32.51%) | 604.70 (+2.39%) | 410.86 (-8.41%) | 408.80 (-15.52%) | 210.80 (-15.98%) | 160.32 (+14.26%) |
| 573678d — 2026-06-29 18:24:15 | 0.04 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.01 (n/a) | 590.60 (n/a) | 448.60 (n/a) | 483.90 (n/a) | 250.90 (n/a) | 140.32 (n/a) |

test_rms_norm[input_length_2048-num_aie_columns_4-num_channels_2-tile_size_256-weighted_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.02 (-4.04%) | 0.02 (+20.68%) | 0.02 (+19.34%) | 0.01 (+262.68%) | 0.00 (-48.21%) | 676.50 (-72.43%) | 461.22 (-46.08%) | 427.60 (-16.22%) | 372.70 (+4.19%) | 122.54 (-86.32%) |
| 573678d — 2026-06-29 18:24:15 | 0.02 (n/a) | 0.02 (n/a) | 0.02 (n/a) | 0.00 (n/a) | 0.01 (n/a) | 2453.70 (n/a) | 855.38 (n/a) | 510.40 (n/a) | 357.70 (n/a) | 895.94 (n/a) |

iron/operators/rope

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_1-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.38 (-15.90%) | 0.24 (-0.41%) | 0.20 (-18.50%) | 0.19 (+272.74%) | 0.08 (-48.28%) | 515.80 (-73.17%) | 434.44 (-38.03%) | 494.80 (+22.69%) | 261.60 (+18.91%) | 109.73 (-84.37%) |
| 573678d — 2026-06-29 18:24:15 | 0.45 (n/a) | 0.24 (n/a) | 0.24 (n/a) | 0.05 (n/a) | 0.15 (n/a) | 1922.40 (n/a) | 701.04 (n/a) | 403.30 (n/a) | 220.00 (n/a) | 702.07 (n/a) |

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_2-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.41 (+2.06%) | 0.30 (+17.16%) | 0.32 (+59.11%) | 0.22 (+39.43%) | 0.08 (-30.86%) | 448.50 (-28.29%) | 344.80 (-22.59%) | 305.50 (-37.15%) | 239.80 (-2.04%) | 93.09 (-48.36%) |
| 573678d — 2026-06-29 18:24:15 | 0.40 (n/a) | 0.26 (n/a) | 0.20 (n/a) | 0.16 (n/a) | 0.12 (n/a) | 625.40 (n/a) | 445.40 (n/a) | 486.10 (n/a) | 244.80 (n/a) | 180.26 (n/a) |

test_rope[rows_32-cols_512-angle_rows_32-aie_columns_4-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.34 (-21.47%) | 0.23 (-3.60%) | 0.19 (+0.47%) | 0.13 (+167.46%) | 0.10 (-36.44%) | 748.70 (-62.61%) | 494.56 (-31.80%) | 522.90 (-0.48%) | 289.60 (+27.35%) | 201.28 (-72.49%) |
| 573678d — 2026-06-29 18:24:15 | 0.43 (n/a) | 0.24 (n/a) | 0.19 (n/a) | 0.05 (n/a) | 0.16 (n/a) | 2002.40 (n/a) | 725.14 (n/a) | 525.40 (n/a) | 227.40 (n/a) | 731.54 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_1-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.32 (+75.28%) | 0.21 (+36.52%) | 0.21 (+46.52%) | 0.13 (+13.39%) | 0.08 (+176.07%) | 564.40 (-11.80%) | 398.62 (-20.79%) | 353.80 (-31.75%) | 232.40 (-42.96%) | 140.33 (+48.41%) |
| 573678d — 2026-06-29 18:24:15 | 0.18 (n/a) | 0.15 (n/a) | 0.14 (n/a) | 0.12 (n/a) | 0.03 (n/a) | 639.90 (n/a) | 503.24 (n/a) | 518.40 (n/a) | 407.40 (n/a) | 94.56 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_2-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.31 (+104.55%) | 0.22 (+81.17%) | 0.20 (+63.47%) | 0.14 (+103.25%) | 0.07 (+119.30%) | 523.10 (-50.80%) | 367.50 (-44.56%) | 370.10 (-38.84%) | 239.80 (-51.11%) | 112.15 (-51.03%) |
| 573678d — 2026-06-29 18:24:15 | 0.15 (n/a) | 0.12 (n/a) | 0.12 (n/a) | 0.07 (n/a) | 0.03 (n/a) | 1063.20 (n/a) | 662.82 (n/a) | 605.10 (n/a) | 490.50 (n/a) | 228.99 (n/a) |

test_rope[rows_32-cols_512-angle_rows_8-aie_columns_4-method_type_0]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.25 (-22.51%) | 0.17 (-17.65%) | 0.15 (+1.23%) | 0.13 (-8.72%) | 0.05 (-43.08%) | 547.60 (+9.56%) | 447.44 (+13.94%) | 478.90 (-1.22%) | 296.70 (+29.06%) | 109.02 (-20.29%) |
| 573678d — 2026-06-29 18:24:15 | 0.32 (n/a) | 0.21 (n/a) | 0.15 (n/a) | 0.15 (n/a) | 0.09 (n/a) | 499.80 (n/a) | 392.70 (n/a) | 484.80 (n/a) | 229.90 (n/a) | 136.77 (n/a) |

iron/operators/softmax

test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_1024]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.48 (+0.57%) | 0.37 (+55.90%) | 0.37 (+78.80%) | 0.25 (+373.83%) | 0.11 (-31.31%) | 514.80 (-78.90%) | 374.92 (-58.33%) | 350.70 (-44.08%) | 271.20 (-0.59%) | 110.05 (-87.40%) |
| 573678d — 2026-06-29 18:24:15 | 0.48 (n/a) | 0.24 (n/a) | 0.21 (n/a) | 0.05 (n/a) | 0.15 (n/a) | 2439.40 (n/a) | 899.78 (n/a) | 627.10 (n/a) | 272.80 (n/a) | 873.13 (n/a) |

test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.55 (+12.35%) | 0.38 (+16.78%) | 0.43 (+59.48%) | 0.19 (-19.10%) | 0.15 (+39.96%) | 678.70 (+23.62%) | 401.16 (-7.17%) | 303.50 (-37.29%) | 239.90 (-10.98%) | 190.88 (+50.31%) |
| 573678d — 2026-06-29 18:24:15 | 0.49 (n/a) | 0.33 (n/a) | 0.27 (n/a) | 0.24 (n/a) | 0.11 (n/a) | 549.00 (n/a) | 432.16 (n/a) | 484.00 (n/a) | 269.50 (n/a) | 126.99 (n/a) |

test_softmax[input_length_32768-num_aie_columns_2-num_channels_2-tile_size_512]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.27 (-48.76%) | 0.21 (-40.34%) | 0.23 (-37.20%) | 0.07 (-67.29%) | 0.08 (-40.61%) | 1981.90 (+205.75%) | 831.32 (+92.14%) | 560.00 (+59.23%) | 490.10 (+95.18%) | 644.64 (+257.78%) |
| 573678d — 2026-06-29 18:24:15 | 0.52 (n/a) | 0.35 (n/a) | 0.37 (n/a) | 0.20 (n/a) | 0.14 (n/a) | 648.20 (n/a) | 432.66 (n/a) | 351.70 (n/a) | 251.10 (n/a) | 180.18 (n/a) |

iron/operators/swiglu_decode

test_swiglu_decode[embedding_dim_1024-hidden_dim_3584]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.00 (+100.00%) | 0.00 (+76.92%) | 0.00 (+150.00%) | 0.00 (+0.00%) | 0.00 (+191.55%) | 21812.89 (+12.86%) | 12094.96 (-24.26%) | 8617.52 (-49.00%) | 5069.72 (-53.56%) | 7586.74 (+135.60%) |
| 573678d — 2026-06-29 18:24:15 | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 19328.08 (n/a) | 15968.12 (n/a) | 16896.51 (n/a) | 10916.30 (n/a) | 3220.18 (n/a) |

test_swiglu_decode[embedding_dim_2048-hidden_dim_2048]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.00 (+15.38%) | 0.00 (-18.97%) | 0.00 (-38.46%) | 0.00 (-33.33%) | 0.00 (+69.93%) | 21205.41 (+50.59%) | 11397.11 (+45.58%) | 9850.80 (+56.67%) | 5290.94 (-13.54%) | 6756.41 (+93.16%) |
| 573678d — 2026-06-29 18:24:15 | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 0.00 (n/a) | 14081.45 (n/a) | 7828.86 (n/a) | 6287.46 (n/a) | 6119.72 (n/a) | 3497.88 (n/a) |

iron/operators/swiglu_prefill

test_swiglu_prefill[seq_len_256-embedding_dim_2048-hidden_dim_2048-prio_accuracy_False]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 0.14 (+3.33%) | 0.10 (+11.69%) | 0.09 (+4.57%) | 0.08 (-0.88%) | 0.03 (+23.84%) | 26463.69 (+0.85%) | 21335.95 (-8.45%) | 24146.40 (-4.26%) | 14679.38 (-3.19%) | 5709.16 (+23.05%) |
| 573678d — 2026-06-29 18:24:15 | 0.14 (n/a) | 0.09 (n/a) | 0.08 (n/a) | 0.08 (n/a) | 0.02 (n/a) | 26241.28 (n/a) | 23305.43 (n/a) | 25221.66 (n/a) | 15163.43 (n/a) | 4639.82 (n/a) |

iron/operators/transpose

test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8-num_batches_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 1.72 (+4.99%) | 1.11 (-12.13%) | 1.05 (-29.24%) | 0.16 (-80.44%) | 0.65 (+67.99%) | 3348.60 (+411.31%) | 1004.28 (+122.59%) | 498.50 (+41.30%) | 304.20 (-4.76%) | 1315.47 (+752.45%) |
| 573678d — 2026-06-29 18:24:15 | 1.64 (n/a) | 1.27 (n/a) | 1.49 (n/a) | 0.80 (n/a) | 0.38 (n/a) | 654.90 (n/a) | 451.18 (n/a) | 352.80 (n/a) | 319.40 (n/a) | 154.32 (n/a) |

test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8-num_batches_2]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 2.79 (+13.77%) | 1.34 (-9.60%) | 1.16 (-6.90%) | 0.30 (+2.85%) | 1.03 (+15.19%) | 3455.70 (-2.77%) | 1455.52 (+17.21%) | 901.40 (+7.41%) | 375.30 (-12.11%) | 1283.77 (-2.13%) |
| 573678d — 2026-06-29 18:24:15 | 2.46 (n/a) | 1.48 (n/a) | 1.25 (n/a) | 0.30 (n/a) | 0.89 (n/a) | 3554.20 (n/a) | 1241.76 (n/a) | 839.20 (n/a) | 427.00 (n/a) | 1311.77 (n/a) |

test_transpose[M_2048-N_64-aie_columns_1-channels_1-m_64-n_64-s_8]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 4bb8427 — 2026-06-25 20:01:37 | 1.58 (-2.67%) | 1.21 (+5.65%) | 1.29 (+41.89%) | 0.66 (-20.05%) | 0.38 (-2.53%) | 792.90 (+25.08%) | 479.12 (-3.80%) | 406.80 (-29.52%) | 331.10 (+2.76%) | 189.18 (+26.62%) |
| 4bb8427 — 2026-06-23 22:46:49 | 1.63 (n/a) | 1.14 (n/a) | 0.91 (n/a) | 0.83 (n/a) | 0.38 (n/a) | 633.90 (n/a) | 498.02 (n/a) | 577.20 (n/a) | 322.20 (n/a) | 149.40 (n/a) |

test_transpose[M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8-num_batches_1]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 21c9a18 — 2026-06-30 16:58:16 | 1.87 (-7.56%) | 1.34 (-3.85%) | 1.41 (-4.62%) | 0.64 (-14.91%) | 0.45 (-6.76%) | 822.70 (+17.51%) | 448.52 (+5.82%) | 371.10 (+4.83%) | 279.90 (+8.15%) | 215.31 (+24.41%) |
| 573678d — 2026-06-29 18:24:15 | 2.03 (n/a) | 1.39 (n/a) | 1.48 (n/a) | 0.75 (n/a) | 0.49 (n/a) | 700.10 (n/a) | 423.86 (n/a) | 354.00 (n/a) | 258.80 (n/a) | 173.06 (n/a) |

test_transpose[M_2048-N_64-aie_columns_1-channels_2-m_64-n_64-s_8]

| Commit/Date | Bandwidth (max) | Bandwidth (mean) | Bandwidth (median) | Bandwidth (min) | Bandwidth (stddev) | Latency (max) | Latency (mean) | Latency (median) | Latency (min) | Latency (stddev) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 4bb8427 — 2026-06-25 20:01:37 | 1.75 (-15.44%) | 1.32 (-0.50%) | 1.36 (+22.47%) | 0.77 (-19.20%) | 0.36 (-21.75%) | 678.10 (+23.76%) | 430.02 (-0.09%) | 385.10 (-18.34%) | 300.10 (+18.29%) | 146.96 (+19.46%) |
| 4bb8427 — 2026-06-23 22:46:49 | 2.07 (n/a) | 1.32 (n/a) | 1.11 (n/a) | 0.96 (n/a) | 0.46 (n/a) | 547.90 (n/a) | 430.40 (n/a) | 471.60 (n/a) | 253.70 (n/a) | 123.02 (n/a) |

Phoenix - Examples

IRON

Tested on 2026_06_30_16_53_16 at commit 21c9a18.

Trends:

IRON Trends

Comment on lines +11 to +13
pytest.importorskip(
"stream", reason="stream-dse not installed (see requirements_stream.txt)"
)

Collaborator

Could you please add requirements_stream.txt here so that this test runs in CI?

asyms added 4 commits June 29, 2026 10:08
SwiGLUPrefillStream compiles the whole SwiGLU-prefill block (gate/up GEMMs +
SiLU + elementwise-mul + down GEMM) as a single fused MLIR design generated by
stream-dse, producing one xclbin instead of chaining separately-compiled
sub-operators. The design is generated at build time by stream_design.py and
compiled through IRON's normal flow.
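For reference, the math covered by the fused design (gate/up GEMMs, SiLU, elementwise multiply, down GEMM) can be written out on the host. This is a minimal pure-Python sketch, not the operator's kernel code: the `x @ W` weight orientation, the function names, and the list-of-lists matrices are assumptions made for illustration.

```python
import math


def silu(x):
    # SiLU(x) = x * sigmoid(x); adequate for the moderate values used here.
    return x / (1.0 + math.exp(-x))


def matmul(a, b):
    # Plain row-by-column product of two list-of-lists matrices.
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*b)] for row in a]


def swiglu(x, w_gate, w_up, w_down):
    gate = matmul(x, w_gate)  # gate GEMM
    up = matmul(x, w_up)      # up GEMM
    # SiLU on the gate branch, then elementwise multiply with the up branch.
    hidden = [[silu(g) * u for g, u in zip(g_row, u_row)]
              for g_row, u_row in zip(gate, up)]
    return matmul(hidden, w_down)  # down GEMM


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    print(swiglu([[1.0, 2.0]], identity, identity, [[1.0], [1.0]]))
```

A reference like this is practical only for small shapes, for example when comparing against the operator's output in a test.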

The fused design's per-kernel operand layouts (the tiled-strided DMA tiling) are
authored on the IRON side and fed into stream-dse code generation rather than
hand-copied inside stream: iron.common.layout provides a TiledStridedLayout type,
and swiglu_prefill_stream/stream_kernels.py injects IRON's layouts through
optimize_allocation_co(kernels=...) -- the override hook added in stream-dse
1.13.4 -- keeping stream's kernel construction and replacing only
operand_layouts().
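To make "tiled-strided" concrete: such a layout can be described as a list of (bound, stride) pairs, outermost dimension first, and the element offsets it visits follow from nested iteration. The sketch below is illustrative only; `StridedDim`, `tiled_row_major`, and `offsets` are hypothetical names and do not reflect the actual `TiledStridedLayout` API or the format produced by `to_snaxc()`.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class StridedDim:
    bound: int   # number of steps along this dimension
    stride: int  # element offset added per step


def tiled_row_major(rows, cols, tile_rows, tile_cols):
    """Describe a row-major rows x cols matrix visited tile by tile."""
    if rows % tile_rows or cols % tile_cols:
        raise ValueError("tile shape must divide the matrix shape")
    return (
        StridedDim(rows // tile_rows, tile_rows * cols),  # tile row
        StridedDim(cols // tile_cols, tile_cols),         # tile column
        StridedDim(tile_rows, cols),                      # row inside a tile
        StridedDim(tile_cols, 1),                         # column inside a tile
    )


def offsets(dims):
    """Enumerate element offsets, innermost (last) dimension fastest."""
    ranges = (range(d.bound) for d in dims)
    return [sum(i * d.stride for i, d in zip(idx, dims)) for idx in product(*ranges)]


if __name__ == "__main__":
    print(offsets(tiled_row_major(4, 4, 2, 2)))
```

For a 4x4 matrix with 2x2 tiles the first four offsets are 0, 1, 4, 5, i.e. the top-left tile is read in full before the walk moves to the next tile.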

stream-dse is an optional dependency (requirements_stream.txt); the operator's
test skips when it is absent. Importing iron.operators no longer requires an NPU
runtime (lazy XRT import), so the package loads on hosts without XRT/pyxrt.
Includes a minimal k=1 demo under demos/swiglu_prefill_stream/.

The stream-dse SwiGLU-prefill operator now builds and runs through the
full-ELF flow instead of an xclbin. Its placed/routed whole-array design
requires --dynamic-objFifos so each core's tile loop stays rolled
(runtime-indexed objectFifos); without it the down-projection cores fully
unroll to ~26 KB and overflow the AIE2p core program memory.

- FullElfArtifact gains an extra_flags list, forwarded by
  AieccFullElfCompilationRule into the aiecc --generate-full-elf command.
  It defaults to empty, so existing full-ELF designs are unaffected.
- SwiGLUPrefillStream.set_up_artifacts builds a FullElfArtifact with
  extra_flags=["--dynamic-objFifos"]; get_callable runs it via
  FullELFCallable (device "main" / sequence "sequence", positional args).

requirements_stream.txt installs stream-dse from PyPI, but its AIE codegen
dependencies (snax-mlir/snaxc, xdsl-aie, aie-python-extras) are not PyPI
packages -- they are pulled in by the stream-setup-aie console script. Run it
right after the stream requirements so the stream-backed operators (e.g.
swiglu_prefill_stream) can generate their MLIR at build time in CI.
asyms force-pushed the stream-dse-fused-swiglu branch from 96bbfd8 to 8e4c6c4 on June 29, 2026 13:54
asyms added 4 commits June 29, 2026 21:10
stream-dse renders its workload graph to PNG during MLIR generation
(stream/stages/parsing/onnx_model_parser.py -> workload.visualize ->
pydot), which shells out to graphviz's `dot`. The CI image did not install
graphviz, so the stream-backed swiglu_prefill_stream test failed at
operator.compile() with: FileNotFoundError: "dot" not found in path.
The runner user is non-root (no sudo for apt), so graphviz must be baked
into the image.

Also make test_docker_ci.sh tolerant of a missing secret_github_token: its
interactive shell mode never registers a runner, so the PAT is optional.

Adds SwiGLUPrefillStreamK2, which deploys the SwiGLU-prefill block as two
fusion groups in one full-ELF: a gate/up/SiLU/mul front end producing the
hidden state h, and a separate down-projection group consuming it (h stays on
device as a scratch buffer). The split is expressed entirely in the stream
mapping (make_swiglu_mapping(split_groups=True)); stream-dse emits one
aie.device design per group and IRON fuses them via FusedMLIROperator.
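Since `split_groups` exists only in newer stream-dse builds, a caller can probe for the keyword before relying on it. A minimal sketch, assuming the option appears as a named parameter; the two `make_mapping_*` functions are hypothetical stand-ins for an older and a newer `make_swiglu_mapping`, and this is not necessarily how the test performs its self-skip.

```python
import inspect


def accepts_keyword(func, name):
    """Return True if func can be called with `name` as a keyword argument."""
    try:
        params = inspect.signature(func).parameters
    except (TypeError, ValueError):
        return False  # signature cannot be introspected
    keyword_kinds = (
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    if name in params and params[name].kind in keyword_kinds:
        return True
    # A **kwargs catch-all also accepts the keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


# Hypothetical stand-ins for the mapping factory before and after the option existed.
def make_mapping_old():
    return {}


def make_mapping_new(split_groups=False):
    return {"split_groups": split_groups}


if __name__ == "__main__":
    for factory in (make_mapping_old, make_mapping_new):
        print(factory.__name__, accepts_keyword(factory, "split_groups"))
```

In a pytest module the result could feed `pytest.skip(...)` so the k=2 test is skipped rather than failing on an older install.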

- FusedMLIROperator gains an opt-in extra_flags forwarded to its full-ELF
  build; the k=2 operator passes --dynamic-objFifos so the down-projection
  cores stay rolled and fit AIE2p program memory (empty for other fused ops).
- stream_design.load_swiglu_k2_group runs the two-group codegen (cached) and
  returns one group's design, re-parsed into an aie module with the fused
  operator's func_prefix applied to kernel symbols/objects.
- op_k2 defines the per-group child operator (kernels + arg-spec) and the
  SwiGLUPrefillStreamK2 wrapper.
- test_swiglu_prefill_stream_k2 mirrors the single-group test through the named
  consolidated buffers; it self-skips on stream-dse builds without split_groups.
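The opt-in `extra_flags` forwarding from the first bullet follows a common pattern: an empty default that leaves existing builds untouched, and flags appended to the compiler command only when set. A minimal sketch; `FullElfBuild` and `build_command` are hypothetical stand-ins rather than IRON's classes, and the position of the extra flags on the command line is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class FullElfBuild:
    """Hypothetical stand-in for a full-ELF build description."""
    mlir_path: str
    # default_factory gives every instance its own empty list.
    extra_flags: list = field(default_factory=list)


def build_command(build):
    # "aiecc" and "--generate-full-elf" come from the commit text;
    # the argument order here is assumed for illustration.
    return ["aiecc", "--generate-full-elf", *build.extra_flags, build.mlir_path]


if __name__ == "__main__":
    print(build_command(FullElfBuild("design.mlir")))
    print(build_command(FullElfBuild("design.mlir", ["--dynamic-objFifos"])))
```

Using `default_factory` instead of a shared `[]` default keeps one build's flags from leaking into another.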

Requires stream-dse with two-fusion-group support (the split_groups mapping
option + the multi-group AIE codegen pipeline).

stream-dse 1.13.5 adds the two-fusion-group support the k=2 SwiGLU-prefill
variant needs (make_swiglu_mapping(split_groups=...) + the multi-group AIE
codegen pipeline). With the floor bumped, CI installs a stream-dse that has it,
so test_swiglu_prefill_stream_k2 runs instead of self-skipping.

stream-dse 1.13.6 makes the debug workload-graph visualization non-fatal, so
stream-backed code generation no longer hard-fails when graphviz (`dot`) is
absent -- which is what was breaking the swiglu_prefill_stream tests on the CI
runner image. Bumping the floor picks that up.
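The same idea, treating a debug rendering step as optional, can also be sketched on the caller side. This is not stream-dse's implementation; `render_if_available` and its parameters are hypothetical, and only `shutil.which` and `logging` are real standard-library calls.

```python
import logging
import shutil

log = logging.getLogger(__name__)


def render_if_available(render, tool="dot", which=shutil.which):
    """Run render() only if `tool` is on PATH; never raise for a missing tool."""
    if which(tool) is None:
        log.warning("%s not found on PATH; skipping graph rendering", tool)
        return False
    try:
        render()
    except OSError as exc:  # FileNotFoundError is a subclass of OSError
        log.warning("graph rendering failed: %s", exc)
        return False
    return True


if __name__ == "__main__":
    rendered = render_if_available(lambda: print("rendering workload graph"))
    print("rendered:", rendered)
```

The `which` parameter exists only so the lookup can be replaced in tests; production callers would leave it at its default.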