Ali Naeimi (alint77)

ML performance engineer, currently focusing on pretraining efficiency for LLMs and protein language models, and on distributed training infrastructure. Open to positions in industry.

10 followers · 18 following

Manchester, UK
in/ali-naeimi5055
@Ali_NT99

Pinned

microsoft/dion (Public)

Dion optimizer algorithm

Python · 483 stars · 54 forks

nanogpt-fp8 (Public)

nanochat-inspired LLM pretraining using Transformer Engine, with MXFP8 and NVFP4 support. Up to 30% faster than nanochat.

Python · 6 stars · 2 forks

flash-mHC (Public)

Fast implementation of manifold-constrained hyperconnections in Triton.

Python · 1 star

matmul_assembly_x86 (Public)

Hyper-optimized FP32 GEMM kernels in handwritten AVX2 assembly, with a worklog of the optimizations implemented.

Assembly · 5 stars

llm.c (Public)

Forked from karpathy/llm.c

3x faster LLM training on CPU than Karpathy's original repo.

CUDA · 2 stars

Candles-ProbCheck (Public)

Simple notebook that checks, over a given period of time, the probability that each pair of candlesticks within a day has the same or the opposite color.

Jupyter Notebook