ML performance engineer currently focusing on LLM and protein LM pretraining efficiency and distributed training infra. Open to positions in industry.
- Manchester, UK
- in/ali-naeimi5055
- @Ali_NT99
Pinned
-
nanogpt-fp8 (Public)
Nanochat-inspired LLM pretraining using Transformer Engine, with MXFP8 and NVFP4 support. Up to 30% faster than nanochat.
-
matmul_assembly_x86 (Public)
Hyper-optimized FP32 GEMM kernels in handwritten AVX2 assembly, with a worklog of the optimizations implemented.
Assembly 5
-
llm.c (Public)
Forked from karpathy/llm.c
3x faster LLM training on CPU than Karpathy's original repo.
Cuda 2
-
Candles-ProbCheck (Public)
Simple notebook that checks the probability of each pair of candlesticks in a day being the same or opposite color over a given period of time.
Jupyter Notebook