DeepGEMM
Clean and efficient FP8 GEMM kernels with fine-grained scaling
A DeepSeek CUDA kernel library for high-performance GEMM and MoE primitives used in large language model training and inference, including FP8/FP4/BF16 kernels and MQA scoring.
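
To make "fine-grained scaling" concrete, here is a minimal NumPy sketch of a GEMM whose operands carry one scale factor per block along the K dimension instead of one per tensor. It does not use or reproduce DeepGEMM's API. The names `quantize_blockwise` and `scaled_gemm`, the block size of 128, and the per-row blocking of both operands are assumptions made for illustration. NumPy has no FP8 type, so rounding after scaling into the E4M3 range stands in for a real FP8 cast.

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest finite value representable in FP8 E4M3
BLOCK = 128           # assumed scaling granularity along K (illustrative)


def quantize_blockwise(x, block=BLOCK):
    """Give every (row, K-block) of x its own scale factor.

    Returns (q, scales). q holds values scaled into the FP8 range and
    rounded, a crude stand-in for an FP8 cast. scales has shape
    (rows, K // block).
    """
    rows, k = x.shape
    assert k % block == 0, "K must be a multiple of the block size"
    blocks = x.reshape(rows, k // block, block)
    amax = np.abs(blocks).max(axis=2, keepdims=True)
    scales = np.maximum(amax, 1e-12) / FP8_E4M3_MAX  # avoid division by zero
    q = np.round(blocks / scales)
    return q.reshape(rows, k), scales[..., 0]


def scaled_gemm(qa, sa, qb, sb, block=BLOCK):
    """Approximate A @ B.T from block-quantized A (m x k) and B (n x k)."""
    m, k = qa.shape
    n = qb.shape[0]
    out = np.zeros((m, n), dtype=np.float64)
    for i in range(k // block):
        ks = slice(i * block, (i + 1) * block)
        partial = qa[:, ks] @ qb[:, ks].T              # product of quantized blocks
        out += partial * np.outer(sa[:, i], sb[:, i])  # undo both scales
    return out


rng = np.random.default_rng(0)
a = rng.standard_normal((4, 256))
b = rng.standard_normal((3, 256))
qa, sa = quantize_blockwise(a)
qb, sb = quantize_blockwise(b)
approx = scaled_gemm(qa, sa, qb, sb)
exact = a @ b.T
print(np.linalg.norm(approx - exact) / np.linalg.norm(exact))
```

Because each block is scaled by its own maximum, a large outlier only reduces precision inside its own block rather than across the whole tensor, which is the motivation for scaling at this granularity.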
