AI Primer

DeepGEMM

Clean and efficient FP8 GEMM kernels with fine-grained scaling.

A CUDA kernel library from DeepSeek that provides high-performance general matrix multiplication (GEMM) and mixture-of-experts (MoE) primitives for large language model training and inference. It includes FP8, FP4, and BF16 kernels as well as MQA scoring.


© 2026 AI Primer. All rights reserved.