AI Primer

FlashMLA

Efficient Multi-head Latent Attention kernels

An open-source library of optimized attention kernels from DeepSeek. It implements sparse and dense Multi-head Latent Attention (MLA) kernels for decoding and prefill, plus dense multi-head attention (MHA) kernels for prefill.

Your daily guide to AI tools, workflows, and creative inspiration.

© 2026 AI Primer. All rights reserved.