FlashMLA
Efficient Multi-head Latent Attention kernels
Open-source library of optimized attention kernels from DeepSeek. It implements sparse and dense MLA kernels for decoding and prefill, plus dense MHA prefill.
