LMCache
A KV Cache Management Layer for Scalable LLM Inference.
LMCache is an open-source KV cache management layer for LLM inference. It stores, reuses, observes, and transforms KV caches across serving engines, which reduces time-to-first-token and improves throughput for long-context, multi-turn, and RAG workloads.
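To make the idea of KV cache reuse concrete, here is a minimal toy sketch in Python. It is not LMCache's API. The names `ToyKVStore`, `chunk_keys`, and `CHUNK_SIZE` are hypothetical, and the stored values are placeholder strings standing in for real KV tensors. The sketch assumes a chunked, prefix-hashed lookup: each chunk's key depends on every token before it, so a new request can skip prefill for however many leading chunks are already stored.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4  # hypothetical chunk size in tokens, chosen small for illustration


def chunk_keys(tokens: List[int], chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Return one key per full chunk; each key hashes the entire prefix so far."""
    keys: List[str] = []
    hasher = hashlib.sha256()
    full = len(tokens) - len(tokens) % chunk_size  # ignore a trailing partial chunk
    for start in range(0, full, chunk_size):
        chunk = tokens[start:start + chunk_size]
        hasher.update(",".join(map(str, chunk)).encode() + b";")
        keys.append(hasher.copy().hexdigest())
    return keys


class ToyKVStore:
    """In-memory stand-in for a KV cache store (assumption: no eviction, no tensors)."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def put(self, tokens: List[int]) -> None:
        # Record a placeholder per full chunk; a real system would store KV tensors.
        for i, key in enumerate(chunk_keys(tokens)):
            self._store.setdefault(key, f"kv-chunk-{i}")

    def cached_prefix_len(self, tokens: List[int]) -> int:
        # Count leading tokens whose chunks are already stored.
        n = 0
        for key in chunk_keys(tokens):
            if key not in self._store:
                break
            n += CHUNK_SIZE
        return n


store = ToyKVStore()
store.put(list(range(1, 11)))  # 10 tokens -> 2 full chunks stored

# A follow-up request that shares the first 8 tokens can reuse both chunks.
reusable = store.cached_prefix_len([1, 2, 3, 4, 5, 6, 7, 8, 99, 100, 101, 102])
```

In this sketch, only the tokens past `reusable` would need a fresh prefill, which is where the time-to-first-token saving in multi-turn and RAG workloads comes from.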