LMCache

A KV Cache Management Layer for Scalable LLM Inference.

LMCache is an open-source KV cache management layer for LLM inference that stores, reuses, observes, and transforms KV caches across serving engines to reduce time-to-first-token and improve throughput for long-context, multi-turn, and RAG workloads.
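
To illustrate the store-and-reuse idea in the description above, here is a minimal toy sketch of prefix-based KV cache reuse. It is not LMCache's API or implementation: the names `ToyKVStore`, `chunk_keys`, and `CHUNK_SIZE` are hypothetical, the chunk size is arbitrary, and short placeholder strings stand in for real KV tensors. It only shows why a later request that shares a prefix with an earlier one needs less prefill work, which is where the time-to-first-token saving comes from.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4  # hypothetical chunk length in tokens, chosen only for the demo


def chunk_keys(tokens: List[int], chunk_size: int = CHUNK_SIZE) -> List[str]:
    """Return one key per full chunk; each key hashes the entire prefix so far."""
    keys = []
    hasher = hashlib.sha256()
    full = len(tokens) - len(tokens) % chunk_size  # ignore the trailing partial chunk
    for start in range(0, full, chunk_size):
        chunk = tokens[start:start + chunk_size]
        hasher.update(",".join(map(str, chunk)).encode() + b";")
        keys.append(hasher.copy().hexdigest())
    return keys


class ToyKVStore:
    """Toy stand-in for a KV cache store keyed by prefix hashes."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def put(self, tokens: List[int]) -> None:
        # A real system would store KV tensors; a label is enough for this sketch.
        for i, key in enumerate(chunk_keys(tokens)):
            self._store.setdefault(key, f"kv-chunk-{i}")

    def reusable_prefix_len(self, tokens: List[int]) -> int:
        """Count leading tokens whose KV entries are already stored."""
        hit = 0
        for key in chunk_keys(tokens):
            if key not in self._store:
                break
            hit += CHUNK_SIZE
        return hit


store = ToyKVStore()
turn1 = list(range(10))               # first request: 10 tokens, 2 full chunks
store.put(turn1)
turn2 = turn1 + [100, 101, 102, 103]  # follow-up request sharing the same prefix
reused = store.reusable_prefix_len(turn2)
print(f"reuse {reused} of {len(turn2)} tokens; prefill the remaining {len(turn2) - reused}")
```

Because each key covers the whole prefix rather than a single chunk in isolation, a chunk only counts as a hit when everything before it also matches, which mirrors the constraint that a token's KV entries depend on all preceding tokens.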
