Easy, fast, and cost-efficient LLM serving for everyone.
An open-source library and inference/serving engine for LLMs, designed for high throughput and memory-efficient serving.