Open-source, high-performance serving framework for large language and multimodal models, designed for low-latency, high-throughput inference across single-GPU and distributed deployments.