Ray Serve LLM
Serving LLMs
Ray Serve LLM is a high-performance, scalable framework for deploying large language models in production. It specializes Ray Serve primitives for distributed LLM serving workloads and provides OpenAI API compatibility for online inference in Ray Serve applications.
Recent stories
0 linked stories
No linked stories yet.