Ray Serve LLM

Serving LLMs

Ray Serve LLM is a high-performance, scalable framework for deploying large language models in production. It specializes Ray Serve primitives for distributed LLM serving workloads and provides OpenAI API compatibility for online inference in Ray Serve applications.

Recent stories

0 linked stories

No linked stories yet.