Performance-optimized inference microservices for AI model deployment
NVIDIA packages these microservices as containers for deployment across cloud, data center, and workstation environments. They expose industry-standard APIs and support a range of use cases, including language, visual, retrieval, and multimodal models.
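The "industry-standard APIs" mentioned above are, in practice, often OpenAI-compatible HTTP endpoints. The sketch below shows how a client might assemble a chat-completion request payload for such an endpoint; the endpoint URL and model name are illustrative assumptions, not values specified by this document.

```python
import json

# Assumed local endpoint for a running inference container; the port and
# path are hypothetical and depend on how the container is launched.
NIM_ENDPOINT = "http://localhost:8000/v1/chat/completions"


def build_chat_request(model: str, prompt: str, max_tokens: int = 64) -> dict:
    """Assemble an OpenAI-style chat-completion payload.

    In a real deployment this dict would be serialized to JSON and
    POSTed to NIM_ENDPOINT with a Content-Type: application/json header.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


# The model name below is a placeholder for whichever model the
# container actually serves.
payload = build_chat_request("example/model-name", "Summarize this document.")
print(json.dumps(payload, indent=2))
```

Because the request shape follows a widely used convention, existing client libraries and tooling that speak that convention can typically be pointed at the deployed service without code changes.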