LLM Inference & Serving
74 tools
Inference runtimes, model serving platforms, fine-tuning infra, and GPU/accelerator providers for LLMs.
OpenRouter
OpenRouter, Inc.
The Unified Interface For LLMs
24 stories
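OpenRouter's "unified interface" is an OpenAI-format chat completions endpoint that fronts many providers behind one URL and one key. A minimal sketch of building such a request with only the standard library — the endpoint URL is the documented one, but the model slug and key below are placeholders:

```python
import json

# OpenRouter exposes an OpenAI-compatible chat completions endpoint.
OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_chat_request(model, prompt, api_key):
    """Return (url, headers, body) for an OpenAI-format chat completion."""
    headers = {
        "Authorization": f"Bearer {api_key}",  # placeholder key, not a real one
        "Content-Type": "application/json",
    }
    body = json.dumps({
        "model": model,  # provider/model slug, e.g. "meta-llama/llama-3.1-8b-instruct"
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return OPENROUTER_URL, headers, body

url, headers, body = build_chat_request(
    "meta-llama/llama-3.1-8b-instruct", "Hello!", "sk-or-...")
# Send with urllib.request once a real key is supplied; the same payload
# shape works unchanged for any model the router lists.
```

Because the request body is plain OpenAI chat format, swapping providers is just a change of model slug.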
vLLM
vLLM Project
A high-throughput and memory-efficient inference and serving engine for LLMs
18 stories
SGLang
LMSYS Corp.
Fast serving framework for LLMs and agents
12 stories
AI Studio
Google
Fastest way to start building with Gemini
5 stories
Ollama
Ollama Inc.
Ollama is the easiest way to run open AI models locally or in the cloud, with a simple API and 40,000+ integrations.
5 stories
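The "simple API" Ollama's card mentions is a local HTTP server (port 11434 by default) with an `/api/generate` endpoint. A sketch of the request body for a single non-streaming completion — the model tag is an example and assumes it has been pulled locally:

```python
import json

# Ollama's local server listens on localhost:11434 by default;
# /api/generate is its native completion endpoint.
OLLAMA_URL = "http://localhost:11434/api/generate"

def generate_body(model, prompt):
    """JSON body requesting one non-streaming completion from a local model."""
    return json.dumps({
        "model": model,      # e.g. "llama3.2" -- must be pulled first
        "prompt": prompt,
        "stream": False,     # return one JSON object instead of a token stream
    })

body = generate_body("llama3.2", "Why is the sky blue?")
# POST `body` to OLLAMA_URL once `ollama serve` is running and the model
# has been fetched with `ollama pull llama3.2`.
```

With `"stream": True` (the default) the server instead emits one JSON object per generated token.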
Amazon Bedrock
Amazon Web Services
The platform for building generative AI applications and agents at production scale
2 stories
Hugging Face Hub
Hugging Face
The central hub for models, datasets, and Spaces.
2 stories
NVIDIA NIM
NVIDIA
Deploy AI models with optimized inference microservices
2 stories
Anthropic
Anthropic
AI developer platform
1 story
Baidu Qianfan
Baidu
Baidu AI Cloud's large-model platform
1 story
Decoupled DiLoCo
Google DeepMind
Resilient, distributed AI training at scale
1 story
DFlash
DFlash
DFlash software product
1 story
FlashQLA
Alibaba Cloud
Alibaba Cloud software product
1 story
Miles RL Training
RadixArk
Reinforcement-learning training software
1 story
Nous Portal
Nous Research
Access Nous Research's AI portal
1 story
OpenAI
OpenAI
AI platform and product suite
1 story
Tile Kernels
Hangzhou DeepSeek Artificial Intelligence Co., Ltd.
A kernel library written in TileLang
1 story
Zyphra Inference
Zyphra
Serverless inference for frontier open-weight models
1 story
AFM Playground
Arcee AI
Playground for AFM models
0 stories
AI/ML API
AI/ML API
One API, 400+ AI models
0 stories
Baidu AI Studio LLM API
Baidu
LLM API for Baidu AI Studio
0 stories
Baseten
Baseten
Inference Platform: Deploy AI models in production
0 stories
BytePlus
BytePlus Pte Ltd.
AI-Native Cloud for Enterprise Growth
0 stories
Cerebras
Cerebras Systems
AI training and inference platform
0 stories
Coding Agents
Baseten
The best coding agents run on Baseten
0 stories
Conway
Conway Research
Infrastructure for self-improving, self-replicating, autonomous AI
0 stories
DeepClaude
Asterisk
1+1 > 2: Combine Advanced Reasoning and Coding
0 stories
DeepEP
DeepSeek
High-throughput, low-latency expert parallel communication library.
0 stories
DeepGEMM
DeepSeek
FP8 GEMM library
0 stories
DGX Spark
NVIDIA
AI supercomputer on your desk
0 stories
Exo
Exo Labs
Run frontier AI locally.
0 stories
fal
fal
Generative media platform for developers.
0 stories
Fireworks AI
Fireworks AI, Inc.
Build with the best open models.
0 stories
FlashMLA
DeepSeek
Fast MLA decoding kernel for Hopper GPUs
0 stories
Gemini Live API
Google
Real-time, bidirectional multimodal API for Gemini.
0 stories
Google AI Edge Gallery
Google LLC
Explore, Experience, and Evaluate the Future of On-Device Generative AI with Google AI Edge.
0 stories
Google Cloud
Google
Cloud computing services from Google
0 stories
Grok Build
xAI
Build with Grok
0 stories
GuideLLM
Red Hat
SLO-aware Benchmarking and Evaluation Platform for Optimizing Real-World LLM Inference
0 stories
Hugging Face
Hugging Face
The AI community building the future.
0 stories
Interfaze
JigsawStack, Inc.
AI interface platform
0 stories
Keras Kinetic
Keras
Run ML workloads remotely on cloud TPUs and GPUs.
0 stories
Lightning AI
Lightning AI
Idea to AI product, ⚡️ fast.
0 stories
LiteLLM
BerriAI
AI Gateway to provide model access, fallbacks and spend tracking across 100+ LLMs. All in the OpenAI format.
0 stories
LM Studio
Element Labs, Inc.
Run AI models, locally and privately.
0 stories
Mirage
Crisp
Augment your apps with AI
0 stories
ModelScope
Alibaba Cloud
ModelScope model community (魔搭社区)
0 stories
Modular
Modular
Inference from Kernel to Cloud.
0 stories
Mooncake
KVCache.ai
A KVCache-centric Disaggregated Architecture for LLM Serving
0 stories
Multimodal Max
Modular
GenAI-native serving and modeling, built for performance.
0 stories
NeMo-RL
NVIDIA
Scalable RL post-training for language models.
0 stories
NVIDIA DGX 8xB200
NVIDIA
The foundation for your AI factory.
0 stories
Open Generative AI
Muapi
Generative AI service
0 stories
Open Responses
Open Responses
Unverified product profile
0 stories
OrcaRouter
Continuum AI
One API. Multi-provider. Zero markup.
0 stories
PaddlePaddle AI Studio
Baidu, Inc.
One-stop AI development platform
0 stories
Pocket TTS
Kyutai
Text-to-speech by Kyutai
0 stories
Prime Intellect
Prime Intellect
Distributed training and inference infrastructure
0 stories
Prime Intellect Lab
Prime Intellect
AI lab for experimenting with language models
0 stories
RunPod
Runpod, Inc.
AI infrastructure developers trust
0 stories
TileLang
Tile-AI
A concise domain-specific language for high-performance GPU/CPU kernels.
0 stories
TileLang-Ascend
Tile-AI
Ascend TileLang adapter
0 stories
Together Fine-Tuning
Together AI
Fine-tune language models on Together AI.
0 stories
Train Models
TrainEngine.ai
Train models with TrainEngine.ai
0 stories
Unsloth
Unsloth AI
Easily run & train models locally.
0 stories
Unsloth AI
Unsloth
Train and Run Models Locally
0 stories
Unsloth Studio
Unsloth
Open-source, no-code web UI for training, running, and exporting open models in one unified local interface
0 stories
Vast.ai
Vast.ai
Cloud GPU Marketplace
0 stories
Venice API
Venice.ai
Developer API for AI models
0 stories
Vertex AI
Google Cloud
Build, deploy, and scale machine learning models.
0 stories
vLLM Omni
vLLM Project
Multimodal inference and serving
0 stories
xAI
xAI
Developer AI platform
0 stories
ZenMux
AI Force Singapore Pte. Ltd.
AI software platform
0 stories
Zyphra Cloud
Zyphra Technologies Inc.
A full-stack AI platform on AMD powered by TensorWave
0 stories