⚙️
LLM Inference & Serving
70 tools
Inference runtimes, model serving platforms, fine-tuning infra, and GPU/accelerator providers for LLMs.
BytePlus
BytePlus Pte Ltd.
AI-Native Cloud for Enterprise Growth
1 story
Hugging Face Hub
Hugging Face
The central place to collaborate on models, datasets, and Spaces.
1 story
OpenRouter
OpenRouter, Inc.
The Unified Interface For LLMs
1 story
AFM Playground
Arcee AI
Playground for AFM-4.5B
0 stories
AI Studio
Google
Fastest way to start building with Gemini
0 stories
AI/ML API
AI/ML API
One API, 400+ AI models
0 stories
Amazon Bedrock
Amazon Web Services
The platform for building generative AI applications and agents at production scale
0 stories
Anthropic
Anthropic
AI for problem solvers
0 stories
Baidu AI Studio LLM API
Baidu
OpenAI-compatible LLM API for Baidu AI Studio
0 stories
Baidu Qianfan
Baidu
An Agent-centric, one-stop enterprise-grade large model service platform
0 stories
Baseten
Baseten
Inference Platform: Deploy AI models in production
0 stories
Cerebras
Cerebras Systems
AI training and inference platform
0 stories
Coding Agents
Baseten
The best coding agents run on Baseten
0 stories
Conway
Conway Research
Infrastructure for self-improving, self-replicating, autonomous AI
0 stories
Decoupled DiLoCo
Google DeepMind
Resilient, distributed AI training at scale
0 stories
DeepClaude
Asterisk
1+1 > 2 - Combine Advanced Reasoning and Coding
0 stories
DeepEP
DeepSeek
An efficient expert-parallel communication library
0 stories
DeepGEMM
DeepSeek
Clean and efficient FP8 GEMM kernels with fine-grained scaling
0 stories
DGX Spark
NVIDIA
AI supercomputer on your desk
0 stories
Exo
Exo Labs
Run frontier AI locally.
0 stories
fal
fal
Generative media platform for developers.
0 stories
Fireworks AI
Fireworks AI, Inc.
Fastest inference for generative AI
0 stories
FlashMLA
DeepSeek
Efficient Multi-head Latent Attention Kernels
0 stories
FlashQLA
Alibaba Cloud
Alibaba Cloud software product
0 stories
Gemini Live API
Google
Real-time, bidirectional multimodal API for Gemini.
0 stories
Google AI Edge Gallery
Google LLC
Explore, Experience, and Evaluate the Future of On-Device Generative AI with Google AI Edge.
0 stories
Google Cloud
Google
Cloud computing services from Google
0 stories
GuideLLM
Red Hat
SLO-aware Benchmarking and Evaluation Platform for Optimizing Real-World LLM Inference
0 stories
Hugging Face
Hugging Face
The AI community building the future.
0 stories
Interfaze
JigsawStack, Inc.
AI interface platform
0 stories
Keras Kinetic
Keras
Run ML workloads remotely on cloud TPUs and GPUs.
0 stories
Lightning AI
Lightning AI
Idea to AI product, ⚡️ fast.
0 stories
LiteLLM
BerriAI
AI Gateway to provide model access, fallbacks and spend tracking across 100+ LLMs. All in the OpenAI format.
0 stories
LM Studio
Element Labs, Inc.
Run AI models, locally and privately.
0 stories
Miles RL Training
RadixArk
Enterprise-Grade Reinforcement Learning for Large-Scale Model Training
0 stories
ModelScope
Alibaba Cloud
Open-source AI model community and MaaS platform
0 stories
Modular
Modular
Inference from Kernel to Cloud.
0 stories
Mooncake
KVCache.ai
A KVCache-centric Disaggregated Architecture for LLM Serving
0 stories
Multimodal Max
Modular
GenAI-native serving and modeling, built for performance.
0 stories
NeMo-RL
NVIDIA
Scalable RL post-training for language models.
0 stories
Nous Portal
Nous Research
Portal for Nous models and services.
0 stories
NVIDIA DGX 8xB200
NVIDIA
Unified AI platform for develop-to-deploy AI pipelines
0 stories
NVIDIA NIM
NVIDIA
Designed for rapid, reliable deployment of accelerated generative AI inference anywhere.
0 stories
Ollama
Ollama Inc.
Ollama is the easiest way to run open AI models locally or in the cloud, with a simple API and 40,000+ integrations.
0 stories
Open Generative AI
Muapi
Free Higgsfield AI, Freepik & Krea AI Alternative
0 stories
Open Responses
Open Responses
Open-source specification and ecosystem for interoperable LLM interfaces.
0 stories
OpenAI
OpenAI
AI platform and product suite
0 stories
PaddlePaddle AI Studio
Baidu, Inc.
AI learning and hands-on training community
0 stories
Pocket TTS
Kyutai
A lightweight text-to-speech application designed to run efficiently on CPUs.
0 stories
Prime Intellect
Prime Intellect
Distributed training and inference infrastructure
0 stories
Prime Intellect Lab
Prime Intellect
AI lab for experimenting with language models
0 stories
RunPod
Runpod, Inc.
Everything you need to train, deploy, and scale AI all in one place.
0 stories
SGLang
LMSYS Corp.
High-Performance Serving Framework for LLMs and VLMs
0 stories
Tile Kernels
Hangzhou DeepSeek Artificial Intelligence Co., Ltd.
Optimized GPU kernels for LLM operations, built with TileLang.
0 stories
TileLang-Ascend
Tile AI
Ascend TileLang adapter
0 stories
Together Fine-Tuning
Together AI
Fine-tune open-source models for real production use
0 stories
Train Models
TrainEngine.ai
Train Models privately
0 stories
Unsloth
Unsloth AI
Easily run & train models locally.
0 stories
Unsloth AI
Unsloth
Train and Run Models Locally
0 stories
Unsloth Studio
Unsloth
Open-source, no-code web UI for training, running, and exporting open models in one unified local interface
0 stories
Vast.ai
Vast.ai
Launch Fast, Pay Less
0 stories
Venice API
Venice.ai
Developer API for AI models
0 stories
Vertex AI
Google Cloud
Build, deploy, and scale machine learning models.
0 stories
vLLM
vLLM Project
The High-Throughput and Memory-Efficient inference and serving engine for LLMs
0 stories
vLLM Omni
vLLM Project
Easy, fast, and cheap omni-modality model serving for everyone
0 stories
xAI
xAI
AI for all humanity
0 stories
Xiaomi MiMo Orbit
Xiaomi
100T-token creator incentive program for MiMo builders
0 stories
ZenMux
AI Force Singapore Pte. Ltd.
AI software platform
0 stories
Zyphra Cloud
Zyphra Technologies Inc.
A full-stack AI platform on AMD powered by TensorWave
0 stories
Zyphra Inference
Zyphra
Serverless inference for frontier open-weight models
0 stories