GLM LLM Serving Inference Optimization Cost Optimization DX Cost

Wafer

The fastest open source LLMs for enterprise

Wafer is a hosted LLM inference product for enterprise and developer use, offering serverless and dedicated endpoints for fast optimized open-source language models through OpenAI-compatible and Anthropic-compatible APIs.

Recent stories

2 linked stories

releaseSECONDARY2026-06-24

Vercel AI Gateway adds GLM-5.2 Fast at 150-250 tok/s

Vercel and Wafer launched a serverless GLM-5.2 endpoint on AI Gateway with 1M context and published pricing. Teams get a high-throughput open-model option inside an existing gateway instead of managing GLM inference directly.

newsPRIMARY2026-06-20

Wafer claims GLM-5.2 hits 222 tok/s and 12.6s end-to-end

Wafer said its GLM-5.2 deployment leads Artificial Analysis on throughput and latency, and priced usage at $1.20 input and $4.10 output per million tokens. Compare serverless and dedicated endpoints if you need speed at scale.