Wafer
The fastest open source LLMs for enterprise
Wafer is a hosted LLM inference product for enterprise and developer use, offering serverless and dedicated endpoints for fast optimized open-source language models through OpenAI-compatible and Anthropic-compatible APIs.
Recent stories
2 linked stories
releaseSECONDARY2026-06-24
Vercel AI Gateway adds GLM-5.2 Fast at 150-250 tok/s
Vercel and Wafer launched a serverless GLM-5.2 endpoint on AI Gateway with 1M context and published pricing. Teams get a high-throughput open-model option inside an existing gateway instead of managing GLM inference directly.
newsPRIMARY2026-06-20
Wafer claims GLM-5.2 hits 222 tok/s and 12.6s end-to-end
Wafer said its GLM-5.2 deployment leads Artificial Analysis on throughput and latency, and priced usage at $1.20 input and $4.10 output per million tokens. Compare serverless and dedicated endpoints if you need speed at scale.