Together AI

The AI Native Cloud

Full-stack AI platform for running, training, and serving open-source AI models, including serverless and dedicated inference, batch inference, GPU clusters, sandboxes, managed storage, and fine-tuning.

Recent stories

4 linked stories

news SECONDARY 2026-06-20

GLM-5.2 ships to BrowserCode, Hyper, OpenCode, and Together in 3 days

BrowserCode, Hyper, OpenCode, Together, and other vendors added GLM-5.2 within three days of its release, making the open model a deployable option across coding, browser automation, and hosted chat.

news PRIMARY 2026-06-14

Together AI says its DeepSeek V4 Pro deployment ranks #1 on Artificial Analysis latency and speed

Together AI said its DeepSeek V4 Pro deployment now leads Artificial Analysis on both output speed and latency. The claim matters because it turns V4 serving into an inference-systems story about KV cache reuse, prefix reuse, kernels, and endpoint profiles rather than model weights alone.

release SECONDARY 2026-05-21

Qwen3.7 Max launches with 1M context, 35-hour autonomy, and 56.6 AA Index

Alibaba launched Qwen3.7 Max as its new flagship agent model with 1M context, stronger coding and reasoning scores, and cross-harness benchmarks. OpenRouter, Together, AI Gateway, and Kilo support it on day one, making it ready for immediate deployment.

release PRIMARY 2026-05-15

Together AI launches Gemma-4-31B-it-Pearl endpoint with 25%+ discounted pricing

Together AI launched Gemma-4-31B-it-Pearl as a serverless endpoint that uses Pearl's proof-of-useful-work emissions to offset inference cost. It matters because the pricing model ties serving economics to compute-side byproducts instead of token billing alone.