GLM-5.1 lands on Modal, Together AI, Letta Code, and Tembo
Providers and agent platforms added GLM-5.1 endpoints across Modal, Together AI, Letta Code, Tembo, and Tabbit, with free trials, no-key access, and 99.9% SLA options. Use the new hosting options to test the model for coding and long-horizon agent workloads without waiting on self-hosting.

TL;DR
- Modal upgraded its free public endpoint to GLM-5.1 for the next month, while Official blog post says the model is aimed at long-horizon agent work and coding.
- Together AI positioned GLM-5.1 for production-scale agentic engineering, and Together's model card adds the concrete specs: 754B MoE, 40B active parameters, 200K context, thinking mode, tool calling, and structured JSON output.
- Letta made GLM-5.1 available inside Letta Code, then paired it with remote environments that keep agent memory and state intact across machines, according to Letta's remote environments post.
- Tembo exposed GLM-5.1 only through its OpenCode and Pi agents with no API key required, which lines up with Tembo's models docs describing Tembo-hosted models alongside BYOK options.
- The host list widened fast on launch day: Z.ai's repost of Tabbit Browser added Tabbit to the wave, while Z.ai's own docs describe GLM-5.1 as an up-to-8-hour long-horizon coding model optimized for agent frameworks.
You can try the model on Modal's free endpoint, pull the exact API string and pricing from Together's serverless catalog, and browse Z.ai's migration guide for details like tool_stream=true, 200K context, and 128K max output. Letta's docs show the one-command letta server flow for always-on remote agents, while Tembo's docs frame GLM-5.1 as another model option inside existing coding-agent workflows.
Modal and Together endpoints
The fastest way GLM-5.1 spread was plain old hosting. Modal's launch post says the endpoint is free to try for a month, and Modal's blog says its existing free GLM endpoint was upgraded to 5.1 on April 7.
Together's model page is more specific about what buyers actually get:
- 28% coding improvement over GLM-5
- 754B-parameter MoE, 40B active
- 200K context, 131K max output
- Thinking mode, tool calling, structured JSON output
Together's serverless models doc also lists zai-org/GLM-5.1 with a 202,752-token context window at $1.40 per 1M input tokens and $4.40 per 1M output tokens. Together's highlight thread adds the production packaging: 99.9% SLA plus serverless and dedicated deployment options.
Letta remote environments
Letta turned the model drop into a workflow pitch. In the thread, Letta says you can switch to GLM-5.1 with /model, then run agents in remote environments so they keep memory and state wherever they execute.
Letta's product post says the same agent can move between machines inside one conversation, carrying conversation history and context repositories with it. The remote environments docs reduce setup to letta server, which registers a local machine so the agent stays reachable from chat.letta.com, another computer, or a phone while still editing files and running shell commands locally.
Tembo and Tabbit distribution
Tembo's angle was access. Tembo says GLM-5.1 is live only inside its OpenCode and Pi agents and does not require an API key. Tembo's models page explains the split: workspaces can use Tembo-hosted models or bring their own provider keys, so no-key access is a product choice, not a separate public API.
Tabbit showed up the same day through Z.ai's repost, which is a small but telling detail. GLM-5.1 is already being packaged less like a single endpoint and more like a default model option inside browsers, hosted inference platforms, and stateful coding agents.