AI Primer

Ollama raises GLM-5.2 cloud capacity on NVIDIA B300s

Ollama said it doubled GPU capacity for GLM-5.2 cloud usage and said the model is currently hosted only in the US. The rollout adds capacity as open-model demand climbs, so users should check hosting and privacy details before deploying.

TL;DR

Most of this story came from Ollama's own replies. The launch thread bundled the 1M-token context claim with ready-to-run commands, and the capacity update pinned the cloud fleet to NVIDIA B300 Blackwell GPUs in the US. The reply chain then kept narrowing the operational detail, from US-only hosting for GLM-5.2 to a separate note that Ollama has some scaling capacity in Europe and very little in Singapore.

Capacity spike on B300s

The sequence was fast. Ollama's launch thread introduced GLM-5.2 on cloud, then a support reply described demand as "another record breaking one" before a second follow-up in the same thread said the company had doubled capacity.

A few hours later, Ollama's main capacity post specified what changed: more US-based cloud capacity on NVIDIA B300 Blackwell GPUs. That is a concrete infrastructure detail most launch threads skip.

US hosting, with broader regional capacity elsewhere

Ollama answered the hosting question several times, and the wording stayed narrow. One reply said GLM-5.2 is currently hosted only in the US, while another separated that fact from the broader privacy discussion.

The same reply chain added a second detail: Ollama's regional-capacity note said the company has some scaling capacity in Europe and very little in Singapore. Read together, that suggests GLM-5.2 cloud is US-only for now, even if Ollama's wider infrastructure footprint is not.

Privacy policy became part of the rollout

Instead of publishing a fresh infra note, Ollama kept pointing users to its privacy policy. In more than one reply, the company said the policy states that user data is not used or sold, and the original launch thread paired that with a zero-data-retention claim.

That made privacy part of the operational story, not just brand copy, because the same thread was also answering where inference runs.

Commands and model surfaces

The launch thread broke access into four entry points:

An availability reply also said the model was usable immediately after launch.

Local runtimes and quantization are a separate story

Cloud capacity was the headline, but Ollama's replies also drew a line between hosted access and local packaging. One reply said smaller quantizations can make local runs feasible, though quality varies today.

Another reply contrasted Ollama with raw llama.cpp usage, saying its packaging work is meant to preserve repeated tool-call reliability, installation sanity, and model behavior rather than shipping oddly compressed or out-of-context quantizations. That is a useful clue about how Ollama wants GLM-5.2 consumed, especially for agent-style workloads.
