Ollama

Get up and running with large language models.


A local runtime and API tool for downloading, running, and serving large language models on your machine.
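Here is a minimal sketch of calling that local API from Python using only the standard library. It assumes an Ollama server on its default address, `http://localhost:11434`, and the non-streaming form of the `/api/generate` endpoint, which returns the completion in a `response` field. The helper names (`build_generate_request`, `parse_generate_response`, `generate`) are hypothetical, and the model name in the usage note is only an example.

```python
import json
import urllib.request

# Assumption: Ollama is serving on its default local port.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str) -> bytes:
    """Encode a non-streaming /api/generate request body as JSON bytes."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return json.dumps(payload).encode("utf-8")


def parse_generate_response(body: bytes) -> str:
    """Extract the generated text from a single non-streaming JSON reply."""
    return json.loads(body)["response"]


def generate(model: str, prompt: str, url: str = OLLAMA_URL, timeout: float = 120.0) -> str:
    """POST a prompt to a running Ollama server and return the completion text."""
    req = urllib.request.Request(
        url,
        data=build_generate_request(model, prompt),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return parse_generate_response(resp.read())
```

For example, once a model has been pulled locally, `generate("gemma3", "Why is the sky blue?")` returns its reply as a string. Setting `stream` to `False` gives up incremental output in exchange for a single JSON object, which keeps parsing to one `json.loads` call.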

Recent stories

14 linked stories

news | PRIMARY | 2026-06-20

Ollama raises GLM-5.2 cloud capacity on NVIDIA B300s

Ollama said it doubled GPU capacity for GLM-5.2 cloud usage and noted that the model is currently hosted only in the US. The extra capacity arrives as demand for open models climbs; given the US-only hosting, users should check hosting and privacy details before deploying.

workflow | SECONDARY | 2026-06-17

Codex supports open-weight models via Ollama, vLLM, and Responses-compatible endpoints

Codex workflows can now run against open-weight models served through compatible Responses API endpoints, with Ollama and vLLM publishing direct paths for GLM-5.2 and Kimi K2.7 Code. That matters because teams can keep the Codex interface while swapping to self-hosted or lower-cost inference backends.

release | SECONDARY | 2026-06-05

Google releases Gemma 4 QAT: E2B drops to ~1GB and Ollama, SGLang, vLLM add support

Google published Gemma 4 QAT checkpoints and mobile-focused quant formats, cutting Gemma 4 E2B to roughly 1GB of memory. Ollama, SGLang, and vLLM added day-one support, making local deployment more practical on phones, laptops, and low-VRAM GPUs.

release | SECONDARY | 2026-06-04

NVIDIA releases Nemotron 3 Ultra: 550B MoE, 1M context

NVIDIA shipped Nemotron 3 Ultra, a 550B/55B-active hybrid Mamba-Transformer MoE with open weights, data, and recipe, plus broad runtime and host support. It matters because the model pairs frontier-level results among open models with immediate agent-serving options, though local use still needs heavy quantization or large-memory hardware.
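A rough, weights-only estimate shows why local use is demanding. This sketch assumes memory is simply total parameters times bits per weight, and that all 550B parameters typically have to be loaded even though only about 55B are active per token. It ignores KV cache, activations, and runtime overhead, so real requirements are higher. The function name is hypothetical.

```python
def weight_memory_gb(total_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes.

    Assumption: bytes = params * bits / 8. KV cache, activations, and
    runtime overhead are not included, so actual usage will be higher.
    """
    bytes_total = total_params * bits_per_weight / 8
    return bytes_total / 1e9


# 550B total parameters, per the release. In an MoE, every expert usually
# has to be resident (or offloaded), not just the ~55B active per token.
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_memory_gb(550e9, bits):,.0f} GB")
```

This prints about 1,100 GB at 16-bit, 550 GB at 8-bit, and 275 GB at 4-bit, which is well beyond typical single-GPU or laptop memory even at 4-bit.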

release | SECONDARY | 2026-06-03

Gemma 4 12B ships as an encoder-free multimodal local model with a 16GB target and 256K context

Google released Gemma 4 12B, an Apache 2.0 encoder-free multimodal model with native audio and vision for 16GB-class laptops. Day-zero support in llama.cpp, vLLM, Ollama, MLX, and SGLang should make local agents and on-device apps easier to deploy immediately.

release | SECONDARY | 2026-06-02

Nous Research launches Hermes Desktop public preview for macOS, Windows, and Linux

Nous Research put Hermes Agent into a native desktop app and added Portal and Ollama-backed setup paths plus a Tailscale remote-connect fix. Hermes now has a local-first desktop surface instead of a terminal-only workflow.

release | SECONDARY | 2026-06-01

Microsoft and NVIDIA launch RTX Spark PCs with 128GB unified memory and 1 PFLOP FP4

Microsoft and NVIDIA unveiled RTX Spark systems, including Surface Laptop Ultra and DGX-class Windows hardware, with 128GB unified memory and 1 PFLOP FP4 local AI. Day-one support from Hermes Agent, vLLM, Ollama, and Unsloth makes the launch useful for local inference and fine-tuning, not just a PC refresh.

release | SECONDARY | 2026-05-31

MiniMax M3 launches with 1M context and 59.0 SWE-Bench Pro

MiniMax shipped M3 with a 1M-token context window, native multimodal input, and frontier coding claims across SWE-Bench Pro, Terminal Bench, and MCP Atlas. It also appeared on OpenRouter, Ollama Cloud, Venice, Hermes, Cline, Together, and Arena on day one.

release | SECONDARY | 2026-05-22

Letta Code adds embedded local server with Ollama and LM Studio support

Letta Code can now run fully locally with an embedded server, removing the login and Docker requirement while keeping memory sync via `/memory-repository`. That gives developers a local-first agent harness with optional Ollama and LM Studio support instead of forcing everything through Letta’s hosted API.

release | SECONDARY | 2026-04-27

OpenClaw 2026.4.26 adds Google Live Talk, openclaw migrate, and Matrix E2EE

OpenClaw 2026.4.26 shipped Google Live Talk, local-model fixes, `openclaw migrate` imports for Claude and Hermes, and one-command Matrix E2EE. It also hardens plugins, Docker, and transcript compaction for self-hosted agent runs.

workflow | SECONDARY | 2026-04-26

DeepSeek V4 supports Anthropic-compatible routing into Claude Code and Cowork for ~90% lower cost

Independent guides showed DeepSeek V4 running inside Claude Cowork and Claude Code via Anthropic-compatible endpoints, and Ollama added launch commands for Claude-style wrappers. The workflow matters because teams can keep Claude-centered agent UX while sharply lowering model spend, with provider compatibility and setup still the main caveats.

news | SECONDARY | 2026-04-24

DeepSeek V4 adds day-1 support from vLLM, SGLang, Ollama, OpenCode, Venice, and Together

Within a day of launch, vLLM, SGLang, Ollama Cloud, OpenCode, Venice, Together, and Baseten added support or hosted access for DeepSeek V4. That makes the Flash and Pro variants easier to test across local, routed, and managed agent stacks.

release | SECONDARY | 2026-04-22

Qwen3.6-27B releases with 77.2 SWE-Bench Verified and Apache 2.0

Alibaba released Qwen3.6-27B, a dense open model with multimodal input and thinking or non-thinking modes that beats Qwen3.5-397B-A17B across major coding benchmarks. Day-one support across vLLM, SGLang, Ollama, llama.cpp, GGUF, and MLX makes it ready for local and hosted coding agents.

release | PRIMARY | 2026-04-17

Ollama supports Hermes Agent in v0.21 with ollama launch hermes

Ollama 0.21 added native Hermes Agent support through the `ollama launch hermes` command. That makes a self-improving local agent loop available without a hosted inference stack, with memory and skills running on top of Ollama’s model serving.