Ollama
Get up and running with large language models locally.
A local runtime and API platform for downloading, running, and serving large language models on your own machine.
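
As a rough illustration of the "serving" part, the sketch below sends one prompt to a locally running Ollama server over its REST API. It assumes the server is listening on the default address `http://localhost:11434` and that the model has already been pulled. The helper names `build_generate_request` and `generate` are hypothetical and exist only for this example.

```python
import json
import urllib.request

# Assumption: Ollama's default local address; change it if your server is configured differently.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_generate_request(model: str, prompt: str, url: str = OLLAMA_URL) -> urllib.request.Request:
    """Build a non-streaming POST request for the /api/generate endpoint."""
    # "stream": False asks for a single JSON object instead of a stream of chunks.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the request to a running Ollama server and return the generated text."""
    request = build_generate_request(model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # In non-streaming mode the generated text is returned in the "response" field.
    return body["response"]
```

With a server running, `generate("llama3.2", "Why is the sky blue?")` would return the completion as a string. The model name here is a placeholder for whichever model is installed locally.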

Recent stories
Nous Research shipped Hermes Agent as a native desktop app, adding Portal and Ollama-backed setup paths and a fix for Tailscale remote connections. Hermes now has a local-first desktop surface instead of a terminal-only workflow.
Microsoft and NVIDIA unveiled RTX Spark systems, including Surface Laptop Ultra and DGX-class Windows hardware, with 128GB of unified memory and 1 PFLOP of FP4 compute for local AI. Day-one support from Hermes Agent, vLLM, Ollama, and Unsloth makes the launch useful for local inference and fine-tuning, not just a PC refresh.
MiniMax shipped M3 with a 1M-token context window, native multimodal input, and frontier coding claims across SWE-Bench Pro, Terminal Bench, and MCP Atlas. It also appeared on OpenRouter, Ollama Cloud, Venice, Hermes, Cline, Together, and Arena on day one.
Letta Code can now run fully locally with an embedded server, removing the login and Docker requirement while keeping memory sync via `/memory-repository`. That gives developers a local-first agent harness with optional Ollama and LM Studio support instead of forcing everything through Letta’s hosted API.
OpenClaw 2026.4.26 shipped Google Live Talk, local-model fixes, `openclaw migrate` imports for Claude and Hermes, and one-command Matrix E2EE. It also hardens plugins, Docker, and transcript compaction for self-hosted agent runs.
Independent guides showed DeepSeek V4 running inside Claude Cowork and Claude Code via Anthropic-compatible endpoints, and Ollama added launch commands for Claude-style wrappers. The workflow matters because teams can keep Claude-centered agent UX while sharply lowering model spend, with provider compatibility and setup still the main caveats.
Within a day of launch, vLLM, SGLang, Ollama Cloud, OpenCode, Venice, Together, and Baseten added support or hosted access for DeepSeek V4. That makes Flash and Pro easier to test across local, routed, and managed agent stacks.
Alibaba released Qwen3.6-27B, a dense open model with multimodal input and thinking or non-thinking modes that beats Qwen3.5-397B-A17B across major coding benchmarks. Day-one support across vLLM, SGLang, Ollama, llama.cpp, GGUF, and MLX makes it ready for local and hosted coding agents.
Ollama 0.21 added native Hermes Agent support through the `ollama launch hermes` command. That makes a self-improving local agent loop available without a hosted inference stack, with memory and skills running on top of Ollama’s model serving.