Skip to content
AI Primer

A local runtime and API platform for downloading, running, and serving large language models on your own machine.

Screenshot of Ollama website

Recent stories

9 linked stories
releaseSECONDARY2026-06-02
Nous Research launches Hermes Desktop public preview for macOS, Windows, and Linux

Nous Research put Hermes Agent into a native desktop app and added Portal and Ollama-backed setup paths plus a Tailscale remote-connect fix. Hermes now has a local-first desktop surface instead of a terminal-only workflow.

releaseSECONDARY2026-06-01
Microsoft and NVIDIA launch RTX Spark PCs with 128GB unified memory and 1 PFLOP FP4

Microsoft and NVIDIA unveiled RTX Spark systems, including Surface Laptop Ultra and DGX-class Windows hardware, with 128GB unified memory and 1 PFLOP FP4 local AI. Day-one support from Hermes Agent, vLLM, Ollama, and Unsloth makes the launch useful for local inference and fine-tuning, not just a PC refresh.

releaseSECONDARY2026-05-31
MiniMax M3 launches with 1M context and 59.0 SWE-Bench Pro

MiniMax shipped M3 with a 1M-token context window, native multimodal input, and frontier coding claims across SWE-Bench Pro, Terminal Bench, and MCP Atlas. It also appeared on OpenRouter, Ollama Cloud, Venice, Hermes, Cline, Together, and Arena on day one.

releaseSECONDARY2026-05-22
Letta Code adds embedded local server with Ollama and LM Studio support

Letta Code can now run fully locally with an embedded server, removing the login and Docker requirement while keeping memory sync via `/memory-repository`. That gives developers a local-first agent harness with optional Ollama and LM Studio support instead of forcing everything through Letta’s hosted API.

releaseSECONDARY2026-04-27
OpenClaw 2026.4.26 adds Google Live Talk, openclaw migrate, and Matrix E2EE

OpenClaw 2026.4.26 shipped Google Live Talk, local-model fixes, openclaw migrate imports for Claude and Hermes, and one-command Matrix E2EE. It also hardens plugins, Docker, and transcript compaction for self-hosted agent runs.

workflowSECONDARY2026-04-26
DeepSeek V4 supports Anthropic-compatible routing into Claude Code and Cowork for ~90% lower cost

Independent guides showed DeepSeek V4 running inside Claude Cowork and Claude Code via Anthropic-compatible endpoints, and Ollama added launch commands for Claude-style wrappers. The workflow matters because teams can keep Claude-centered agent UX while sharply lowering model spend, with provider compatibility and setup still the main caveats.

newsSECONDARY2026-04-24
DeepSeek V4 adds day-1 support from vLLM, SGLang, Ollama, OpenCode, Venice, and Together

Within a day of launch, vLLM, SGLang, Ollama cloud, OpenCode, Venice, Together, and Baseten added support or hosted access for DeepSeek V4. That makes Flash and Pro easier to test across local, routed, and managed agent stacks.

releaseSECONDARY2026-04-22
Qwen3.6-27B releases with 77.2 SWE-Bench Verified and Apache 2.0

Alibaba released Qwen3.6-27B, a dense open model with multimodal input and thinking or non-thinking modes that beats Qwen3.5-397B-A17B across major coding benchmarks. Day-one support across vLLM, SGLang, Ollama, llama.cpp, GGUF, and MLX makes it ready for local and hosted coding agents.

releasePRIMARY2026-04-17
Ollama supports Hermes Agent in v0.21 with ollama launch hermes

Ollama 0.21 added native Hermes Agent support through the ollama launch hermes command. That makes a self-improving local agent loop available without a hosted inference stack, with memory and skills running on top of Ollama’s model serving.

AI PrimerAI Primer

Your daily guide to AI tools, workflows, and creative inspiration.

© 2026 AI Primer. All rights reserved.