releaseMarch 10, 2026

CopilotKit releases LLMock for deterministic LLM testing with SSE and tool calls

CopilotKit open-sourced LLMock, a deterministic mock LLM server with provider-style SSE streaming and tool-call injection. Use it to run repeatable CI and agent tests without spending live model budget.

2 min read

CopilotKit releases LLMock for deterministic LLM testing with SSE and tool calls

TL;DR

CopilotKit open-sourced LLMock as a “deterministic mock LLM server” for testing AI apps, aiming to replace flaky live-model calls in CI with repeatable local responses from a real HTTP endpoint launch post.
The launch says LLMock supports “authentic SSE streaming in real provider formats,” which matters for teams testing token streaming behavior instead of only static JSON responses feature list.
CopilotKit’s feature list also highlights fixture-based routing, regex and predicate matching, plus tool-call injection, so agent workflows can be exercised without hitting OpenAI-, Gemini-, or Anthropic-style APIs live.
The project page linked from the announcement says LLMock adds error injection and request journaling on top of deterministic fixtures, giving teams a way to test outages, rate limits, and assertions without model spend project docs.

What shipped

LLMock is a local mock server for LLM-powered applications, not just a library stub. In CopilotKit’s launch post, the core promise is a “real HTTP server that works across all your processes,” which means app code, background workers, and tests can all point at the same fake provider endpoint.

The linked project docs say the server is fixture-driven and deterministic, with support for substring matching, regex routing, and custom predicates against full request context. That gives teams a way to model prompt-specific behavior while keeping test outputs stable across CI runs.

How it maps to real-world agent tests

The most practical detail in CopilotKit’s feature list is support for “authentic SSE streaming in real provider formats.” That makes LLMock more useful than a plain response recorder for apps that render partial tokens, stream UI updates, or depend on provider-style event sequencing.

CopilotKit also says LLMock can inject tool calls for agent testing tool-call support, and the project docs add error injection plus request journaling. Together, that covers three common failure points in production AI systems: tool invocation paths, transport-level streaming behavior, and resilience testing around rate limits or outages. A reposted note from the team says the internal version “started saving us money on API costs” before being open-sourced team repost.

TL;DR

What shipped

How it maps to real-world agent tests

Discussion across the web