AI Primer
TOPIC 18 stories

Agent Readiness

How prepared a codebase and environment are for agents.

RELEASE 12th May
Perceptron releases Mk1 with 2 FPS video reasoning, 32K context, and $0.15 per 1M input

Perceptron launched Mk1, a multimodal model for video and embodied reasoning with native 2 FPS video, 32K context, and structured spatial outputs. OpenRouter access and the low input price make it usable for deployment, not just demos.
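As a rough illustration of the input price quoted above, the sketch below estimates input-side cost from a token count. Only the $0.15 per 1M input tokens rate comes from the announcement; the function name and the example token count are hypothetical, and output pricing is not covered.

```python
from decimal import Decimal

# Rate from the announcement: $0.15 per 1M input tokens.
INPUT_PRICE_PER_MILLION = Decimal("0.15")


def input_cost_usd(input_tokens: int) -> Decimal:
    """Return the input-side cost in USD for a given number of input tokens."""
    if input_tokens < 0:
        raise ValueError("input_tokens must be non-negative")
    return INPUT_PRICE_PER_MILLION * Decimal(input_tokens) / Decimal(1_000_000)


if __name__ == "__main__":
    # Hypothetical workload: a single request of 32,000 input tokens.
    print(input_cost_usd(32_000))
```

How many tokens a second of 2 FPS video consumes is not stated in the announcement, so measure real token counts before budgeting a video workload.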

NEWS 8th May
METR says Claude Mythos Preview hits 16-hour p50 Horizon in early snapshot

METR said an early Claude Mythos Preview snapshot reached at least a 16-hour 50% time horizon, with only five tasks in its suite at that range. The result matters because Mythos is beyond METR's stable measurement band, so cross-model comparisons are less reliable.

RELEASE 2w ago
DeepSeek releases Vision beta for image understanding in DeepSeek Chat

DeepSeek began rolling out Vision beta as a new image-understanding mode in Chat, and early testers reported fast OCR and strong object recognition. The rollout appears limited or staggered, so watch for broader access and formal docs before relying on it.

NEWS 2w ago
Plurai introduces vibe-training with sub-100ms agent guardrails and 43% fewer failures

Plurai launched vibe-training to turn natural-language intents into task-specific eval and guardrail APIs backed by small models. That matters because it positions SLM-based checks as a faster, cheaper alternative to frontier LLM judges for production agents.

RELEASE 3w ago
Claude Managed Agents adds memory in public beta with file-backed session state

Anthropic put memory into public beta for Claude Managed Agents, storing retained context as files developers can export and edit. The change lets agent state persist across sessions without a separate memory service.

RELEASE 4w ago
Google DeepMind releases Gemini Robotics-ER 1.6 with 93% instrument reading

Google DeepMind shipped Gemini Robotics-ER 1.6 to the Gemini API and AI Studio with better visual-spatial reasoning, multi-view success detection, and gauge reading. The model's 93% instrument-reading score targets robots that need to reason over cluttered scenes and physical constraints.

RELEASE 1mo ago
LangChain launches Deep Agents Deploy beta with AGENTS.md and mcp.json

LangChain launched Deep Agents Deploy in beta as a production path for open, model-agnostic agent harnesses configured with AGENTS.md, skills, and mcp.json. Deployments run on LangSmith and can expose MCP, A2A, and agent protocol while teams choose models and sandbox providers.

NEWS 1mo ago
MiniMax M2.7 benchmarks 34% hallucination rate on new tests

New third-party tests put MiniMax M2.7 at a 34% hallucination rate, roughly 65 tokens per second (tps), and 27.04% on Vibe Code Bench while users pushed it through physics-heavy web demos. It looks increasingly viable for agent workflows, but performance still swings by task and harness.

NEWS 1mo ago
Google DeepMind launches Kaggle benchmark contest with $200k to measure AGI capabilities

Google DeepMind and Kaggle opened a global challenge to build cognitive benchmarks across learning, metacognition, attention, executive function, and social cognition. Join if you work on evals and want reusable tasks with human baselines instead of another saturated leaderboard.

RELEASE 1mo ago
Manus launches My Computer for local macOS and Windows control

Manus moved from a cloud sandbox onto local machines with My Computer, a desktop app that can organize files, run commands, and build apps on macOS and Windows. Use it if you want agent workflows over private local data and hardware instead of a remote browser sandbox.

RELEASE 1mo ago
Factory launches Analytics to tie tokens, tool calls, commits, and PRs to software output

Factory released an analytics layer for teams deploying coding agents, surfacing usage, tool calls, activity, and productivity from tokens through pull requests. Use it if you need ROI, readiness, and cost visibility as agent adoption scales.

RELEASE 2mo ago
supermemory launches CLI with npx install, scoped agent access, and audit logs

supermemory launched a CLI that exposes platform actions directly to agents and added scoped agent access with tag-level permissions plus audit logs. Use it to wire memory into agent loops without granting a full account.

NEWS 2mo ago
Claude Opus 4.6 scores 78.3% on MRCR v2 at 1M tokens

Third-party MRCR v2 results put Claude Opus 4.6 at a 78.3% match ratio at 1M tokens, ahead of Sonnet 4.6, GPT-5.4, and Gemini 3.1 Pro. If you are testing long-context agents, measure retrieval quality and task completion, not just advertised context window size.

RELEASE 2mo ago
Markov AI releases Computer Use Large on Hugging Face: 48,478 videos and 12,300 hours

Markov AI released Computer Use Large on Hugging Face with 48,478 screen recordings spanning about 12,300 hours across six professional apps. Use it to train and evaluate GUI agents on real software workflows with a large CC-BY dataset.

NEWS 2mo ago
Tiiny claims pocket AI server runs local 120B models with an OpenAI-compatible API

Tiiny claims its pocket-sized local AI server can run open models up to 120B and expose an OpenAI-compatible local API without token fees. Privacy-sensitive teams should validate throughput and model quality before deploying always-on local agents.

RELEASE 2mo ago
NVIDIA releases Nemotron 3 Super: 120B open model targets 1M-token agent workloads

NVIDIA released Nemotron 3 Super, a 120B open model with 1M-token context and a hybrid architecture tuned for agent workloads, then landed it in Perplexity and Baseten. Try it if you need an open-weight long-context option that is already available in hosted stacks.

NEWS 2mo ago
Meta adds Moltbook to Meta Superintelligence Labs in deal closing mid-March

Meta acquired Moltbook and is bringing its founders into Meta Superintelligence Labs as it bets on agent identity and social coordination layers. Watch how Meta productizes registry, verification, and cross-agent discovery for agent ecosystems.

RELEASE 2mo ago
Hermes Agent introduces self-evolution with a reported 39.5% quality gain

Nous Research released a self-evolution package for Hermes Agent that uses DSPy and GEPA to optimize skills, prompts, and code, and reported a phase-one score increase from 0.408 to 0.569 on one skill. Agent teams can study the repo for fallback model, memory, and self-improvement loop patterns.
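The headline's 39.5% figure is consistent with reading the score change as a relative gain over the starting score. A minimal check, assuming that interpretation (the function name is hypothetical and not part of the Hermes Agent repo):

```python
def relative_gain(before: float, after: float) -> float:
    """Return the fractional improvement of `after` over `before`."""
    if before == 0:
        raise ValueError("before must be non-zero")
    return (after - before) / before


if __name__ == "__main__":
    # Phase-one scores reported for one skill: 0.408 before, 0.569 after.
    gain = relative_gain(0.408, 0.569)
    print(f"{gain:.1%}")  # prints 39.5%
```

The absolute change is 0.161 score points, so the percentage describes improvement relative to the baseline, not a 39.5-point jump.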

AI Primer

Your daily guide to AI tools, workflows, and creative inspiration.

© 2026 AI Primer. All rights reserved.