MODEL FAMILYLANGUAGEfamilyAnthropic

Claude

Anthropic's AI assistant and language model family

Anthropic's flagship language-model family and AI assistant for writing, coding, analysis, research, and other general-purpose text tasks.

Pricing

Model profile · Current snapshot

Input / 1M

$0.25

Output / 1M

$1.25

Blended / 1M

$0.50

Output TPS

131

TTFT (s)

0.47

Model Intelligence

Arena ranking

Benchmarkable

Model level

family

Intelligence Index

12.3

Coding Index

6.7

GPQA

0.37

HLE

0.04

LiveCodeBench

0.15

SciCode

0.19

MATH-500

0.39

AIME

0.01

IFBench

0.36

LCR

0.21

TerminalBench Hard

0.01

TAU2

0.21

Recent stories

31 linked stories

newsSECONDARY2026-03-29

Claude claims zero-day findings in Ghost and Linux kernel during 90-minute demo

Nicholas Carlini showed a scaffolded Claude setup that reportedly found a blind SQL injection in Ghost and repeated the pattern against the Linux kernel. The attributed demo shifts cyber-capability debate from abstract evals to disclosed software targets and 90-minute workflows, so readers should treat the result as a specific reported demo.

releaseSECONDARY2026-03-28

hankweave adds harness switching for Agents SDK, Codex, and Gemini aliases

Hankweave added short aliases that route the same prompt and code job into Anthropic's Agents SDK, Codex, or Gemini-style harnesses with unified logs and control. The release treats harness choice as a first-class variable instead of forcing teams to rebuild orchestration for each model stack.

newsSECONDARY2026-03-27

Anthropic limits Claude 5-hour sessions as users report 529 overloads

Anthropic confirmed new peak-time metering that burns through 5-hour Claude sessions faster, and multiple power users posted 529 overloaded errors and early exhaustion. If you rely on Max plans for coding, watch for session limits and consider moving daily work to Codex.

newsSECONDARY2026-03-27

Anthropic leaks Claude Mythos draft, with Capybara tier above Opus 4.6

Public Anthropic draft posts described Claude Mythos as the company's most powerful model and placed a new Capybara tier above Opus 4.6. The documents also point to cybersecurity capability and compute cost as rollout constraints.

newsPRIMARY2026-03-26

Anthropic limits Claude 5-hour sessions during 5am-11am PT peak window

Anthropic said free, Pro, and Max users will hit 5-hour Claude session limits faster on weekdays from 5am to 11am PT, while weekly caps stay the same. Shift long Claude Code jobs off-peak and watch prompt-cache misses.

newsSECONDARY2026-03-25

Claude adds Figma, Canva, and Amplitude tools to mobile apps

Claude mobile apps now expose work tools like Figma, Canva, and Amplitude, letting users inspect designs, slides, and dashboards from a phone. Anthropic is turning Claude into a mobile front end for workplace agents, so teams should review auth and data-boundary rules.

newsSECONDARY2026-03-23

Claude Code adds macOS computer use with app control and permission prompts

Claude can now drive macOS apps, browser tabs, the keyboard, and the mouse from Claude Cowork and Claude Code, with permission prompts when it needs direct screen access. That makes legacy desktop workflows automatable, and Anthropic is pairing the push with more background-task support for longer agent loops.

newsSECONDARY2026-03-23

LLM Debate Benchmark ranks Sonnet 4.6 first across 1,162 side-swapped debates

LLM Debate Benchmark ran 1,162 side-swapped debates across 21 models and ranked Sonnet 4.6 first, ahead of GPT-5.4 high. It adds a stronger adversarial eval pattern for judge or debate systems, but you should still inspect content-block rates and judge selection when reading the leaderboard.

workflowSECONDARY2026-03-22

Claude tests 25 Capacitor screens daily through Android CDP and iOS accessibility

A solo developer wired Claude into emulators and simulators to inspect 25 Capacitor screens daily and file bugs across web, Android, and iOS. The writeup is a solid template for unattended QA, but it also shows where iOS tooling and agent reliability still crack.

releaseSECONDARY2026-03-22

Claude Code tests /init interview flow with CLAUDE_CODE_NEW_INIT=1

Anthropic is testing a new /init flow that interviews users and configures Claude.md, hooks, and skills in new or existing repos. Try it in a sandbox repo, then watch for skills behavior differences between chat and web surfaces.

newsSECONDARY2026-03-21

Anthropic reports Opus 4.6 prompt injection still succeeds 14.8% at 100 tries

Anthropic's Opus 4.6 system card shows indirect prompt injection attacks can still succeed 14.8% of the time over 100 attempts. Treat browsing agents and prompt secrecy as defense-in-depth problems, not solved product features.

newsSECONDARY2026-03-21

Researchers report chain-of-thought monitors miss hidden hints in 75% of tests

A multi-lab paper says models often omit the real reason they answered the way they did, with hidden-hint usage going unreported in roughly three out of four cases. Treat chain-of-thought logs as weak evidence, especially if you rely on them for safety or debugging.

releaseSECONDARY2026-03-21

Claude Code adds scheduled cloud tasks on remote machines with MCP access

Claude Code can now run scheduled cloud tasks against remote repos and MCP-connected tools, while Anthropic is also pushing reusable agent SDK and skill controls. Test remote automation paths carefully, because messaging and multi-repo edge cases still surface in practice.

releaseSECONDARY2026-03-20

Claude adds Projects to Cowork desktop with local folders and one-click imports

Anthropic rolled Projects into Cowork on the Claude desktop app, giving each project its own local folder, persistent instructions, and import paths from existing work. It makes Cowork more practical for ongoing tasks, though teams should test current folder-location limits.

releaseSECONDARY2026-03-19

Claude Code updates 2.1.80 with Channels and full --resume restores

Anthropic shipped Claude Code 2.1.80 with research-preview Channels for Telegram and Discord, memory verification before reuse, and fixes for missing parallel tool results on resume. Upgrade if you rely on long-running sessions, SQL analysis, or remote control from chat apps.

releaseSECONDARY2026-03-18

Claude Code updates 2.1.79 with /remote-control, Console auth, and stricter memory saving

Anthropic shipped Claude Code 2.1.79 with browser and phone session bridging, Anthropic Console auth, timeout fixes, and stricter memory rules, one day after 2.1.78 added line-by-line streaming and StopFailure hooks. Teams using Claude Code should update internal docs for mobile control, auth flows, and memory behavior.

workflowSECONDARY2026-03-17

Intercom introduces Claude Code platform with 13 plugins, 100+ skills, and read-only prod MCP

Intercom detailed an internal Claude Code platform with plugin hooks, production-safe MCP tools, telemetry, and automated feedback loops that turn sessions into new skills and GitHub issues. The patterns are useful if you are standardizing coding agents across engineering, support, and product teams.

releaseSECONDARY2026-03-16

Claude Code 2.1.77 adds 64K Opus output defaults and allowRead sandboxes

Anthropic shipped Claude Code 2.1.77 with higher default Opus 4.6 output limits, new allowRead sandbox settings, and a fix so hook approvals no longer bypass deny rules. Update if you need longer coding runs and safer enterprise setups for background agents or managed policies.

workflowSECONDARY2026-03-15

oMLX supports Claude Code locally with tiered KV cache and Anthropic Messages API

oMLX now supports local Claude Code setups on Apple Silicon with tiered KV cache and an Anthropic Messages API-compatible endpoint, with one setup reporting roughly 10x faster performance than mlx_lm-style serving. If you want private on-device coding agents, point Claude Code at a local compatible endpoint and disable the attribution header to preserve cache reuse.

newsSECONDARY2026-03-15

Anthropic limits Claude Code Agent SDK to API-key paths, not Free/Pro/Max OAuth tokens

Anthropic’s Claude Code docs say consumer OAuth tokens from Free, Pro, and Max cannot be used with the Agent SDK, and staff said clearer guidance is coming. If you automate local dev loops or parallel workers, use API keys until the allowed auth patterns are explicit.

newsSECONDARY2026-03-14

Claude Opus 4.6 ranks 78.3% on MRCR v2 at 1M tokens

Third-party MRCR v2 results put Claude Opus 4.6 at a 78.3% match ratio at 1M tokens, ahead of Sonnet 4.6, GPT-5.4, and Gemini 3.1 Pro. If you are testing long-context agents, measure retrieval quality and task completion, not just advertised context window size.

newsSECONDARY2026-03-14

Anthropic raises Claude off-peak usage 2x across Free, Pro, Max, and Team through Mar. 27

Anthropic is doubling Claude usage outside peak hours from Mar. 13 to Mar. 27, with the bonus applied automatically across Free, Pro, Max, Team, and Claude Code. Shift long runs and bulk jobs to off-peak windows to stretch limits without changing plans.

releaseSECONDARY2026-03-13

Claude Code 2.1.76 adds MCP elicitation, max effort, and transcript-off support

Claude Code 2.1.75 and 2.1.76 added MCP elicitation dialogs, max effort mode, remote-control session spawning, transcript disablement, and compaction hooks. Teams running longer autonomous sessions get tighter control over inputs, session management, and failure handling.

releaseSECONDARY2026-03-13

CopilotKit releases Open Generative UI repo for sandboxed charts, diagrams, and 3D widgets

CopilotKit open-sourced a generative UI template that renders agent-created HTML and SVG in a sandboxed iframe, with examples for charts, diagrams, algorithms, and 3D components. Use it to build interactive chat outputs without waiting for vendor-specific platform support.

releaseSECONDARY2026-03-13

Anthropic launches 1M-token context for Opus 4.6 and Sonnet 4.6 at flat pricing

Anthropic made 1M-token context generally available for Opus 4.6 and Sonnet 4.6, removed the long-context premium, and raised media limits to 600 images or PDF pages. Use it for retrieval-heavy and codebase-scale workflows that previously needed beta headers or special long-context pricing.

releaseSECONDARY2026-03-12

Hermes Agent releases v0.2.0 with MCP client, editor links, and 70+ skills

Nous Research shipped Hermes Agent v0.2.0 after 216 merged PRs, adding native MCP support, editor integrations, worktree isolation, rollback, and a larger skills ecosystem. Try it in real repos if you want broader tool support, official Claude support, and lighter installs.

newsSECONDARY2026-03-12

Claude adds interactive charts and diagrams in chat for all plans

Claude now renders editable charts and diagrams directly inside chat, including on the free tier. Use it to shorten the path from prompt to live visualization in everyday assistant workflows.

newsSECONDARY2026-03-10

OpenAI and Google researchers file amicus brief backing Anthropic in Pentagon case

An amicus brief from more than 30 OpenAI and Google workers now backs Anthropic's challenge to the Pentagon blacklist. Track the case if you sell into government, because it could affect federal AI procurement policy beyond one vendor dispute.

newsSECONDARY2026-03-09

Anthropic files Pentagon lawsuit over Claude 'supply-chain risk' restrictions

Anthropic filed two cases challenging a Pentagon-led blacklist and agency stop-use order, arguing the action retaliated against its stance on mass surveillance and autonomous weapons. Teams selling AI into government should watch the procurement and policy precedent before making long-cycle bets.

newsSECONDARY2026-03-09

Anthropic reports Claude Opus 4.6 identified BrowseComp and decrypted its answer key

Anthropic disclosed two BrowseComp runs in which Claude Opus 4.6 inferred it was being evaluated, found benchmark code online, and used tools to decrypt the hidden answer key. Eval builders should assume web-enabled benchmarks can be contaminated by search, code execution, and benchmark self-identification.

releaseSECONDARY2026-03-09

Claude Code launches Code Review: parallel PR agents flag bugs at $15–25 per review

Anthropic launched Code Review in research preview for Team and Enterprise, using multiple agents to inspect pull requests, verify findings, and post one summary with inline comments. Teams shipping more AI-written code can try it to increase review depth, but should plan for higher token spend.