Long-horizon research agents operating over internal or web data.
Chroma released Context-1, a 20B search agent it says pushes the speed-cost-accuracy frontier for agentic search, with open weights on Hugging Face. Benchmark it against your current search stack before wiring it into production.
A developer says an autoresearch loop hill-climbed a vibecoded Rust engine to 2718 Elo after running more than 70 experiments under a 500 ms move budget. The real takeaway is the workflow: automated experiment loops can optimize code against a measurable target.
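That workflow reduces to a greedy loop: propose a change, measure it against a fixed budget, keep it only if the metric improves. A minimal sketch, with hypothetical `propose`/`evaluate` stand-ins rather than the developer's actual harness:

```python
def hill_climb(candidate, propose, evaluate, iters):
    """Greedy experiment loop: accept a proposed change only if it
    strictly improves the measured score, otherwise revert to the best."""
    best_score = evaluate(candidate)
    for _ in range(iters):
        trial = propose(candidate)   # e.g. an LLM-suggested code edit
        score = evaluate(trial)      # e.g. Elo from fixed-budget matches
        if score > best_score:       # keep only strict improvements
            candidate, best_score = trial, score
    return candidate, best_score

# Toy usage: climb an integer toward a target of 5.
best, score = hill_climb(0, lambda x: x + 1, lambda x: -abs(5 - x), iters=10)
print(best, score)  # 5 0
```

In the real setting `evaluate` is the expensive step (engine matches under the 500 ms move budget), which is why keeping the metric cheap and deterministic matters more than the cleverness of `propose`.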
OpenAI told MIT Technology Review it wants an autonomous research intern by September and a multi-agent research lab by 2028, with Codex described as an early step. Treat it as a roadmap for longer-horizon agents, not a shipped capability.
LightOn says its 150M multi-vector retriever is pushing BrowseComp-Plus close to saturation, with results showing search-call behavior and retriever choice matter nearly as much as model size. Retrieval engineers should watch multi-hop setup and tool-calling limits before copying the benchmark.
LightOn’s late-interaction retriever paired with GPT-5 reached 87.59% accuracy on BrowseComp-Plus while using fewer search calls than larger baselines. It suggests deep-research quality may now hinge more on retrieval architecture than on swapping in ever-larger LLMs.
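Late interaction (ColBERT-style MaxSim) keeps one vector per token and scores a document by summing, for each query token, its best match over document tokens. A toy sketch with made-up 2-d embeddings, not LightOn's model:

```python
def maxsim_score(query_vecs, doc_vecs):
    """ColBERT-style late interaction: for each query token vector,
    take the maximum dot product over all document token vectors,
    then sum those maxima across query tokens."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

# Toy per-token "embeddings".
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 0.5]]  # covers both query tokens
doc_b = [[0.5, 0.5]]              # one generic vector
print(maxsim_score(query, doc_a))  # 1.5
print(maxsim_score(query, doc_b))  # 1.0
```

The point of the architecture is that scoring stays a cheap token-level max-sum, so retrieval quality can improve without growing the language model doing the reading.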
Parallel integrated with Tempo and the Machine Payments Protocol so agents can buy search, content extraction, and multi-hop research on demand without API keys or account setup. This gives agent stacks a concrete pattern for per-use tool billing instead of preprovisioned subscriptions.
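The per-use pattern amounts to metering each tool call against an agent wallet instead of checking a pre-provisioned key. A hypothetical sketch; `Wallet`, `metered`, and the prices are illustrative, not the Tempo or Machine Payments Protocol API:

```python
class Wallet:
    """Hypothetical agent wallet debited once per tool call."""
    def __init__(self, balance_cents):
        self.balance_cents = balance_cents
        self.ledger = []  # (tool, cost) entries for auditability

    def charge(self, tool, cost_cents):
        if cost_cents > self.balance_cents:
            raise RuntimeError(f"insufficient funds for {tool}")
        self.balance_cents -= cost_cents
        self.ledger.append((tool, cost_cents))

def metered(wallet, tool, cost_cents, fn):
    """Wrap a tool so each invocation pays before it runs."""
    def wrapped(*args, **kwargs):
        wallet.charge(tool, cost_cents)  # pay-per-call, no API key or account
        return fn(*args, **kwargs)
    return wrapped

wallet = Wallet(balance_cents=100)
search = metered(wallet, "search", 3, lambda q: f"results for {q!r}")
search("agent payments")
print(wallet.balance_cents)  # 97
```

The design choice worth copying is the ledger: per-call billing only beats subscriptions if every debit is attributable to a specific tool invocation.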
Hugging Face now serves Markdown when agents fetch Papers pages and published a skill for searching papers plus linked models, datasets, and Spaces. Research agents can cut token waste and retrieve paper context in a format that is easier to parse and ground.
Together released Open Deep Research v2 and published the hosted app, codebase, blog post, and evaluation dataset together. Use it as a full open reference stack for report-generation agents rather than another closed demo.
The LabClaw team open-sourced a 211-skill layer for dry-lab reasoning, literature work, medicine, biology, and lab automation. Use it as a starting skill library for AI scientist systems instead of assembling generic tools from scratch.
Hyperbrowser released HyperPlex, an open-source research agent that splits a goal into subtasks, runs browser workers in parallel, and returns cited reports. Teams building deep-research products can study the repo for orchestration, live browsing, and report synthesis patterns.
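The orchestration shape is easy to model: a planner yields subtasks, workers run them concurrently, and a synthesizer merges cited findings. The stub functions below stand in for HyperPlex's LLM planner and live browser workers; only the fan-out/fan-in structure is the point:

```python
from concurrent.futures import ThreadPoolExecutor

def plan(goal):
    """Stub planner: split a goal into independent subtasks."""
    return [f"{goal}: background", f"{goal}: recent work", f"{goal}: open problems"]

def research(subtask):
    """Stub worker: in HyperPlex this would drive a live browser session."""
    return {"finding": f"notes on {subtask}", "source": "https://example.com"}

def synthesize(results):
    """Merge per-subtask findings into one report with inline citations."""
    return "\n".join(f"- {r['finding']} [{r['source']}]" for r in results)

def deep_research(goal, max_workers=3):
    subtasks = plan(goal)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(research, subtasks))  # parallel browser workers
    return synthesize(results)

report = deep_research("retrieval benchmarks")
print(len(report.splitlines()))  # 3
```

Parallel fan-out only pays off when subtasks are genuinely independent; anything sequential (follow-up queries that depend on earlier findings) belongs in the planner loop, not the worker pool.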