Long-horizon research agents operating over internal or web data.
Chroma released Context-1, a 20B search agent it says pushes the speed-cost-accuracy frontier for agentic search, with open weights on Hugging Face. Benchmark it against your current search stack before wiring it into production.
A developer says an autoresearch loop hill-climbed a vibecoded Rust engine to 2718 Elo after running more than 70 experiments under a 500 ms move budget. The real takeaway is the workflow: automated experiment loops can optimize code against a measurable target.
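That workflow reduces to a greedy loop: propose a change, measure it against a fixed budget, keep it only if the metric improves. A minimal sketch, with hypothetical `propose`/`evaluate` stand-ins rather than the developer's actual harness:

```python
def hill_climb(candidate, propose, evaluate, iters):
    """Greedy experiment loop: accept a proposed change only if it
    strictly improves the measured score, otherwise revert to the best."""
    best_score = evaluate(candidate)
    for _ in range(iters):
        trial = propose(candidate)   # e.g. an LLM-suggested code edit
        score = evaluate(trial)      # e.g. Elo from fixed-budget matches
        if score > best_score:       # keep only strict improvements
            candidate, best_score = trial, score
    return candidate, best_score

# Toy usage: climb an integer toward a target of 5.
best, score = hill_climb(0, lambda x: x + 1, lambda x: -abs(5 - x), iters=10)
print(best, score)  # 5 0
```

In the real setting `evaluate` is the expensive step (engine matches under the 500 ms move budget), which is why keeping the metric cheap and deterministic matters more than the cleverness of `propose`.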
OpenAI told MIT Technology Review it wants an autonomous research intern by September and a multi-agent research lab by 2028, with Codex described as an early step. Treat it as a roadmap for longer-horizon agents, not a shipped capability.
LightOn says its 150M multi-vector retriever is pushing BrowseComp-Plus close to saturation, with results showing search-call behavior and retriever choice matter nearly as much as model size. Retrieval engineers should watch multi-hop setup and tool-calling limits before copying the benchmark.
LightOn’s late-interaction retriever paired with GPT-5 reached 87.59% accuracy on BrowseComp-Plus while using fewer search calls than larger baselines. It suggests deep-research quality may now hinge more on retrieval architecture than on swapping in ever-larger LLMs.
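Late interaction (ColBERT-style MaxSim) keeps one vector per token and scores a document by summing, for each query token, its best match over document tokens. A toy sketch with made-up 2-d embeddings, not LightOn's model:

```python
def maxsim_score(query_vecs, doc_vecs):
    """ColBERT-style late interaction: for each query token vector,
    take the maximum dot product over all document token vectors,
    then sum those maxima across query tokens."""
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)

# Toy per-token "embeddings".
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 0.5]]  # covers both query tokens
doc_b = [[0.5, 0.5]]              # one generic vector
print(maxsim_score(query, doc_a))  # 1.5
print(maxsim_score(query, doc_b))  # 1.0
```

The point of the architecture is that scoring stays a cheap token-level max-sum, so retrieval quality can improve without growing the language model doing the reading.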
Parallel integrated with Tempo and the Machine Payments Protocol so agents can buy search, content extraction, and multi-hop research on demand without API keys or account setup. This gives agent stacks a concrete pattern for per-use tool billing instead of preprovisioned subscriptions.
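The per-use pattern amounts to metering each tool call against an agent wallet instead of checking a pre-provisioned key. A hypothetical sketch; `Wallet`, `metered`, and the prices are illustrative, not the Tempo or Machine Payments Protocol API:

```python
class Wallet:
    """Hypothetical agent wallet debited once per tool call."""
    def __init__(self, balance_cents):
        self.balance_cents = balance_cents
        self.ledger = []  # (tool, cost) entries for auditability

    def charge(self, tool, cost_cents):
        if cost_cents > self.balance_cents:
            raise RuntimeError(f"insufficient funds for {tool}")
        self.balance_cents -= cost_cents
        self.ledger.append((tool, cost_cents))

def metered(wallet, tool, cost_cents, fn):
    """Wrap a tool so each invocation pays before it runs."""
    def wrapped(*args, **kwargs):
        wallet.charge(tool, cost_cents)  # pay-per-call, no API key or account
        return fn(*args, **kwargs)
    return wrapped

wallet = Wallet(balance_cents=100)
search = metered(wallet, "search", 3, lambda q: f"results for {q!r}")
search("agent payments")
print(wallet.balance_cents)  # 97
```

The design choice worth copying is the ledger: per-call billing only beats subscriptions if every debit is attributable to a specific tool invocation.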
Hugging Face now serves Markdown when agents fetch Papers pages and published a skill for searching papers plus linked models, datasets, and Spaces. Research agents can cut token waste and retrieve paper context in a format that is easier to parse and ground.
Together released Open Deep Research v2 and published the hosted app, codebase, blog post, and evaluation dataset together. Use it as a full open reference stack for report-generation agents rather than another closed demo.
The LabClaw team open-sourced a 211-skill layer for dry-lab reasoning, literature work, medicine, biology, and lab automation. Use it as a starting skill library for AI scientist systems instead of assembling generic tools from scratch.
Hyperbrowser released HyperPlex, an open-source research agent that splits a goal into subtasks, runs browser workers in parallel, and returns cited reports. Teams building deep-research products can study the repo for orchestration, live browsing, and report synthesis patterns.
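The orchestration shape is easy to model: a planner yields subtasks, workers run them concurrently, and a synthesizer merges cited findings. The stub functions below stand in for HyperPlex's LLM planner and live browser workers; only the fan-out/fan-in structure is the point:

```python
from concurrent.futures import ThreadPoolExecutor

def plan(goal):
    """Stub planner: split a goal into independent subtasks."""
    return [f"{goal}: background", f"{goal}: recent work", f"{goal}: open problems"]

def research(subtask):
    """Stub worker: in HyperPlex this would drive a live browser session."""
    return {"finding": f"notes on {subtask}", "source": "https://example.com"}

def synthesize(results):
    """Merge per-subtask findings into one report with inline citations."""
    return "\n".join(f"- {r['finding']} [{r['source']}]" for r in results)

def deep_research(goal, max_workers=3):
    subtasks = plan(goal)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(research, subtasks))  # parallel browser workers
    return synthesize(results)

report = deep_research("retrieval benchmarks")
print(len(report.splitlines()))  # 3
```

Parallel fan-out only pays off when subtasks are genuinely independent; anything sequential (follow-up queries that depend on earlier findings) belongs in the planner loop, not the worker pool.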