Models & APIs — Explore AI Tools & Stories

Fresh stories

New

Google opens Gemini 3.5 Flash Computer Use in Gemini API with explicit confirmations

A day after Gemini 3.5 Flash Computer Use surfaced as a launch story, Google formally opened it through the Gemini API and Enterprise Agent Platform. Explicit user confirmation, automated task stopping, and an Android adb quickstart make the rollout concrete for agent builders.

Release·Gemini·25th June

Breaking

OpenAI reports Codex drives 99.8% of internal AI output tokens

OpenAI published usage data showing Codex now generates 99.8% of its internal AI output tokens, with sharp growth in legal, support, recruiting, and finance. The report measures agent adoption as delegated parallel work, not just chat inside engineering.

New

Codex·25th June·6 min read

New

Seedance 2.0 Mini launches on Venice, ComfyUI, and Pika MCP with 15s 720p video

A day after Seedance 2.0's 4K rollout story, partners began shipping the cheaper Seedance 2.0 Mini across Venice, ComfyUI, and Pika MCP. The 15-second 720p variant with native audio gives video workflows a lower-cost path than the flagship model.

Release·Multimodal·25th June

Briefs for June 25

Top stories this week

Breaking

Baidu releases Unlimited OCR with 3B params for single-pass long documents

Baidu released Unlimited OCR as an open-source long-document OCR model with 3B total parameters and 500M active at inference. Early ParseBench testing suggests it is strong on tables and reading order but weaker on semantic formatting and charts, giving teams a new open-weight OCR option with clear tradeoffs.

New

Multimodal·24th June·3 min read

New

Gemini 3.5 Flash adds Computer Use with 78.4 OSWorld score

Google released built-in Computer Use for Gemini 3.5 Flash across browser, mobile, and desktop. Try it for agent workflows, but watch for timeout issues on long design-from-scratch runs.

Release·Gemini·24th June

New

OpenRouter launches Image API with typed capabilities and exact USD cost

OpenRouter released a dedicated Image API that normalizes request shapes across 30-plus models from eight providers. Agents can inspect limits, passthrough options, streaming, and exact per-call cost without hardcoding vendor quirks.

Release·Multimodal·24th June

New

Seedance 2.0 adds native 4K as fal, Replicate, Pika MCP, and ComfyUI ship support

Seedance 2.0 rolled out native 4K generation while Seedance 2.0 Mini landed on fal, Replicate, Pika MCP, and ComfyUI. That matters because engineers can now reach the same video model family through APIs, MCP workflows, and local graph tooling instead of a single web surface.

Multimodal·24th June

New

Vercel AI Gateway adds GLM-5.2 Fast at 150-250 tok/s

Vercel and Wafer launched a serverless GLM-5.2 endpoint on AI Gateway with 1M context and published pricing. Teams get a high-throughput open-model option inside an existing gateway instead of managing GLM inference directly.

Release·GLM·24th June

New

AssemblyAI launches Universal-3.5 Pro Realtime with Context Carryover

AssemblyAI’s Universal-3.5 Pro Realtime now carries forward the agent side of a conversation to improve live transcription. The release also ships multilingual realtime ASR features, and one early deployment said critical-utterance errors fell from 26% to 9%.

Release·Voice Agents·23rd June

New

Mistral releases OCR 4 with bounding boxes and 85.20 OlmOCRBench

Mistral OCR 4 adds layout-aware extraction with bounding boxes, block typing, and inline confidence across 170 languages. Use it through the API or self-hosted deployments when document pipelines need structure, citations, redaction, and chunking.

Release·Mistral·23rd June

New

Perceptron releases Files API with reusable upload IDs

Perceptron’s Files API lets developers upload an image or video once and reference it by ID across later requests instead of resending base64 or URLs. That simplifies repeated multimodal workflows and cuts transfer overhead for video-heavy pipelines.

Release·Persistent Storage·23rd June

GLM-5.2 adds Perplexity Agent API and Droid support on Baseten at >280 TPS

GLM-5.2 gained Perplexity Agent API and Droid support plus more hosting options, while Baseten reported over 280 TPS and sub-0.8s TTFT. Builders should watch the cost and benchmark data as it moves into production agent stacks.

GLM·22nd June

New

Google ships Interactions API in GA as Gemini default with background agents

Google put the Interactions API into GA as the new default for Gemini, adding background execution, managed agents, remote sandboxes, and multimodal tools. Builders now get one stateful interface for models, long-running jobs, and future Gemini Omni support.

Release·Gemini·22nd June

Skills Spotlight: top by stars

✍️ Writing

New

creative-ideation

Generate ideas via named methods from creative practice.

by NousResearch · 2 days ago · 203.5k

🎨 Design

baoyu-comic

Knowledge comics (知识漫画): educational, biography, tutorial.

by NousResearch · 1 month ago · 203.5k

🤖 ML/AI

comfyui

Generate images, video, and audio with ComfyUI — install, launch, manage nodes/models, run workflows with parameter injection. Uses the official comfy-cli for lifecycle and direct REST/WebSocket API for execution.

by NousResearch · 1 month ago · 203.5k
