Models, Serving & APIs — Explore AI Tools & Stories

Fresh stories

Codex app reportedly leaks GPT-5.6 Sol, Terra, and Luna model names

Codex app code now references GPT-5.6 Sol, Terra, and Luna, while posts claim Sol Ultra reaches 91.9% on TerminalBench at lower cost. Treat release timing, limits, and benchmark claims as unofficial until OpenAI publishes details.

🧠 Codex · 3rd July

GLM-5.2 benchmarks at 97.6% tool-calling and 2,626 tok/s on MI355X

Kilo, Composio, Together, and Wafer posted GLM-5.2 measurements including 40/41 tool tasks, 7/10 code review, and 2,626 tok/s on MI355X. Try it for lower-cost coding and tool use, but validate cross-file reasoning and latency on your workload.

🧠 GLM · 3rd July · 6 min read

Gemini Omni Flash ranks #1 on Video Arena with 1404 Elo

Gemini Omni Flash ranked #1 on Video Arena at 1404 Elo, 101 points above Seedance 2.0 Mini, and ComfyUI posted a text-prompt video-edit workflow. Google noted the leaderboard is third-party, leaving benchmark provenance as the main caveat.

🧠 Gemini · 3rd July

New

GLM-5.2 supports Amp, dcode, and Next.js workflows after Composio tops 41 tool tasks

Independent toolmakers pushed GLM-5.2 into coding workflows via dcode, Amp plugin modes, and Wafer-backed Next.js routes, while Composio reported it tied or won across 41 real-tool tasks. That matters because GLM is moving from benchmark curiosity into a practical open-weight option for agentic coding and long-running repo work.

🧠 GLM · 1st July

Breaking

Z.ai launches ZCode with GLM-5.2, BYOK, and 1.5x Coding Plan quota

Z.ai released ZCode as its official desktop environment for GLM-5.2, with multi-agent project work, long-running tasks, code review, and clients for macOS, Windows, and Linux. GLM Coding Plan subscribers get a 1.5x quota inside ZCode, while other developers can bring existing subscriptions or API keys.

New

GLM · 1st July · 3 min read

New

Google releases Nano Banana 2 Lite and Gemini Omni Flash

Google shipped Nano Banana 2 Lite for image generation and Gemini Omni Flash for conversational video generation and editing in the Gemini API and AI Studio. The release sets image generation at about 4 seconds and $0.034 per 1K image, while Omni Flash adds multi-turn video edits at $0.10 per second.

Release · 🧠 Gemini · 30th June

New

The Information reports OpenAI cuts inference costs by more than 50% on some models

Multiple summaries of a report by The Information said OpenAI found inference optimizations that more than halved costs on some existing models. If that holds, it changes the margin, pricing, and usage-limit math behind ChatGPT and API serving even before new model releases arrive.

💳 Model serving · 30th June

Breaking

Cognition launches Devin Fusion with mid-session routing and 35% lower Fable-class cost

Cognition launched Devin Fusion, a hybrid coding harness that reroutes work mid-task, and says it cuts Fable-class cost by 35%. Use it when upfront routing misses late complexity; the router can re-evaluate after investigation starts.

New

Model Routing · 29th June · 5 min read

New

Meituan releases LongCat 2.0: 1.6T MoE on domestic chips

Meituan disclosed LongCat 2.0, a 1.6T-parameter MoE with about 48B active parameters, 1M context, and 35T training tokens on domestic hardware. The release ties a near-frontier open model to a Chinese domestic compute stack and a custom sparse-attention design.

Release · 🧠 LongCat · 29th June

New

Snowflake releases Arctic RL with ZoRRo: Text2SQL-R2 training drops to ~36 hours

Snowflake open-sourced Arctic RL and said its ZoRRo optimization delivers up to 6x actor-update speedup and 3.5x end-to-end gains. The repo packages those gains into VeRL and SkyRL integrations plus open Text2SQL and multi-hop QA recipes.

Release · 🧠 GPU Infrastructure · 29th June

Briefs for July 3

Top stories this week

Breaking

Vercel raises Functions package limit to 5 GB on Fluid compute

Vercel raised the maximum package size for Functions on Fluid compute from 250 MB to 5 GB, a 20x increase. The change removes a common deployment blocker for browser automation, larger Python AI stacks, image processing, and heavier backend workloads.

New

Capacity Planning · 29th June · 3 min read

New

Google limits Meta's Gemini use after capacity shortages

The FT reported that Google capped Meta's Gemini usage after Meta asked for more model capacity than Google could supply, affecting internal safety, support, ad, and coding projects. The restriction matters because model access is now constrained by chip, memory, and networking capacity as much as by API contracts.

🧠 Gemini · 28th June

DeepSeek releases DSpark checkpoints for Qwen3 and Gemma-4

DeepSeek extended DSpark beyond V4 by publishing draft-model checkpoints for Qwen3 and Gemma-4 families and clarifying that DSpark targets higher-throughput serving by controlling verification cost. The release matters because speculative decoding is moving from papers into reusable open checkpoints.

Release · 🧠 Gemma · 28th June

New

xAI tests Grok 4.5 private beta on a 1.5T V9 model with Cursor data

Multiple trackers said Grok 4.5 is in private beta at SpaceX and Tesla, built on a 1.5T V9 base with supplemental Cursor data and compared internally against an unspecified Opus model. The claims matter because xAI is signaling a faster release cadence, but the reported performance is still unverified.

🧠 Grok · 28th June
