AI Primer
TOPIC 14 stories

RAG

Retrieval-augmented generation and grounded generation pipelines.

RELEASE 23rd June
Mistral releases OCR 4 with bounding boxes and 85.20 OlmOCRBench

Mistral OCR 4 adds layout-aware extraction with bounding boxes, block typing, and inline confidence across 170 languages. Use it through the API or self-hosted deployments when document pipelines need structure, citations, redaction, and chunking.
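As a rough illustration of how bounding boxes, block types, and confidence scores can feed chunking and citations, the sketch below groups OCR blocks into section-level chunks and keeps the page and box of every block it used. The `Block` shape, the field names, and the thresholds are hypothetical and are not Mistral's actual response schema.

```python
from dataclasses import dataclass


# Hypothetical block shape for illustration only; not Mistral's response schema.
@dataclass
class Block:
    page: int
    kind: str          # e.g. "heading", "paragraph", "table"
    text: str
    bbox: tuple        # (x0, y0, x1, y1) in page coordinates
    confidence: float  # 0.0 to 1.0


def chunk_blocks(blocks, max_chars=500, min_confidence=0.5):
    """Group consecutive blocks into chunks, starting a new chunk at each heading.

    Blocks below min_confidence are skipped. Every chunk keeps (page, bbox)
    pairs so a generated answer can cite the region it was drawn from.
    """
    chunks = []
    current = {"text": "", "sources": []}
    for block in blocks:
        if block.confidence < min_confidence:
            continue
        starts_section = block.kind == "heading"
        too_long = len(current["text"]) + len(block.text) > max_chars
        # Close the open chunk before starting a new section or overflowing.
        if current["sources"] and (starts_section or too_long):
            chunks.append(current)
            current = {"text": "", "sources": []}
        current["text"] = (current["text"] + "\n" + block.text).strip()
        current["sources"].append((block.page, block.bbox))
    if current["sources"]:
        chunks.append(current)
    return chunks


blocks = [
    Block(1, "heading", "Pricing", (50, 40, 300, 70), 0.99),
    Block(1, "paragraph", "The basic plan costs 10 dollars.", (50, 80, 500, 120), 0.97),
    Block(1, "paragraph", "smudged text", (50, 130, 500, 160), 0.20),
    Block(2, "heading", "Limits", (50, 40, 300, 70), 0.98),
    Block(2, "paragraph", "Up to 5 users.", (50, 80, 500, 120), 0.95),
]
chunks = chunk_blocks(blocks)
for chunk in chunks:
    print(repr(chunk["text"]), chunk["sources"])
```

Splitting at headings keeps each chunk aligned with a section of the document, and the stored boxes are also what a redaction step would need to mask a region on the page image.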

NEWS 1mo ago
Turbopuffer reports $100M run-rate and a 95% Cursor code-search cost cut

Turbopuffer said it crossed a $100M run-rate while staying profitable on less than $1M raised, and said Cursor moved production search onto the stack with a 95% cost reduction. The milestone matters because AI products increasingly compete on retrieval quality and cost, not just model output.

RELEASE 1mo ago
Firecrawl adds Highlights to /scrape with 100x fewer tokens

Firecrawl added a Highlights mode to /scrape that returns only the text, code, or tables matching a query instead of full-page payloads. The company benchmarked the feature on 10,000 URLs against Exa Highlights and positions it for agent retrieval with lower token usage.

RELEASE 1mo ago
Firecrawl adds Question format to /scrape with grounded answers and 100x fewer tokens

Firecrawl introduced a /scrape mode that answers a question directly from a URL instead of returning chunks for a separate retrieval loop. It targets docs and pricing pages, and teams should use it when they want grounded answers with lower token usage.

NEWS 1mo ago
Gemini API adds multimodal File Search with page citations

Google expanded Gemini API File Search to index text and images together, add custom metadata filtering, and return page-level citations. RAG builders can use it for tighter retrieval control and more auditable answers.

RELEASE 1mo ago
IBM releases Granite Embedding R2 with 32,768-token context and +11.8 MMTEB retrieval gain

IBM released 97M and 311M multilingual Granite Embedding R2 models under Apache 2.0, replacing XLM-RoBERTa with ModernBERT and extending context length from 512 to 32,768 tokens. The 311M model posts a +11.8 gain on MMTEB retrieval and ships with ONNX, OpenVINO, vLLM, and GGUF support.

RELEASE 2mo ago
BidirLM-Omni-2.5B-Embedding launches 2048-dim text-image-audio vectors

BidirLM released a 2.5B multilingual encoder that embeds text, images, and audio into one shared 2048-dimensional space and works directly with Sentence Transformers. It tops several open-data embedding leaderboards and can run locally on GPU.

RELEASE 2mo ago
LightOn releases LateOn and DenseOn at 149M params with BEIR 57.22

LightOn open-sourced DenseOn and LateOn plus the training pipeline behind them, including 1.4 billion query-document pairs and decontaminated BEIR results. Teams can use the small open retrieval models and reproduced data mixtures instead of opaque closed-data baselines.

NEWS 2mo ago
OpenRouter adds Firecrawl web search with full-page markdown grounding

OpenRouter added Firecrawl as a search provider, letting models ground responses in scraped full web pages instead of snippet-only search. The launch folds crawling into the existing plugin settings flow and includes a capped free plan on the Firecrawl side.

RELEASE 2mo ago
Sentence Transformers releases v5.4 with multimodal embeddings and reranking

Sentence Transformers v5.4 adds one encode API for text, image, audio, and video, plus multimodal reranking and a modular CrossEncoder stack. It also flattens Flash Attention 2 inputs for text workloads, reducing padding waste and VRAM use.

WORKFLOW 2mo ago
LongTracer opens local STS+NLI claim checks for RAG validation

LongTracer open-sourced local STS+NLI claim checks, while qi published a private search engine with a Claude Code plugin and LM Studio users shared MCP search configs for Qwen. Use these stacks to ground retrieval and verify answers without a second judge model.

RELEASE 3mo ago
Keep adds an in-app feed reader for saved bookmarks

Keep added an in-app feed reader so saved links can be read directly inside its bookmark store for agent workflows. Use it to turn bookmarks, RSS feeds, and markdown exports into reusable context instead of scattered tabs.

RELEASE 3mo ago
Google releases Gemini Embedding 2 preview with one vector space for text, image, video, audio, and PDFs

Google launched Gemini Embedding 2 in preview, unifying multiple modalities and 100+ languages in one embedding space with flexible output dimensions. Try it to simplify cross-modal RAG and search pipelines, but compare it with late-interaction systems before committing.
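To make the single-vector versus late-interaction comparison concrete, here is a toy sketch of the two scoring styles. It uses hand-written 2-dimensional vectors, and mean pooling stands in for a single-vector model; both are assumptions for illustration and say nothing about how Gemini Embedding 2 or any specific late-interaction system is built. The late-interaction score is the MaxSim sum used by ColBERT-style retrievers.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def mean_pool(vecs):
    """Collapse per-token vectors into one vector (stand-in for a single-vector model)."""
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def maxsim(query_vecs, doc_vecs):
    """Late-interaction score: each query vector takes its best match in the document."""
    return sum(max(cosine(q, d) for d in doc_vecs) for q in query_vecs)


# Toy per-token vectors (hypothetical data).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0]]   # matches each query token exactly
doc_b = [[1.0, 1.0], [1.0, 1.0]]   # only a blurred mix of both directions

single_a = cosine(mean_pool(query), mean_pool(doc_a))
single_b = cosine(mean_pool(query), mean_pool(doc_b))
late_a = maxsim(query, doc_a)
late_b = maxsim(query, doc_b)

print("single-vector:", single_a, single_b)
print("late interaction:", late_a, late_b)
```

In this toy case the pooled vectors of both documents point the same way, so the single-vector score cannot separate them, while MaxSim ranks the document with exact per-token matches higher. The trade-off is that late interaction stores one vector per token instead of one per document.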

RELEASE 3mo ago
Gemini Embedding 2 enters preview with 8,192-token multimodal vectors and 3,072-dim outputs

Google put Gemini Embedding 2 into public preview with one vector space for text, images, video, audio, and PDFs, plus 3072, 1536, and 768 output sizes. Use it to replace multi-model retrieval pipelines with one API for RAG and cross-media search.
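The choice between the 3072, 1536, and 768 output sizes largely comes down to index size versus retrieval quality. The arithmetic below estimates raw vector storage for each size, assuming uncompressed float32 values and a hypothetical corpus of one million chunks; real vector stores add index overhead and often quantize.

```python
BYTES_PER_FLOAT32 = 4  # assumption: vectors stored uncompressed as float32


def index_size_bytes(num_vectors, dims, bytes_per_value=BYTES_PER_FLOAT32):
    """Raw storage for num_vectors embeddings of the given dimensionality."""
    return num_vectors * dims * bytes_per_value


NUM_CHUNKS = 1_000_000  # hypothetical corpus size

for dims in (3072, 1536, 768):
    gib = index_size_bytes(NUM_CHUNKS, dims) / 2**30
    print(f"{dims:>4} dims: {gib:.2f} GiB")
```

Storage scales linearly with dimension, so the 768-dimensional output needs a quarter of the space of the 3072-dimensional one. Whether the quality difference justifies the cost is something to measure on your own queries.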

AI Primer

Your daily guide to AI tools, workflows, and creative inspiration.

© 2026 AI Primer. All rights reserved.