AI Primer
TOPIC · 12 stories

RAG

Retrieval-augmented generation and grounded generation pipelines.

RELEASE · 8th May
Firecrawl adds Highlights to /scrape with 100x fewer tokens

Firecrawl added a Highlights mode to /scrape that returns only the text, code, or tables matching a query instead of full-page payloads. The company benchmarked the feature on 10,000 URLs against Exa Highlights and positions it for lower-token agent retrieval.

RELEASE · 6th May
Firecrawl adds Question format to /scrape with grounded answers and 100x fewer tokens

Firecrawl introduced a /scrape mode that answers a question directly from a URL instead of returning chunks for a separate retrieval loop. It targets docs and pricing pages, and teams should use it when they want grounded answers with lower token usage.

NEWS · 1w ago
Gemini API adds multimodal File Search with page citations

Google expanded Gemini API File Search to index text and images together, add custom metadata filtering, and return page-level citations. RAG builders can use it for tighter retrieval control and more auditable answers.

RELEASE · 1w ago
IBM releases Granite Embedding R2 with 32,768-token context and +11.8 MMTEB retrieval gain

IBM released 97M and 311M multilingual Granite Embedding R2 models under Apache 2.0, replacing XLM-RoBERTa with ModernBERT and extending context length from 512 to 32,768 tokens. The 311M model posts a +11.8 gain on MMTEB retrieval and ships with ONNX, OpenVINO, vLLM, and GGUF support.
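To make the context-length jump concrete, here is a minimal sketch of how many chunks a document needs at each window size. The document length, the overlap parameter, and the helper name `chunks_needed` are illustrative assumptions, not part of IBM's release.

```python
import math


def chunks_needed(doc_tokens: int, context_tokens: int, overlap_tokens: int = 0) -> int:
    """Count the sliding windows needed to cover a document of doc_tokens tokens."""
    if doc_tokens <= 0:
        return 0
    if doc_tokens <= context_tokens:
        return 1
    stride = context_tokens - overlap_tokens
    if stride <= 0:
        raise ValueError("overlap must be smaller than the context window")
    # After the first window, each extra window covers `stride` new tokens.
    return 1 + math.ceil((doc_tokens - context_tokens) / stride)


DOC_TOKENS = 20_000  # hypothetical document length, chosen for illustration

print(chunks_needed(DOC_TOKENS, 512))     # chunks at the old 512-token limit
print(chunks_needed(DOC_TOKENS, 32_768))  # chunks at the new 32,768-token limit
```

Under these assumptions the same document goes from dozens of separately embedded chunks to a single vector. Whether one long-context vector retrieves better than many short chunks is a separate question that needs to be measured on your own data.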

RELEASE · 2w ago
BidirLM-Omni-2.5B-Embedding launches 2048-dim text-image-audio vectors

BidirLM released a 2.5B multilingual encoder that embeds text, images, and audio into one shared 2048-dimensional space and works directly with Sentence Transformers. It tops several open-data embedding leaderboards and can run locally on GPU.

RELEASE · 3w ago
LightOn releases LateOn and DenseOn at 149M params with BEIR 57.22

LightOn open-sourced DenseOn and LateOn plus the training pipeline behind them, including 1.4 billion query-document pairs and decontaminated BEIR results. Teams can use the small open retrieval models and reproduced data mixtures instead of opaque closed-data baselines.

NEWS · 3w ago
OpenRouter adds Firecrawl web search with full-page markdown grounding

OpenRouter added Firecrawl as a search provider, letting models ground responses in scraped full web pages instead of snippet-only search. The launch folds crawling into the existing plugin settings flow and includes a capped free plan on the Firecrawl side.

RELEASE · 4w ago
Sentence Transformers releases v5.4 with multimodal embeddings and reranking

Sentence Transformers v5.4 adds one encode API for text, image, audio, and video, plus multimodal reranking and a modular CrossEncoder stack. It also flattens Flash Attention 2 inputs for text workloads, reducing padding waste and VRAM use.
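The padding waste mentioned here is simple arithmetic. The sketch below compares a padded batch, where every sequence is stretched to the longest one, with a flattened batch that stores only real tokens. The sequence lengths and the helper name `padding_stats` are made up for illustration. This is not the library's implementation.

```python
def padding_stats(lengths):
    """Return (padded_tokens, real_tokens, wasted_fraction) for one batch."""
    if not lengths:
        raise ValueError("batch must contain at least one sequence")
    # Padded layout: every row is as long as the longest sequence.
    padded_tokens = len(lengths) * max(lengths)
    # Flattened layout: sequences are concatenated, so only real tokens are stored.
    real_tokens = sum(lengths)
    wasted_fraction = 1 - real_tokens / padded_tokens
    return padded_tokens, real_tokens, wasted_fraction


# Hypothetical batch with one long document among short queries.
padded, real, wasted = padding_stats([12, 40, 512, 33])
print(f"padded={padded} real={real} wasted={wasted:.1%}")
```

The more the sequence lengths in a batch differ, the larger the wasted fraction. Batches of similar-length texts would see little benefit from flattening.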

WORKFLOW · 1mo ago
LongTracer opens local STS+NLI claim checks for RAG validation

LongTracer open-sourced local STS+NLI claim checks for validating RAG output. Separately, qi published a private search engine with a Claude Code plugin, and LM Studio users shared MCP search configs for Qwen. Use these stacks to ground retrieval and verify answers without a second judge model.

RELEASE · 2mo ago
Keep adds an in-app feed reader for saved bookmarks

Keep added an in-app feed reader so saved links can be read directly inside its bookmark store for agent workflows. Use it to turn bookmarks, RSS feeds, and markdown exports into reusable context instead of scattered tabs.

RELEASE · 2mo ago
Google releases Gemini Embedding 2 preview with one vector space for text, image, video, audio, and PDFs

Google launched Gemini Embedding 2 in preview, unifying multiple modalities and 100+ languages in one embedding space with flexible output dimensions. Try it to simplify cross-modal RAG and search pipelines, but compare it with late-interaction systems before committing.

RELEASE · 2mo ago
Gemini Embedding 2 enters preview with 8,192-token multimodal vectors and 3,072-dim outputs

Google put Gemini Embedding 2 into public preview with one vector space for text, images, video, audio, and PDFs, plus 3072, 1536, and 768 output sizes. Use it to replace multi-model retrieval pipelines with one API for RAG and cross-media search.
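The three output sizes mostly trade retrieval quality against storage. Here is a rough sketch of the raw vector storage at each size, assuming float32 values (4 bytes each), a hypothetical corpus of one million vectors, and no index overhead. The helper name `index_bytes` is illustrative and not part of any Google API.

```python
def index_bytes(num_vectors: int, dims: int, bytes_per_value: int = 4) -> int:
    """Raw storage for num_vectors embeddings of `dims` values each (no index overhead)."""
    return num_vectors * dims * bytes_per_value


NUM_VECTORS = 1_000_000  # hypothetical corpus size

for dims in (3072, 1536, 768):
    size_gib = index_bytes(NUM_VECTORS, dims) / 2**30
    print(f"{dims:>4} dims: {size_gib:.2f} GiB")
```

Dropping from 3072 to 768 dimensions cuts raw storage by a factor of four under these assumptions. The effect on retrieval quality is not covered by this arithmetic and should be benchmarked separately.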

Your daily guide to AI tools, workflows, and creative inspiration.

© 2026 AI Primer. All rights reserved.