Gemini API adds multimodal File Search with page citations
Google expanded Gemini API File Search to index text and images together, add custom metadata filtering, and return page-level citations. RAG builders can use it for tighter retrieval control and more auditable answers.

TL;DR
- OfficialLoganK's launch post said Gemini API File Search now supports multimodal retrieval through Gemini Embedding 2, plus custom metadata and inline citations.
- According to DynamicWebPaige's example, the new retrieval path can match a query like "cat" across images, audio, books, SVGs, and multilingual text, although Google's own developer guide currently says audio and video uploads are not yet supported.
- The launch post and the File Search docs both frame the update around verifiable RAG, with page-level citations and metadata filters for narrowing the search set.
- The documentation adds two practical details missing from most social posts: storage and query-time embedding generation are free, and imported search-store data persists until you delete it or the model is deprecated.
You can read Google's announcement, jump straight to the File Search docs, and skim Google's longer developer guide for the code path that creates a multimodal store, uploads images, and downloads cited media. ai_for_success's screenshot also shows the three shipped features in one place.
Multimodal retrieval
The headline change is that File Search can now index images and text in the same store when you create it with Gemini Embedding 2. Google's developer guide says that also lets PDFs carry native image embeddings alongside text, instead of treating visual content as an OCR side effect.
That changes the retrieval unit from "document text" to mixed evidence. DynamicWebPaige's example described one query pulling back books, images, SVGs, and other cat-related material from the same archive.
Metadata filters
The second addition is custom metadata on files. The docs describe it as key-value pairs you can attach at upload time, then filter against with metadata_filter so one store can still be sliced by attributes like author, year, department, or status.
That is a small but useful shift for managed RAG. Instead of splitting every corpus into separate stores, developers can keep one index and constrain retrieval at query time.
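As a rough illustration of query-time slicing, the sketch below builds a filter string from a dictionary of attributes. The helper name build_metadata_filter is hypothetical, and the key=value syntax joined with AND is an assumption; check the File Search docs for the exact grammar that metadata_filter accepts before relying on it.

```python
from typing import Mapping, Union

# Metadata values handled by this sketch: strings and plain numbers.
Value = Union[str, int, float]


def build_metadata_filter(conditions: Mapping[str, Value]) -> str:
    """Join equality checks into one filter string (hypothetical helper).

    Assumption: the filter language accepts `key=value` terms joined by AND,
    with string values in double quotes. This is not confirmed by the article.
    """
    parts = []
    for key, value in conditions.items():
        if isinstance(value, bool):
            # bool is a subclass of int in Python; reject it explicitly.
            raise TypeError("boolean values are not handled in this sketch")
        if isinstance(value, (int, float)):
            parts.append(f"{key}={value}")
        else:
            # Escape backslashes and quotes so the value stays one token.
            escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
            parts.append(f'{key}="{escaped}"')
    return " AND ".join(parts)


# One store, narrowed at query time instead of split into separate stores.
filter_expr = build_metadata_filter({"department": "legal", "year": 2024})
print(filter_expr)
```

The resulting string would then be passed as the metadata_filter value on the File Search tool configuration, so the same index can serve different slices per request.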
Page-level citations
Google's announcement pitches citations as the verifiability layer, and the developer guide gets more concrete: text citations can return exact page numbers, while multimodal stores can also return a media_id for cited images that apps can download and display.
That gives File Search a cleaner answer path for user-facing RAG apps. The model output is no longer just grounded in a file; it can point to a page or to the specific retrieved image.
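To make the two citation shapes concrete, here is a minimal sketch of how an app might render them. The Citation dictionary and its title and page_number keys are hypothetical stand-ins, not the API's response schema; only media_id is a name taken from the developer guide as described above.

```python
from typing import TypedDict


class Citation(TypedDict, total=False):
    """Hypothetical, simplified citation record for illustration only."""
    title: str        # assumed: display name of the cited file
    page_number: int  # assumed key for a text citation's page
    media_id: str     # identifier for a cited image in a multimodal store


def describe_citation(citation: Citation) -> str:
    """Turn one citation into a short label for a user-facing answer."""
    title = citation.get("title", "untitled source")
    if "media_id" in citation:
        # Image citation: the app would download the media and show it inline.
        return f"{title} (image {citation['media_id']})"
    if "page_number" in citation:
        # Text citation: point the reader at the exact page.
        return f"{title}, p. {citation['page_number']}"
    return title


print(describe_citation({"title": "handbook.pdf", "page_number": 12}))
print(describe_citation({"title": "catalog.pdf", "media_id": "media-001"}))
```

A real integration would read these fields from the grounding data returned with the response, so the field names should be taken from the File Search docs rather than from this sketch.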
Costs and retention
The File Search docs bury the most operationally relevant detail: storage and query-time embedding generation are free. Billing applies only when files are first indexed and when the Gemini model consumes input and output tokens.
The same docs say File Search stores have no TTL. Imported embeddings persist until manual deletion or model deprecation. If you use uploadToFileSearchStore, the temporary raw File object is deleted after 48 hours, but the indexed data remains in the search store.
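The billing model described in this section can be sketched as a small estimator. The Rates class, the function name, and every number below are placeholders chosen for easy arithmetic, not Google's prices; the only facts taken from the docs are which items are billed (first-time indexing, model input and output tokens) and which are free (storage, query-time embeddings).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Per-million-token rates. All values are caller-supplied placeholders."""
    indexing_per_million: float
    input_per_million: float
    output_per_million: float


def estimate_cost(rates: Rates, indexed_tokens: int,
                  input_tokens: int, output_tokens: int) -> float:
    """Estimate spend under the billing model described in the docs.

    Storage and query-time embedding generation are free, so they do not
    appear here. Indexing is charged once, when files are first indexed.
    """
    million = 1_000_000
    indexing = indexed_tokens / million * rates.indexing_per_million
    model_input = input_tokens / million * rates.input_per_million
    model_output = output_tokens / million * rates.output_per_million
    return indexing + model_input + model_output


# Placeholder rates, NOT real prices: look up current pricing before use.
example_rates = Rates(indexing_per_million=1.0,
                      input_per_million=2.0,
                      output_per_million=4.0)
total = estimate_cost(example_rates, indexed_tokens=2_000_000,
                      input_tokens=500_000, output_tokens=250_000)
print(total)  # 4.0 with the placeholder rates above
```

Because stores have no TTL, the indexing term is a one-time cost per file, while the input and output terms recur with every query.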