
Curious Refuge compares GPT Image 2 and Nano Banana 2 on 4 reference-image edits

Creators ran new side-by-side tests of ChatGPT Images 2.0 and Nano Banana 2 on reference-image swaps, scene changes, and poster sketches. The split matters: GPT Image 2 held characters better, while Nano Banana 2 stayed favored for environments, natural placement, speed, and cost.


TL;DR

OpenAI's docs say GPT Image 2 is built for high-fidelity edits and multi-turn image workflows. Google's docs say Nano Banana 2 can create images in seconds, keep characters consistent, and make local edits. The fun part is that CuriousRefuge's tests landed almost exactly on that product split, while creators like AmirMushich and 0xInk_ were already turning both models into concrete design workflows.

Reference edits

CuriousRefuge ran four reference-image prompts: a character swap, a forest background change, a 180-degree camera move, and a cinematic park still. Across the set, CuriousRefuge said GPT Image 2.0 was better at character consistency, while Nano Banana 2 was better at environments and at making the subject feel less composited into the scene.

That split is easy to map onto the prompts themselves:

  • Character swap: the first test favored GPT Image 2.0 for keeping the subject on-model.
  • Background replacement: the forest edit favored Nano Banana 2 for scene continuity.
  • Viewpoint change: the camera-rotation prompt stressed spatial reasoning and identity preservation.
  • Style transfer into a new setting: the park still tested whether the model could keep grading and mood while rebuilding the background.

Speed and cost

OpenAI's model page describes GPT Image 2 as a state-of-the-art generation and editing model, and the image-generation guide highlights multi-turn editing and high-fidelity image inputs. Google's Gemini help page describes Nano Banana 2 as an image tool that works in seconds, supports local edits, and can blend multiple images while keeping character consistency.

The interesting bit is that CuriousRefuge's earlier comparison called GPT Image 2.0 the more intelligent and realistic model, but also flagged it as slow and expensive. A day later, their reference-edit follow-up still ended with Nano Banana 2 as the default pick because faster and cheaper kept mattering more than absolute precision.

Poster sketches and refinement

The workflow evidence is already getting specific. In AmirMushich's poster sketch, Nano Banana plus Figma became a typography layout exercise, with font settings and poster framing baked into the prompt instead of treated as cleanup work after the image was made.

A separate path showed up in 0xInk_'s post, where Midjourney handled the base illustration and GPT Image 2 handled refinement without breaking the original graphic style. That is a different use case from the Curious Refuge tests: not raw model-versus-model ranking, but model chaining, where one system generates the look and another tightens detail.
