
GPT Image 2 vs the rest: where it wins, where it doesn't

By Sam Okafor · 8 min read

I run a small studio that ships a lot of visual work — landing pages, deck illustrations, the occasional editorial piece. We pay for four image models right now. I have been using all of them for about a month, often on the same brief, just to see what each one is actually good for. This is not a leaderboard. It is what I would tell a friend over coffee.

Where GPT Image 2 wins

Text rendering. Not even close. If your image has words on it — a poster, a book cover, a t-shirt mockup, a sign — GPT Image 2 is the first thing I reach for. The other models are still doing the "almost spells it" thing. This one mostly just spells it. I still proof every output, but I almost never have to fix the letters anymore.

Edits that respect what you already had. If I take a generated image and ask the model to change one thing — make the jacket blue instead of red, add a coffee cup on the table — it does the change without redrawing the whole world. Other models tend to give me a brand new image that happens to share a vibe. This is the feature I underestimated the most before I started using it for real work.

Multi-reference composition. Drop in three reference images, tell it which one to take color from, which one to take pose from, which one to take lighting from, and it will mostly listen. I have not had this work as cleanly elsewhere. It is not perfect, but it is usable.

Where it doesn't

Pure photographic realism. If I want a hyper-real product shot of a watch with believable specular highlights, I am not using GPT Image 2 first. The other photo-focused models still edge it out for me on raw photoreal quality. GPT Image 2's images look designed, which is great when you want that. Less great when you want them to look like nobody designed them.

Stylized illustration with a specific artist's voice. I have not been able to get this model to do, say, "watercolor in the style of a specific Eastern European children's book illustrator from the 70s" as well as a finetuned community model can. If you live in a particular illustration niche, your favorite finetune is probably still better.

Speed for one-offs. The other models I use are sometimes a touch faster per image. With GPT Image 2 you tend to want to iterate more carefully because each image carries more "design intent." This is a feature, not a bug, but if you just want to throw fifty quick variations at a wall, it is not the cheapest model in terms of time.

What I do now

For about 70% of the studio's work, GPT Image 2 is the first stop. Anything with text. Anything where the client cares about brand consistency. Anything that needs an edit pass. The other models stick around for specific jobs — photoreal, niche illustration, quick exploration.

The honest answer is that there is no winner, and anyone telling you there is one is selling you something. Different models have different strengths. The skill is knowing which to reach for.

The thing nobody tells you

The model matters less than you think. The biggest predictor of good output, by a long way, is the time you spend on the prompt and the references. I have seen junior designers get worse results with the "best" model than senior designers get with the "worst" one. Do not chase the new release. Get good with one tool first.

That said — if I had to pick one tool today for a working studio, knowing what the work actually looks like — I would pick GPT Image 2. It is the first model that has felt like a design tool instead of a slot machine.

#comparison #review #gpt-image-2