
Google Gemini’s AI image model gets a ‘bananas’ upgrade


Google is upgrading its Gemini chatbot with a new AI image model that gives users finer control over editing photos, a step meant to catch up with OpenAI’s popular image tools and lure users away from ChatGPT.

The update, called Gemini 2.5 Flash Image, rolls out starting Tuesday to all users in the Gemini app, as well as to developers via the Gemini API, Google AI Studio, and Vertex AI platforms.

Gemini’s new AI image model is designed to make more precise edits to images — based on natural language requests from users — while preserving the consistency of faces, animals, and other details, something that most rival tools struggle with. For instance, ask ChatGPT or xAI’s Grok to change the color of someone’s shirt in a photo, and the result might include a distorted face or an altered background.

Gemini 2.5 Flash’s native image editor blends photos of a dog and a person while preserving their likeness. Credit: Google

Google’s new tool has already drawn attention. In recent weeks, social media users raved about an impressive AI image editor on the crowdsourced evaluation platform LMArena. The model appeared to users anonymously under the pseudonym “nano-banana.”

Google says it’s behind the model (if it wasn’t obvious already from all the banana-related hints), which is really the native image capability within its flagship Gemini 2.5 Flash AI model. Google says the image model is state-of-the-art on LMArena and other benchmarks.

Google claims its new AI image model is state-of-the-art on several image editing benchmarks. Credit: Google

“We’re really pushing visual quality forward, as well as the model’s ability to follow instructions,” said Nicole Brichtova, a product lead on visual generation models at Google DeepMind, in an interview with TechCrunch.

“This update does a much better job making edits more seamlessly, and the model’s outputs are usable for whatever you want to use them for,” said Brichtova.

AI image models have become a critical battleground for Big Tech. When OpenAI launched GPT-4o’s native image generator in March, it drove ChatGPT’s usage through the roof thanks to a frenzy of AI-generated Studio Ghibli memes that, according to OpenAI CEO Sam Altman, left the company’s GPUs “melting.”

To keep up with OpenAI and Google, Meta announced last week that it would license AI image models from the startup Midjourney. Meanwhile, the a16z-backed German unicorn Black Forest Labs continues to dominate benchmarks with its FLUX AI image models.

Perhaps Gemini’s impressive AI image editor can help Google close its user gap with OpenAI. ChatGPT now logs more than 700 million weekly users. On Google’s earnings call in July, CEO Sundar Pichai revealed that Gemini had 450 million monthly users, which implies its weekly user count is lower still.

Brichtova says Google specifically designed the image model with consumer use cases in mind, such as helping users visualize their home and garden projects. The model also has better “world knowledge” and can combine multiple references in a single prompt; for example, it can merge an image of a sofa, a photo of a living room, and a color palette into one cohesive render.

Gemini 2.5 Flash Image lets users have “multi-turn” conversations with an AI image model. Credit: Google

While Gemini’s new AI image generator makes it easier for users to create and edit realistic images, Google has put safeguards in place that limit what users can generate. The company has struggled with AI image generator safeguards in the past. At one point, it apologized for Gemini generating historically inaccurate pictures of people and rolled back the AI image generator altogether.

Now, Google feels that it’s struck a better balance.

“We want to give users creative control so that they can get from the models what they want,” said Brichtova. “But it’s not like anything goes.”

The generative AI section of Google’s terms of service prohibits users from generating “non-consensual intimate imagery.” Those same kinds of safeguards don’t seem to exist for Grok, which allowed users to create AI-generated explicit images resembling celebrities, such as Taylor Swift.

To address the rise of deepfake imagery, which can make it hard for users to discern what’s real online, Brichtova says that Google applies visual watermarks to AI-generated images, as well as identifiers in their metadata. However, someone scrolling past an image on social media may not look for such identifiers.