WELL, YES: Sergey Brin says Google ‘definitely messed up’ Gemini image launch.
Brin, 50, spoke to entrepreneurs on Saturday at the “AGI House” in Hillsborough, California, just south of San Francisco, where developers and founders were testing Google’s Gemini model. AGI stands for artificial general intelligence, a form of AI that can complete tasks at or above human level.
Taking questions from the crowd, Brin discussed AI’s impact on search and how Google can maintain its leadership position in its core market as AI continues to grow. He also commented on the flawed launch last month of Google’s image generator, which the company pulled after users discovered historical inaccuracies and questionable responses.
“We definitely messed up on the image generation,” Brin said Saturday. “I think it was mostly due to just not thorough testing. It definitely, for good reasons, upset a lot of people.”
Except that isn’t the case at all:
Roughly, the “safety” architecture designed around image generation (slightly different from the one for text) looks like this: a user requests an image in the chat interface, and Gemini, once it realizes it’s being asked for a picture, sends the request on to a smaller LLM that exists specifically to rewrite prompts in keeping with the company’s thorough “diversity” mandates. This smaller LLM is trained with LoRA on synthetic data generated by another (third) LLM that uses Google’s full, pages-long diversity “preamble.” The rewriter LLM then rephrases the request (say, “show me an auto mechanic” becomes “show me an Asian auto mechanic in overalls laughing, an African American female auto mechanic holding a wrench, a Native American auto mechanic with a hard hat,” etc.) and sends it on to the diffusion model. The diffusion model checks that the prompts don’t violate standard safety policy (things like self-harm, anything with children, images of real people), generates the images, checks the images again for safety-policy violations, and returns them to the user.
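For readers who think in code, here is a minimal sketch of that pipeline as described. Every name in it (rewriter_llm, diffusion_model, violates_safety_policy, and all placeholder behaviors) is invented for illustration; this is not Google’s actual code, just the control flow the source describes.

```python
# Hypothetical sketch of the image-generation "safety" pipeline described
# above. All function and model names are invented for illustration.

from dataclasses import dataclass, field


@dataclass
class GenerationResult:
    images: list = field(default_factory=list)
    refused: bool = False


def violates_safety_policy(text_or_image: str) -> bool:
    """Stand-in for the standard safety filter (self-harm, anything with
    children, images of real people, etc.)."""
    return False  # placeholder


def rewriter_llm(prompt: str) -> list[str]:
    """The smaller LLM, per the description fine-tuned with LoRA on
    synthetic data from a third LLM using the diversity preamble, which
    rewrites one request into several 'diversified' prompts."""
    # Placeholder behavior mirroring the article's example:
    return [
        f"{prompt}, Asian, in overalls, laughing",
        f"{prompt}, African American, female, holding a wrench",
        f"{prompt}, Native American, with a hard hat",
    ]


def diffusion_model(prompt: str) -> str:
    """Stand-in for the actual image generator."""
    return f"<image for: {prompt}>"


def generate_images(user_prompt: str) -> GenerationResult:
    # 1. Gemini detects an image request and hands it to the rewriter LLM.
    rewritten = rewriter_llm(user_prompt)

    # 2. Pre-generation check: drop any prompt that violates safety policy.
    safe_prompts = [p for p in rewritten if not violates_safety_policy(p)]
    if not safe_prompts:
        return GenerationResult(refused=True)

    # 3. Generate the images, then 4. re-check the outputs before returning.
    images = [diffusion_model(p) for p in safe_prompts]
    images = [img for img in images if not violates_safety_policy(img)]
    return GenerationResult(images=images)


if __name__ == "__main__":
    for img in generate_images("show me an auto mechanic").images:
        print(img)
```

Note that in this flow the diversity rewrite happens before any image is generated, which is why a neutral request fans out into demographically specified prompts by design rather than by accident.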
“Three entire models all kind of designed for adding diversity,” I asked one person close to the safety architecture. “It seems like that — diversity — is a huge, maybe even central part of the product. Like, in a way it is the product?”
“Yes,” he said, “we spend probably half of our engineering hours on this.”
Either Brin doesn’t know what’s going on in his own company or he can’t admit it in public. Neither should be comforting to jittery shareholders.