It’s May 25, 2022. Two days ago, on Monday, May 23, 2022 at 17:42:53 UTC, Google Brain published their text-to-image diffusion model: Imagen.

Imagen is comparable to Dall-E (from OpenAI), which was initially launched on January 5, 2021 and then greatly improved on with Dall-E 2 on April 6, 2022. Creatively, Dall-E derives its name from a combination of Salvador Dalí and Wall-E.

Dall-E 2 was a big step up from Dall-E: its images were far more photorealistic and impressive.

Dall-E vs Dall-E 2 results

Imagen from Google Brain

Now, less than two months after Dall-E 2, Google has published Imagen, which makes Dall-E 2 results look primitive. Imagen’s image outcomes are so impressive that they seem to go beyond the artistic capabilities of human designers and illustrators. Of course art and design are subjective, but every single Imagen image published is mind-blowingly accurate, well-designed and photorealistic.

Samples produced by Imagen have higher image quality and better text-image alignment.

I think it’s a tie, since OpenAI pioneered this implementation and Google Brain took even Dall-E 2’s achievement to a whole new level. Scoreboard shows OpenAI 1 – 1 Google.

Both Imagen and Dall-E 2 share specific AI characteristics. They are:

  • Generative: Generative models learn to produce new samples (here, images), whereas discriminative models learn to tell existing samples apart, for example by classifying them. See Generative vs Discriminative Machine Learning Models.
  • Transformative: Transformative models contribute to the outcomes with nuances and perspectives similar to those of humans, such as generating a visually pleasant and photorealistic image rather than just combining values. [1]
  • Diffusion Models: Diffusion models generate images by denoising. During training, noise (typically Gaussian noise) is added to images step by step and the model learns to reverse that process. At generation time the model starts from noise and removes it gradually until a high-quality sample emerges. The same idea can be used to improve a low-resolution or pixelated image. Diffusion models yield great quality results and can be computationally more efficient than some alternative methods such as autoregressive models. UC Berkeley’s Ho et al. have a fantastic research paper on denoising with diffusion models.
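
To make the "adding Gaussian noise" half of a diffusion model a bit more concrete, here is a minimal sketch in plain Python. It is not Imagen’s or Dall-E 2’s code. It only shows the forward (noising) process on a toy five-pixel "image", assuming the linear noise schedule described in the Ho et al. paper (1000 steps, variances from 0.0001 to 0.02). The function names are made up for illustration, and the hard part, the trained neural network that learns to undo the noise, is left out entirely.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances that grow linearly (assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t: the product of (1 - beta_s) for every step s up to t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(pixels, alpha_bar, rng):
    """Jump straight to a noisy version of the image:
    x_t = sqrt(alpha_bar) * x_0 + sqrt(1 - alpha_bar) * gaussian_noise
    """
    signal_scale = math.sqrt(alpha_bar)
    noise_scale = math.sqrt(1.0 - alpha_bar)
    return [signal_scale * p + noise_scale * rng.gauss(0.0, 1.0) for p in pixels]


rng = random.Random(0)  # fixed seed so runs are repeatable
image = [-1.0, -0.5, 0.0, 0.5, 1.0]  # toy image: five pixels scaled to [-1, 1]
alpha_bars = cumulative_alphas(linear_beta_schedule(1000))

slightly_noisy = add_noise(image, alpha_bars[9], rng)       # step 10: mostly image
almost_pure_noise = add_noise(image, alpha_bars[999], rng)  # step 1000: mostly noise

print("signal kept at step 10:  ", math.sqrt(alpha_bars[9]))
print("signal kept at step 1000:", math.sqrt(alpha_bars[999]))
```

After the last step almost none of the original image is left, which is why generation can start from pure noise and work backwards.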

YouTuber Marques Brownlee has a pretty informative and entertaining video on Dall-E 2 posted in May 2022. It’s great and offers a sneak peek at some unofficial text-to-image experiments, since Marques was granted permission by OpenAI to run some tests with the model. There aren’t even many videos on Imagen yet, but expect a crazy amount of discussion and content on Imagen in the next couple of years.
 
Imagen’s academic paper is also a gem for Artificial Intelligence enthusiasts.

OpenAI states: “We recognize that work involving generative models has the potential for significant, broad societal impacts.”

Google Imagen Samples: Images created based on text descriptions

Digital Image Basics

Pretty much all of these incredibly exciting advancements are based on digital image fundamentals. If it seems too confusing to you, you can take a look into how images are represented by numbers at each pixel level and how they can be represented and manipulated numerically through these tutorials:

Once you are able to see images as matrices of numerical values (usually a combination of RGB: red, green and blue, and sometimes RGBA with an alpha transparency value) covering the resolution of the image, your whole perspective on how AI and ML can be implemented through computer vision models changes.
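
Here is a tiny sketch of that idea in plain Python, with no image library involved. The 2x2 "image" and the helper names are made up for illustration, and the grayscale weights are the common ITU-R BT.601 luma coefficients, which is just one of several conventions.

```python
# A 2x2 RGB image as nested lists: image[row][column] = (red, green, blue),
# where each channel is an integer from 0 (none) to 255 (full intensity).
image = [
    [(255, 0, 0), (0, 255, 0)],      # red pixel, green pixel
    [(0, 0, 255), (255, 255, 255)],  # blue pixel, white pixel
]


def to_grayscale(rgb_image):
    """Collapse each pixel to a single brightness value (BT.601 weights)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def invert(rgb_image):
    """Photo negative: subtract every channel from 255."""
    return [
        [(255 - r, 255 - g, 255 - b) for (r, g, b) in row]
        for row in rgb_image
    ]


gray = to_grayscale(image)
negative = invert(image)

print(gray)            # [[76, 150], [29, 255]]
print(negative[0][0])  # (0, 255, 255): red becomes cyan
```

Every "edit" here is just arithmetic on numbers, which is exactly the kind of thing a model can learn to do.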

Societal Impacts

It’s starting to feel like we are there. AI’s real-world successes are being increasingly felt. Every new applied-AI milestone leaves your mouth open. It’s hard not to imagine the societal impacts: millions of people who derive a lot of self-worth and satisfaction from their work could suddenly become unemployed or, even worse, irrelevant.

At this rate of AI innovation, it’s not hard to imagine the next decade being truly disrupted by AI implementations. By 2032 we will probably have an army of trained AI algorithms that can walk, talk, drive and write better than humans. And the way AI works, we are not talking about slightly better: they will be crushingly better and the gap will widen exponentially.

Here comes AI, and the world is definitely not ready.

Not economically, emotionally, physically, socially or theologically.

References

[1] Forecasting Transformative AI: An Expert Survey: https://arxiv.org/ftp/arxiv/papers/1901/1901.08579.pdf
