Image generation AI is software that lets anyone create illustrations and images simply by giving instructions in text. This article summarizes how those illustrations and images are generated, then introduces recommended image generation AI services and the challenges the technology faces.
What is image generation AI? Background to its spread
Image generation AI is software that automatically generates images and illustrations from a text description of the desired image. “Midjourney”, released in June 2022, is said to have started the boom.
In August of the same year, “Stable Diffusion” was released, further accelerating the boom. Stable Diffusion is a free tool that can be used without registration. When you open the top page and click “Get Started for Free”, the following screen is displayed.
In the “Enter your prompt” field, enter words that describe the image you want to generate; in the “negative word” field, enter words that describe what you want to exclude. As an example, I entered “big cat” in “Enter your prompt” and “slim” in “negative word”.
Created with Stable Diffusion
As a result, instead of a photo of a typical domestic cat, the tool generated a photo of a feline resembling a leopard or a tiger.
From your second visit onward, you may see only the “Enter your prompt” field, without the “negative word” field.
Next, I checked whether instructions could be given in Japanese. When I entered the Japanese for “blue sky” in the “Enter your prompt” field, the following images were generated.
Created with Stable Diffusion
Some of the generated images do not show a blue sky, but wording the instructions more carefully may produce more accurate results.
How illustrations are generated and six methods used to generate images
Some software that generates illustrations and images is described as “image to image.” This kind of software takes a rough sketch you have drawn and turns it into a more finished illustration.
There is also “text to image,” which creates illustrations and images from keywords and text. This is what is generally meant by “image generation AI” in Japanese: it generates illustrations and other images simply from words and sentences that describe the desired picture.
Image generation relies on a machine learning technique called deep learning. How results are displayed differs from one piece of software to another, but the process generally follows the steps below.
1. Text input
2. Text to vector conversion using text encoder
3. Vector to image conversion using image generator
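The three steps above can be sketched as a toy pipeline. This is only an illustration of the data flow, not how any real service works: the function names are hypothetical, a hash stands in for a trained text encoder, and a simple lookup stands in for a trained image generator.

```python
import hashlib


def encode_text(prompt, dim=8):
    """Step 2: map text to a fixed-length vector (toy stand-in for a text encoder)."""
    digest = hashlib.sha256(prompt.encode("utf-8")).digest()
    # Scale the first `dim` bytes of the hash into the range [0, 1].
    return [byte / 255.0 for byte in digest[:dim]]


def generate_image(vector, size=4):
    """Step 3: map the vector to a size x size grid of grayscale pixels (0-255)."""
    return [
        [round(255 * vector[(row * size + col) % len(vector)]) for col in range(size)]
        for row in range(size)
    ]


def text_to_image(prompt):
    """Steps 1-3 chained: text in, pixel grid out."""
    return generate_image(encode_text(prompt))


image = text_to_image("big cat")
print(len(image), "x", len(image[0]), "pixels")
```

In a real system both functions are neural networks with learned parameters, but the interface is the same: the text becomes a vector, and the vector conditions the image.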
The text encoder and image generator mentioned above work differently depending on the software. Let's take a look at the six main methods used to achieve “text to image.”
1.VAE (variational autoencoder)
VAE (variational autoencoder) is a generative model that learns from training data and generates new data similar to it.
A feature of VAE is that it represents the latent variable as a probability distribution.
A typical autoencoder does not make the structure of its latent variables explicit before the input is converted into an image.
Because a VAE gives the latent variables an explicit structure in the form of a probability distribution, the images it creates tend to be more plausible.
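To make "latent variable as a probability distribution" concrete, here is a minimal sketch assuming the common choice of a diagonal Gaussian latent. The function names are illustrative; `mu` and `log_var` would normally be produced by a trained encoder network, which is omitted here.

```python
import math
import random


def sample_latent(mu, log_var, rng):
    """Draw z = mu + sigma * eps with eps ~ N(0, 1) (the reparameterization trick)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, log_var)]


def kl_to_standard_normal(mu, log_var):
    """KL divergence from N(mu, sigma^2) to N(0, 1), summed over latent dimensions."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


rng = random.Random(0)
z = sample_latent([0.5, -1.0], [0.0, -2.0], rng)
print("latent sample:", z)
print("KL penalty:", kl_to_standard_normal([0.5, -1.0], [0.0, -2.0]))
```

The KL term is what keeps the latent space close to a known distribution, which is why new images can be generated simply by sampling from that distribution and decoding.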
2.GAN (Generative Adversarial Network)
GAN (Generative Adversarial Network) is a mechanism that produces increasingly natural images by repeatedly comparing generated data with real data and judging which is which. This comparison can also be used to quantify how well an image reflects the characteristics described in the text.
Taking advantage of this, a GAN can automatically generate things that do not actually exist, producing images that combine the realism of the training data with the creativity and individuality of the text.
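The "compare and judge" loop can be expressed as two loss functions. The sketch below assumes the standard binary cross-entropy formulation with the non-saturating generator loss; the inputs are the discriminator's probabilities that a sample is real, and the networks themselves are omitted.

```python
import math


def discriminator_loss(d_real, d_fake):
    """Low when real samples score near 1 and generated samples score near 0."""
    real_term = -sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = -sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


def generator_loss(d_fake):
    """Low when the discriminator is fooled into scoring generated samples near 1."""
    return -sum(math.log(p) for p in d_fake) / len(d_fake)


# Hypothetical discriminator outputs for a small batch.
print("D loss:", discriminator_loss([0.9, 0.8], [0.2, 0.1]))
print("G loss:", generator_loss([0.2, 0.1]))
```

Training alternates between lowering the first loss and lowering the second, so each network improves against the other.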
3.Pix2Pix
Pix2Pix is a mechanism that learns the relationship between two images and generates an image that reflects the relationship.
A model that generates images (the generator) and a model that judges whether a generated image is authentic (the discriminator) are pitted against each other, and what they learn about the relationship between the two images is reflected in the output.
Pix2Pix does not convert text into a vector. It uses the image itself as the condition instead of a vector, which makes image-to-image conversion possible.
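As a rough sketch of how the image condition enters training, the generator in Pix2Pix-style models is commonly trained with an adversarial term plus an L1 term that keeps the output close to the paired target image. The function names and the flat pixel lists below are illustrative assumptions, and the weight of 100 is the value commonly cited for Pix2Pix; treat it as a tunable parameter.

```python
import math


def l1_distance(target, generated):
    """Mean absolute difference between target and generated pixel values."""
    return sum(abs(t - g) for t, g in zip(target, generated)) / len(target)


def pix2pix_generator_loss(d_fake_pair, target, generated, l1_weight=100.0):
    """Adversarial term on the (input, output) pair plus a weighted L1 term."""
    # d_fake_pair: discriminator's probability that the (input, output) pair is real.
    adversarial = -math.log(d_fake_pair)
    return adversarial + l1_weight * l1_distance(target, generated)


target = [0.0, 0.5, 1.0]
print(pix2pix_generator_loss(0.6, target, [0.1, 0.5, 0.9]))
```

Because the discriminator scores the input image and the output image together, the generator is rewarded for outputs that match the input, not merely for outputs that look realistic.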
4.TransGAN
TransGAN is a technique that creates new images by transforming them over multiple stages. It generates images with a simple architecture built on the Transformer.
Because of its simple structure, it readily produces plausible images. Its building blocks are layer normalization, multi-head self-attention, and fully connected layers.
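The building blocks named above can be sketched in plain Python. This is a deliberately simplified illustration, not TransGAN itself: it uses a single attention head with no learned projection weights, and it omits the fully connected sub-layer.

```python
import math


def layer_norm(vec, eps=1e-5):
    """Normalize one token vector to zero mean and unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Single-head scaled dot-product attention (learned Q/K/V weights omitted)."""
    dim = len(tokens[0])
    out = []
    for query in tokens:
        scores = [sum(a * b for a, b in zip(query, key)) / math.sqrt(dim) for key in tokens]
        weights = softmax(scores)
        out.append([sum(w * value[i] for w, value in zip(weights, tokens)) for i in range(dim)])
    return out


def transformer_block(tokens):
    """Residual connection around attention applied to layer-normalized tokens."""
    attended = self_attention([layer_norm(t) for t in tokens])
    return [[x + a for x, a in zip(tok, att)] for tok, att in zip(tokens, attended)]


patches = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]  # two image patches as token vectors
print(transformer_block(patches))
```

In an image model, each token corresponds to a patch of the image, so attention lets every patch take every other patch into account.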
5.DALL·E
DALL·E is an image generation model announced by OpenAI, the company that shot to popularity after releasing ChatGPT.
An image containing a large amount of information is compressed to 1/192 of its size using a discrete variational autoencoder and then restored by a decoder to an image comparable in quality to the original. The model learns the correspondence between the restored image and the text originally entered, then adjusts the image into an appropriate composition.
As in TransGAN, a Transformer is used for this learning process.
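The 1/192 figure can be checked with simple arithmetic. The sizes below are not stated in this article; they are the ones reported in OpenAI's description of the original DALL·E, where a 256×256 RGB image is encoded as a 32×32 grid of image tokens.

```python
def compression_factor(height, width, channels, grid_height, grid_width):
    """Ratio of raw pixel values to the number of tokens after encoding."""
    values_in = height * width * channels
    tokens_out = grid_height * grid_width
    return values_in / tokens_out


factor = compression_factor(256, 256, 3, 32, 32)
print(factor)  # 192.0
```

Shrinking the image to about a thousand tokens is what makes it practical for a Transformer to process the text and the image as one sequence.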
6.StyleGAN/StyleGAN2
StyleGAN is a mechanism that generates and transforms non-existent data based on the characteristics of training data.
GAN stands for Generative Adversarial Network and, as with TransGAN, refers to a mechanism for generating and transforming data based on training data.
Unlike a regular GAN, StyleGAN applies a style-specific adjustment after each convolution as the image is upsampled, and uses noise to generate fine details. This improves the fidelity of the image and produces highly realistic results.
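The adjustment after convolution and the use of noise can be sketched as follows, assuming the adaptive instance normalization (AdaIN) used in the original StyleGAN. The function names are illustrative, a flat list stands in for one channel of a feature map, and the convolution itself is omitted.

```python
import math
import random


def add_noise(feature_map, strength, rng):
    """Inject per-pixel Gaussian noise, which drives fine stochastic detail."""
    return [v + strength * rng.gauss(0.0, 1.0) for v in feature_map]


def adain(feature_map, style_scale, style_bias, eps=1e-8):
    """Normalize one channel, then rescale and shift it with style parameters."""
    n = len(feature_map)
    mean = sum(feature_map) / n
    var = sum((v - mean) ** 2 for v in feature_map) / n
    std = math.sqrt(var + eps)
    return [style_scale * (v - mean) / std + style_bias for v in feature_map]


rng = random.Random(0)
conv_output = [1.0, 2.0, 3.0, 4.0]  # hypothetical output of one convolution channel
styled = adain(add_noise(conv_output, 0.1, rng), style_scale=2.0, style_bias=5.0)
print(styled)
```

The style parameters control the overall look of each layer, while the noise only perturbs small details, which is why the two can be adjusted independently.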
In addition, StyleGAN2, an improved version of StyleGAN, removes small artifacts that arise during generation, reducing unnaturalness.
It is also good at creative work, such as drawing anime characters that do not exist.
7 recommended image generation AI services
The number of image generation AI services available online is increasing. Here we will introduce some of the services that are easy to use.
1.Stable Diffusion
Stable Diffusion is an image generation AI that has a variety of functions. It is also easy to use because images can be generated using only simple text.
Created with Stable Diffusion
However, the free version is a demo screen, so customization features are not available. It is simple to use and recommended for first-time users of image generation AI.
2.Generated Photos
Generated Photos is a service that automatically generates faces of people who do not actually exist. By specifying gender, age, eye color, skin color, and so on, you can create the faces of many different people with a variety of expressions.
You can also choose the background color, but note that choosing a transparent background incurs a download fee.
3.Niji Journey
Niji Journey is an image generation AI used mainly for creating illustrations.
It supports Chinese, Korean, and English as well as Japanese, and can generate characters in a variety of styles. You can upscale generated illustrations or create more variations with a single click.
4.cre8tiveAI
cre8tiveAI provides multiple AI-based image services, including services for enhancing image quality, services dedicated to face illustrations, and services dedicated to full-body illustrations.
For example, with the face illustration AI “Sai-chan”, you can create an original image simply by selecting an illustration style.
Created with cre8tiveAI “Sai-chan”
5.NovelAI
NovelAI is a service that lets you generate high-quality anime-style illustrations, even on a smartphone or a low-spec computer.
It offers a text creation service and an image generation service; to create an image, select “Image Generation”.
By combining text and images, you can easily create a manga.
6.Artbreeder
Artbreeder is an image generation AI service for creating images of things that do not exist.
It offers “Collage” and “Portrait” modes: Collage creates a new image from text you enter, and Portrait creates a new image from a real photo you upload.
Created with Artbreeder
I specified “a green cat” in the text, but the result was a cat with green eyes. A more detailed description, such as “a cat with green body fur,” should produce a result closer to what you have in mind.
7.Visual ChatGPT
Visual ChatGPT is a service that uses ChatGPT to generate images interactively. It is an image generation AI that can be used for free without any restrictions and is easy to try.
Created with Visual ChatGPT
The more detailed the description, the closer the generated image will be to what you have in mind.
Created with Visual ChatGPT
For text, you can use ChatGPT. When I asked ChatGPT itself whether it could generate pictures, however, it declined.
From ChatGPT
Two challenges in image generation AI
Image generation AI makes it easy to generate images and illustrations from simple text. However, using it can lead to the following problems.
1. Copyright issues
2. Fake image abuse problem
It is important to understand the possible problems in advance. The sections below describe what can go wrong in each case, so please keep them in mind when using these tools.
1.Copyright issues
The question of who owns an image created by image generation AI comes up often. You enter the text yourself, but that text is not necessarily original.
For example, many people enter simple text such as “cute cat” or “blue sky,” so similar or even identical illustrations and images may be shown to other users.
In such cases, it is difficult to determine who holds the copyright unless there is evidence of who created the image first. There is also the view that the copyright belongs in the first place to the party that provides the image-generating AI.
Rules vary from tool to tool, so it is important to check the terms carefully. When publishing images in print or on websites, it is also a good habit to always state clearly which tool created them.
2. Fake image abuse problem
Depending on the instructions you enter, the AI may generate ethically objectionable illustrations or fake images that misrepresent the facts. Because image generation AI is a new kind of tool, laws and regulations have not yet caught up with it.
Those who use such images are expected to maintain high ethical standards.
Web survey on design AI tools: High usage rate among creators revealed
According to a web survey on AI tools conducted by Tajima Design LLC (a screening survey, n=5,000, and a main survey, n=298), the most common occupation among people who use AI tools several times a month is public relations/PR, and the most common occupation among those who use them every day is designer.
According to the survey, data scientists are the occupation that uses AI tools the most, followed by designers, public relations/PR, and marketing.
In the ranking of recognition and usage rates for general-purpose AI/design tools, first place unsurprisingly goes to “ChatGPT”, followed by “Adobe Photoshop”. In the ranking of recognition and usage rates for image/graphics AI tools, first place goes to “Midjourney,” which was introduced in this article, and second place to “Stable Diffusion/Stability AI.”
The survey was conducted from Thursday, June 13 to Friday, June 14, 2024, and found that 85% of people do not use AI tools on a regular basis. On the other hand, the younger the age group, the more regularly people use AI tools, and in occupations such as public relations, data scientist, and designer, more than half use AI tools regularly (daily, several times a week, or several times a month).
Based on the above survey results, it can be said that the use of generative AI in creative jobs is becoming commonplace.
For reference, summaries of the screening survey and the main survey are provided below.
Screening survey (n=5,000)
- 85% do not regularly use AI tools
- The profession with the highest percentage of using AI tools every day is “Designer”
- More than 20% of “designers” use AI every day
- 7% of people pay for AI tools
- People in their 20s had the highest rate of paying for AI tools, and the highest percentage answering “20,000 yen or more.”
Main survey (n=298)
- Among general-purpose AI tools, “ChatGPT” has the highest recognition rate (56%)
- “ChatGPT” has the highest usage rate among general-purpose AI tools (48.7%)
- “Midjourney” has the highest recognition rate among image/graphics tools (30.2%)
- “Midjourney” has the highest usage rate among image/graphics tools (24.2%)
- “Gamma” has the highest recognition rate in the data category (36.6%)
- “Gamma” has the highest usage rate in the data category (31.9%)
- “Create.xyz” has the highest recognition rate in the web category (25.2%)
- “tl;draw” has the highest usage rate in the web category (22.8%)
- “DomoAI” has the highest recognition rate for video (27.2%)
- “DomoAI” has the highest usage rate for video (23.8%)
- 40.3% intend to increase the amount they spend on AI tools, and 25.8% intend to decrease it
- Approximately half of those currently using free services intend to pay
Understand the characteristics and issues and utilize image generation AI
By using image generation AI, even people who are not illustrators or photographers can easily create original illustrations and images. However, who holds the copyright in generated images is still debated, so take care when using them.
In addition, depicting specific individuals or brands, or using fake images that misrepresent the facts, is problematic. Please use image generation AI with high ethical standards.