
Stable Diffusion VAE: What Is VAE & How It Works?

A lot goes into generating beautiful and stunning images in Stable Diffusion. You may already be familiar with checkpoint models that are crucial for generating images in Stable Diffusion. 

But there’s also one piece of the puzzle many users are not aware of. It’s called VAE and it can change how your output images look in Stable Diffusion. 

You might have already used VAE in Stable Diffusion when generating images. But have you ever wondered what Stable Diffusion VAE actually is and how it works? 

Well, that’s what I’m going to answer in this guide along with many other questions about Stable Diffusion VAE. 

So, let’s get started. 

What is VAE in Stable Diffusion

VAE stands for Variational Autoencoder, which is part of the neural network model in Stable Diffusion. It is responsible for encoding images from pixel space into latent space and decoding them back from latent space to pixel space. 

In Stable Diffusion, images are generated in latent space and then decoded into the final, full-resolution image by the VAE. With a good VAE, your images will come out looking sharper and crisper. 
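If you’re curious what that encode/decode step looks like in code, here’s a minimal sketch using the Hugging Face diffusers library. This is just an illustration, not part of the Automatic1111 workflow covered in this guide, and it assumes the ft-MSE VAE hosted on Hugging Face under the repo ID shown below:

```python
import torch
from diffusers import AutoencoderKL

# Load Stability AI's ft-MSE VAE from Hugging Face (repo ID assumed).
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

# A dummy 512x512 RGB "image" in the [-1, 1] range the VAE expects.
pixels = torch.randn(1, 3, 512, 512)

with torch.no_grad():
    # Encode: pixel space -> latent space, downscaled 8x to (1, 4, 64, 64).
    latents = vae.encode(pixels).latent_dist.sample()
    # Decode: latent space -> pixel space, back to (1, 3, 512, 512).
    decoded = vae.decode(latents).sample

print(latents.shape, decoded.shape)
```

The key takeaway is the 8x downscaling: Stable Diffusion does its denoising work in that small latent space, and the VAE is what turns the result back into full-resolution pixels.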

Stability AI, the company behind Stable Diffusion, has released two VAE models you can use in Stable Diffusion. 

These are the EMA (Exponential Moving Average) and MSE (Mean Squared Error) VAEs. The EMA VAE produces sharper, more realistic images, whereas the MSE VAE produces smoother, less noisy images. 

If all this sounds too technical, all you need to know is that you can use either of these VAE models with Stable Diffusion 1.5 and 2.0 models. 
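For readers who use the diffusers library in Python rather than a web UI, pairing one of these VAEs with a 1.5-class model looks roughly like this. The repo IDs below are the Hugging Face copies of the ft-EMA VAE and the v1.5 checkpoint; treat them as examples, and swap in ft-MSE if you prefer the smoother look:

```python
from diffusers import StableDiffusionPipeline, AutoencoderKL

# Load the ft-EMA VAE (use "stabilityai/sd-vae-ft-mse" for the MSE version).
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-ema")

# Load an SD 1.5 checkpoint and override its default VAE with the one above.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", vae=vae
)

image = pipe("a portrait photo of a woman, sharp focus").images[0]
image.save("portrait_ft_ema.png")
```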

Do I need to use VAE?

While you can generate images in Stable Diffusion without any VAE, it’s highly recommended to use VAE as it produces better output images. 

Without any VAE, your images will often come out looking desaturated and washed out. Here’s an example of three images generated with and without VAE.  

[Image: ft-EMA (left), ft-MSE (middle), original without VAE (right)]

As you can see from the results above, the image without VAE looks a bit desaturated and less detailed, especially around the eyes, whereas the images with VAE look a lot better in comparison. 

Apart from producing sharper and more realistic images in Stable Diffusion, there is another reason why it’s important to use VAE. 

Many checkpoint models rely on a specific VAE to decode the image from latent space to pixel space. Without it, the model can't reconstruct the image properly, which leads to images looking blurry, noisy, and full of artifacts.

All in all, it’s a good practice to use VAE in Stable Diffusion as your output images will come out looking much better. 

However, you won’t always need a separate VAE, as many checkpoint models have a baked VAE, which means the VAE is included within the checkpoint model itself. 


You’ll find many checkpoint models on Civitai labeled “Baked VAE”, which don’t need a separate VAE. Models that do require one usually mention the recommended VAE in their description. 
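If you happen to use the diffusers library instead of a web UI, the same rule applies there: a single checkpoint file already carries its baked VAE, and you only swap it out when the model page recommends a separate one. Here’s a rough sketch (the file name below is a placeholder for whatever checkpoint you downloaded):

```python
from diffusers import StableDiffusionPipeline, AutoencoderKL

# A single-file checkpoint loads with whatever VAE is baked into it.
pipe = StableDiffusionPipeline.from_single_file("my_checkpoint.safetensors")  # placeholder file name

# Only if the model page recommends a separate VAE, swap it in afterwards.
pipe.vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")
```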

How to use VAE in Stable Diffusion

Using VAE in Stable Diffusion is a very simple process. First, you need to download the VAE model. Stability AI has two VAE models that you can use. 

Here are the download links for both models: 

You can download either one of them. 

Once downloaded, place the model’s .safetensors file in the following directory: 

stable-diffusion-webui/models/VAE


This is the location where you place any VAE model you download. 

To load the VAE model in Stable Diffusion, go to the Settings tab in Automatic1111 and open the VAE section. From there, choose your VAE model from the SD VAE dropdown. 


Click on the Apply settings button and your VAE model will be loaded. Now, whenever you generate images in Stable Diffusion, your selected VAE model will be used. You don’t have to apply your VAE every single time. 

In case you’re using ComfyUI, you can load a VAE model with the Load VAE node and connect it to the VAE Decode node. You can check out my ComfyUI guide to learn more about it. 

Once your VAE is loaded in Automatic1111 or ComfyUI, you can now start generating images using VAE. 

FAQs

Here are some frequently asked questions about Stable Diffusion VAE: 

Do I need a separate VAE for SDXL? 

Yes, you’ll have to use the SDXL VAE model recommended by Stability AI. This VAE is trained specifically for SDXL models and can be downloaded from here.
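For diffusers users, loading the SDXL VAE alongside the SDXL base model looks roughly like this (the repo IDs are the Hugging Face copies; device and dtype handling are omitted for brevity):

```python
from diffusers import StableDiffusionXLPipeline, AutoencoderKL

# The SDXL-specific VAE released by Stability AI.
vae = AutoencoderKL.from_pretrained("stabilityai/sdxl-vae")

# Load the SDXL base model with that VAE.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", vae=vae
)

image = pipe("a cinematic photo of a mountain lake at sunrise").images[0]
image.save("sdxl_example.png")
```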

Do all checkpoint models require VAE?

Not all checkpoint models require VAE as some have baked VAE in them. Whenever you download checkpoint models from Civitai, make sure to check the description to see if the model requires a VAE or not. 

Why do my images come out looking bad when I use VAE?

If your images come out looking bad and noisy when using VAE, it’s most likely that you’re using the wrong VAE with the checkpoint model. Check the checkpoint model’s description to see what VAE they recommend. 

Conclusion

VAE is often a mysterious concept and many Stable Diffusion users lack an understanding of what it is and how it works. 

With this guide, I hope you’ve learned more about what VAE is and how you should use it. Using VAE in Stable Diffusion can completely change how your output images look. 

That’s why it’s not only important to use VAE but also to use the right VAE with your checkpoint model. 

If you have any questions about VAE, feel free to drop them in the comments section below. 
