Variational Autoencoders and Maximum Likelihood Estimation

In my previous blog, we explored maximum likelihood estimation (MLE) and how it can be used to derive commonly used loss functions. It also turns out that MLE is widely used in generative models like Variational Autoencoders (VAE) and Diffusion models (DDPM). In this blog, we will explore how the loss function of a Variational Autoencoder is derived. VAEs are latent variable generative models. They can solve a few tasks: act as a generative model that mimics the data distribution they were trained on, and perform approximate posterior inference of the latent variable $z$ given an observed variable $x$. In other words, they can be used to learn lower-dimensional representations of the data they were trained on. Preliminary Information: In this section, let us explore the tools necessary to derive a tractable form of the log-likelihood that we need to optimize. ...
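As a rough preview of where that derivation lands, the tractable objective is the standard evidence lower bound (ELBO). A minimal sketch, assuming the usual VAE notation of an encoder $q_\phi(z \mid x)$, a decoder $p_\theta(x \mid z)$, and a prior $p(z)$ (notation assumed here, not taken from the excerpt):

$$
\log p_\theta(x)
= \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] - D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right)}_{\text{ELBO}}
+ D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\right)
$$

Since the last KL term is non-negative, the ELBO lower-bounds $\log p_\theta(x)$, and maximizing it trains both the generative model and the approximate posterior at once.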

March 6, 2025 · 8 min · 1669 words · Rishab Sharma

Maximum Likelihood Estimation and Loss Functions

When I started learning about loss functions, I could always understand the intuition behind them. For example, the mean squared error (MSE) for regression seemed logical—penalizing large deviations from the ground truth makes sense. But one thing always bothered me: I could never come up with those loss functions on my own. Where did they come from? Why do we use these specific formulas and not something else? This frustration led me to dig deeper into the mathematical and probabilistic foundations of loss functions. It turns out, the answers lie in a concept called Maximum Likelihood Estimation (MLE). In this blog, I’ll take you through this journey, showing how these loss functions are not arbitrary but derive naturally from statistical principles. I’ll start by defining what MLE is, followed by its intricate connection to Kullback-Leibler (KL) divergence. To conclude, I’ll show how loss functions like Mean Squared Error and Binary Cross Entropy can be derived from MLE. ...
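As a quick preview of the regression case: if the targets are assumed to be corrupted by Gaussian noise of fixed variance, maximizing the likelihood is the same as minimizing MSE. A minimal sketch, with the model $f_\theta$ and noise variance $\sigma^2$ introduced here for illustration (not from the excerpt):

$$
y_i = f_\theta(x_i) + \epsilon_i, \quad \epsilon_i \sim \mathcal{N}(0, \sigma^2)
\;\;\Rightarrow\;\;
-\log \prod_{i=1}^{n} p(y_i \mid x_i; \theta)
= \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left(y_i - f_\theta(x_i)\right)^2 + \frac{n}{2}\log\!\left(2\pi\sigma^2\right)
$$

For fixed $\sigma^2$, the second term is constant in $\theta$, so maximizing the likelihood reduces to minimizing the sum of squared errors.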

December 15, 2024 · 11 min · 2244 words · Rishab Sharma