Variational Autoencoders and Maximum Likelihood Estimation

In my previous blog, we explored maximum likelihood estimation (MLE) and how it can be used to derive commonly used loss functions. It also turns out that MLE is widely used in generative models like Variational Autoencoders (VAE) and Diffusion models (DDPM). In this blog, we will explore how the loss function of a Variational Autoencoder is derived. VAEs are latent variable generative models. They can solve a few tasks: act as a generative model that mimics the data distribution they were trained on, and perform approximate posterior inference of the latent variable $z$ given an observed variable $x$. In other words, they can be used to learn lower-dimensional representations of the data they were trained on. Preliminary Information: In this section, let us explore the tools necessary to derive a tractable form of the log-likelihood that we need to optimize. ...
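As a rough preview of where that derivation lands, the tractable objective is the standard evidence lower bound (ELBO). A minimal sketch, assuming the usual VAE notation of an encoder $q_\phi(z \mid x)$, a decoder $p_\theta(x \mid z)$, and a prior $p(z)$ (notation assumed here, not taken from the excerpt):

$$
\log p_\theta(x)
= \underbrace{\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right] - D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p(z)\right)}_{\text{ELBO}}
+ D_{\mathrm{KL}}\!\left(q_\phi(z \mid x) \,\|\, p_\theta(z \mid x)\right)
$$

Since the last KL term is non-negative, the ELBO lower-bounds $\log p_\theta(x)$, and maximizing it trains both the generative model and the approximate posterior at once.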

March 6, 2025 · 8 min · 1669 words · Rishab Sharma

Maximum Likelihood Estimation and Loss Functions

When I started learning about loss functions, I could always understand the intuition behind them. For example, the mean squared error (MSE) for regression seemed logical—penalizing large deviations from the ground truth makes sense. But one thing always bothered me: I could never come up with those loss functions on my own. Where did they come from? Why do we use these specific formulas and not something else? This frustration led me to dig deeper into the mathematical and probabilistic foundations of loss functions. It turns out, the answers lie in a concept called Maximum Likelihood Estimation (MLE). In this blog, I’ll take you through this journey, showing how these loss functions are not arbitrary but derive naturally from statistical principles. I’ll start by defining what MLE is, followed by its intricate connection to Kullback-Leibler (KL) divergence. To conclude, I’ll show how loss functions like Mean Squared Error and Binary Cross Entropy can be derived from MLE. ...
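As a quick preview of the regression case: if the targets are assumed to be corrupted by Gaussian noise of fixed variance, maximizing the likelihood is the same as minimizing MSE. A minimal sketch, with the model $f_\theta$ and noise variance $\sigma^2$ introduced here for illustration (not from the excerpt):

$$
y_i = f_\theta(x_i) + \epsilon_i, \quad \epsilon_i \sim \mathcal{N}(0, \sigma^2)
\;\;\Rightarrow\;\;
-\log \prod_{i=1}^{n} p(y_i \mid x_i; \theta)
= \frac{1}{2\sigma^2} \sum_{i=1}^{n} \left(y_i - f_\theta(x_i)\right)^2 + \frac{n}{2}\log\!\left(2\pi\sigma^2\right)
$$

For fixed $\sigma^2$, the second term is constant in $\theta$, so maximizing the likelihood reduces to minimizing the sum of squared errors.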

December 15, 2024 · 11 min · 2244 words · Rishab Sharma