Do we consider inputs to a VAE to be samples from p(x) and outputs to be samples from p'(x|z) or are they probabilities of x?

by Einstein   Last Updated October 19, 2018 03:19 AM

In a Variational Autoencoder we say that the encoder and decoder networks model $p(z \mid x)$ and $p(x \mid z)$ respectively, as seen in the image below:

[Image: VAE]

My question is: is the output of the decoding layer a sample from the approximate $p(x \mid z)$ itself, or a set of probability values, one per pixel, for some decoded $\mathbf{z}$? If it is a set of probability values for each dimension of $\mathbf{x}$, why do we also consider the input to be a set of probability values for each pixel? Is it just because the pixel values are normalized to $(0,1)$? Could someone convince me of that, if it is true?

Answers 1

The "decoder" outputs a sample from $p_\theta(x \mid z)$, e.g. the "reconstructed image" in your example. It doesn't output a probability at all. It's just a coincidence that it happens to output pixel values in $[0,1]$ in this example; they could equally be in $[0,255]$ and nothing about the model would change (except that you'd have to use a different activation on the final layer).
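To make the point about the final-layer activation concrete, here is a minimal sketch. It assumes a toy one-layer decoder with random weights and treats the decoder output as pixel values, as described above. The names `decode`, `W`, `b`, and `pixel_max` are hypothetical and chosen only for illustration; a real VAE decoder would be a trained multi-layer network.

```python
import numpy as np

def sigmoid(a):
    # Squashes any real number into the open interval (0, 1).
    return 1.0 / (1.0 + np.exp(-a))

def decode(z, W, b, pixel_max=1.0):
    # Hypothetical one-layer decoder: maps a latent vector z to
    # pixel values in (0, pixel_max). Only the output scaling
    # depends on the chosen pixel range.
    return pixel_max * sigmoid(W @ z + b)

rng = np.random.default_rng(0)
latent_dim, n_pixels = 2, 4                  # toy sizes, assumed for the example
W = rng.normal(size=(n_pixels, latent_dim))  # random (untrained) weights
b = rng.normal(size=n_pixels)
z = rng.normal(size=latent_dim)              # a latent code, e.g. drawn from the prior

x_unit = decode(z, W, b, pixel_max=1.0)      # pixel values in (0, 1)
x_byte = decode(z, W, b, pixel_max=255.0)    # same image, values in (0, 255)

print(x_unit)
print(x_byte)
```

Both calls describe the same reconstructed image; the second is simply the first rescaled by 255, which is why the $[0,1]$ range by itself says nothing about the outputs being probabilities.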

October 19, 2018 02:44 AM
