# Do we consider inputs to a VAE to be samples from p(x) and outputs to be samples from p'(x|z) or are they probabilities of x?

by Einstein   Last Updated October 19, 2018 03:19 AM

In a Variational Autoencoder, we say that the encoder and decoder networks model $$p(z \mid x)$$ and $$p(x \mid z)$$, respectively, as seen in the image below:

My question is: is the output of the decoding layer a sample from the approximate $$p(x \mid z)$$ itself, or a set of probability values, one per pixel, for some decoded $$\mathbf{z}$$? If it is a set of probability values for each dimension of $$\mathbf{x}$$, how can we consider the input to also be a set of probability values for each pixel? Is it just because the pixels are normalized to $$(0,1)$$? Could someone convince me of that, if it is true?


The "decoder" outputs a sample from $$p_\theta(x \mid z)$$, e.g. the "reconstructed image" in your example. It does not output a probability at all. It is just a coincidence that the pixel values lie in $$[0,1]$$ in this example; they could equally lie in $$[0,255]$$ and nothing about the model would change, except that the final layer would need a different activation.
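To make the point about the final activation concrete, here is a minimal sketch, assuming a toy one-layer decoder written in NumPy with random, untrained weights. The names `W`, `b`, `decode_unit`, and `decode_byte` are hypothetical and only for illustration; a real VAE decoder would be a trained network. The sketch only shows that the output range comes from the last activation, not from the values being probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    """Logistic function, squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical toy decoder: one linear layer from a 2-D latent to 4 "pixels".
# The weights are random placeholders, not a trained model.
W = rng.normal(size=(4, 2))
b = rng.normal(size=4)

def decode_unit(z):
    """Final activation is a sigmoid, so pixel values fall in [0, 1]."""
    return sigmoid(W @ z + b)

def decode_byte(z):
    """Same layer with a rescaled final activation: values fall in [0, 255]."""
    return 255.0 * sigmoid(W @ z + b)

z = rng.normal(size=2)      # a latent code
x_unit = decode_unit(z)     # "image" with pixels in [0, 1]
x_byte = decode_byte(z)     # same "image" with pixels in [0, 255]

print(x_unit)
print(x_byte)
```

In this sketch the two outputs differ only by a constant factor of 255, and the values in `x_unit` are not required to sum to 1, which is one way to see that the $$[0,1]$$ range is a property of the chosen activation.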