by Einstein
Last Updated October 19, 2018 03:19 AM

In a variational autoencoder, we say that the encoder and decoder networks model $p(z \mid x)$ and $p(x \mid z)$ respectively, as seen in the image below:

My question is: is the output of the decoding layer a sample from the approximate $p(x \mid z)$ itself, or a set of probability values, one per pixel, for some decoded $\mathbf{z}$? If it is a set of probability values for each dimension of $\mathbf{x}$, how can we then consider the input to also be a set of probability values for each pixel? Just because the pixels are normalized to $(0,1)$? Could someone convince me of that, if it is true?

The "decoder" outputs a sample from $p_\theta(x \mid z)$, i.e. the "reconstructed image" in your example. It doesn't output a probability at all. It's just a coincidence that it happens to output pixel values in $[0,1]$ in this example; they could equally well lie in $[0,255]$ and nothing about the model would change (except that you'd have to use a different activation on the final layer).
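The point about the final activation can be sketched numerically. Below is a minimal, hypothetical single-layer "decoder" in numpy (the weights, latent dimension, and pixel count are made up for illustration): a sigmoid on the last layer confines the output to $[0,1]$, while simply rescaling that same activation yields values in $[0,255]$, with no other change to the model.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

# Hypothetical one-layer "decoder": latent z (dim 2) -> 4 pixel values.
W = rng.normal(size=(4, 2))
b = np.zeros(4)
z = rng.normal(size=2)          # a latent code
logits = W @ z + b              # raw decoder pre-activations

# Final activation determines the output range, not the model itself:
pixels_01 = sigmoid(logits)            # pixel values in [0, 1]
pixels_255 = 255.0 * sigmoid(logits)   # same model, values in [0, 255]
```

Whether those values are read as grayscale intensities in $[0,1]$ or in $[0,255]$ is purely a choice of output scale; the network and its parameters are identical in both cases.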
