By Kyle Cranmer, Gilles Louppe

Recent work in density estimation uses a bijection $f : X \to Z$ (e.g. an invertible flow or autoregressive model) and a tractable density $p(z)$ (e.g. [1] [2] [3] [4]).
$$p(x) = p(f_\phi(x)) \left| \det\left( \frac{\partial f_\phi(x)}{\partial x^T} \right) \right| \;,$$
where $\phi$ are the internal network parameters of the bijection $f_\phi$. Learning proceeds via gradient ascent on $\sum_i \log p(x_i)$ with data $x_i$, i.e. maximum likelihood w.r.t. the internal parameters $\phi$. Since $f$ is invertible, this model can also be used as a generative model for $X$.
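The change of variables formula above can be sketched with a toy bijection. Here a one-dimensional affine map stands in for a learned flow (the names `f`, `mu`, `sigma` are illustrative, not from the papers cited); the base density $p(z)$ is a standard normal, and the log-determinant term accounts for how $f$ stretches volume.

```python
import numpy as np

# Hypothetical 1-D affine bijection f(x) = (x - mu) / sigma mapping
# data space X to latent space Z; mu, sigma play the role of the
# internal parameters phi of a learned flow.
mu, sigma = 2.0, 0.5

def f(x):
    return (x - mu) / sigma

def log_p_z(z):
    # log density of the standard-normal base distribution p(z)
    return -0.5 * z**2 - 0.5 * np.log(2 * np.pi)

def log_p_x(x):
    # change of variables: log p(x) = log p(f(x)) + log |df/dx|
    log_det_jac = -np.log(sigma)   # df/dx = 1/sigma for the affine map
    return log_p_z(f(x)) + log_det_jac

# For this choice of f, the model density is exactly N(mu, sigma^2),
# so the result can be checked against the closed-form Gaussian log pdf.
x = 2.3
closed_form = -0.5 * ((x - mu) / sigma)**2 - 0.5 * np.log(2 * np.pi * sigma**2)
```

In a real flow, `f` would be a deep invertible network and the log-determinant would be computed from its layer-wise Jacobians; maximum likelihood then ascends the gradient of `log_p_x` summed over the data.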

This can be generalized to a conditional density $p(x|\theta)$ by using a family of bijections $f_{\theta} : X \to Z$ parametrized by $\theta$ (e.g. [5] [6]):
$$p(x|\theta) = p(f_{\phi;\theta}(x)) \left| \det\left( \frac{\partial f_{\phi;\theta}(x)}{\partial x^T} \right) \right|$$
Here $\theta$ and $x$ are inputs to the network (and its inverse), and $\phi$ are internal network parameters. Again, learning proceeds via gradient ascent on $\sum_i \log p(x_i|\theta_i)$ with paired data $x_i, \theta_i$.
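Extending the toy example, the conditioning variable $\theta$ can simply enter the bijection as an extra input. In this sketch (a stand-in for a conditional flow, not any specific published architecture) $\theta$ shifts the affine map while `sigma` again plays the role of the internal parameters $\phi$:

```python
import numpy as np

# Hypothetical conditional affine bijection f_{phi;theta}(x) = (x - theta) / sigma:
# theta is an input alongside x, sigma stands in for the internal parameters phi.
sigma = 0.5

def log_p_z(z):
    # standard-normal base density
    return -0.5 * z**2 - 0.5 * np.log(2 * np.pi)

def log_p_x_given_theta(x, theta):
    # conditional change of variables: z depends on both x and theta,
    # but the Jacobian is still taken w.r.t. x only
    z = (x - theta) / sigma
    return log_p_z(z) - np.log(sigma)

# For this toy map, p(x|theta) = N(theta, sigma^2). A maximum-likelihood
# training step on pairs (x_i, theta_i) would ascend the gradient of
# sum_i log_p_x_given_theta(x_i, theta_i) w.r.t. sigma (i.e. phi).
```

Note that $\theta$ is never learned here: it is data fed to the network, which is what makes the trained model reusable across many values of $\theta$.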

We observe that not only can this model be used as a conditional generative model $p(x|\theta)$, but it can also be used to perform asymptotically exact, amortized likelihood-free inference on $\theta$.

This is particularly interesting when $\theta$ is identified with the parameters of an intractable, non-differentiable computer simulation or the conditions of some real world data collection process.
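The inference-on-$\theta$ direction can be illustrated with the same toy conditional model: once the network is trained, $\log p(x|\theta)$ is cheap to evaluate for any $\theta$, so a likelihood scan over observed data replaces running the intractable simulator. The grid scan below is a minimal sketch (all names and the grid are illustrative), assuming a unit-variance shifted Gaussian as the "trained flow":

```python
import numpy as np

# Sketch: the toy conditional density p(x|theta) = N(theta, 1) stands in
# for a trained conditional flow over an intractable simulator.
def log_p_x_given_theta(x, theta, sigma=1.0):
    z = (x - theta) / sigma
    return -0.5 * z**2 - 0.5 * np.log(2 * np.pi) - np.log(sigma)

x_obs = np.array([1.8, 2.1, 2.4])       # observed data (toy values)
thetas = np.linspace(0.0, 4.0, 401)     # grid over the parameter space

# Amortized likelihood: one density evaluation per candidate theta,
# summed over the observed data points.
log_lik = np.array([log_p_x_given_theta(x_obs, t).sum() for t in thetas])
theta_mle = thetas[np.argmax(log_lik)]  # maximum-likelihood estimate
```

For this Gaussian toy model the maximum sits at the sample mean of `x_obs`; with a real flow the same scan (or a gradient-based search over $\theta$) yields an approximate likelihood that sharpens as the density estimator improves, which is the sense in which the inference is asymptotically exact.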


Many thanks to Durk Kingma, Max Welling, Ian Goodfellow, and Shakir Mohamed for enlightening discussions at NIPS 2016.

Kyle Cranmer · 9 Dec, 2016

I wish we would have written $p_X(x|\theta)$ and $p_Z(f(x))$ for clarity. Can't change it now.





Published: 8 Dec, 2016

License: CC BY