By Kyle Cranmer, Gilles Louppe

Recent work in density estimation uses a bijection $f : X \to Z$ (e.g. an invertible flow or autoregressive model) and a tractable density $p(z)$ to model the density of $x$ through the change of variables formula:
\begin{equation}
p(x) = p(f_\phi(x)) \left| \det \left( \frac{\partial f_\phi(x)}{\partial x^T} \right) \right| \;,
\end{equation}
where $\phi$ are the internal network parameters of the bijection $f_\phi$. Learning proceeds via gradient ascent on $\nabla_\phi \sum_i \log p(x_i)$ with data $x_i$ (i.e. maximum likelihood with respect to the internal parameters $\phi$). Since $f$ is invertible, this model can also be used as a generative model for $X$: draw $z \sim p(z)$ and return $x = f_\phi^{-1}(z)$.
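
The change of variables formula and the maximum likelihood update can be illustrated with a deliberately tiny example. The sketch below is not a neural flow: it assumes a one-dimensional affine bijection $f_\phi(x) = (x - \mu)/\sigma$ with $\phi = (\mu, \log\sigma)$, a standard normal $p(z)$, and hand-derived gradients. The names `forward`, `inverse`, `log_prob` and `fit`, and the toy data, are hypothetical and chosen only for illustration.

```python
import math
import random

def log_normal(z):
    """Log density of the assumed base distribution p(z), a standard normal."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def forward(x, mu, log_sigma):
    """Bijection f_phi: x -> z with phi = (mu, log_sigma)."""
    return (x - mu) * math.exp(-log_sigma)

def inverse(z, mu, log_sigma):
    """Inverse bijection z -> x, used to generate samples."""
    return mu + math.exp(log_sigma) * z

def log_prob(x, mu, log_sigma):
    """log p(x) = log p(f(x)) + log|df/dx|, where df/dx = exp(-log_sigma)."""
    return log_normal(forward(x, mu, log_sigma)) - log_sigma

def fit(data, steps=2000, lr=0.05):
    """Gradient ascent on the mean log-likelihood with analytic gradients."""
    mu, log_sigma = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        zs = [forward(x, mu, log_sigma) for x in data]
        # d log p / d mu = z / sigma ;  d log p / d log_sigma = z^2 - 1
        grad_mu = sum(zs) / n * math.exp(-log_sigma)
        grad_log_sigma = sum(z * z - 1.0 for z in zs) / n
        mu += lr * grad_mu
        log_sigma += lr * grad_log_sigma
    return mu, log_sigma

rng = random.Random(0)
data = [rng.gauss(3.0, 2.0) for _ in range(500)]  # toy data, not from the source
mu_hat, log_sigma_hat = fit(data)

# Generative use: push base samples through the inverse bijection.
samples = [inverse(rng.gauss(0.0, 1.0), mu_hat, log_sigma_hat) for _ in range(5)]
```

For this affine family the maximum likelihood solution is the sample mean and standard deviation of the data, which gives a simple check on the fitted `mu_hat` and `log_sigma_hat`.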

This can be generalized to the conditional density $p(x|\theta)$ by using a family of bijections $f_{\theta} : X \to Z$ parametrized by $\theta$:
\begin{equation}
p(x|\theta) = p(f_{\phi; \theta}(x)) \left| \det \left( \frac{\partial f_{\phi; \theta}(x)}{\partial x^T} \right) \right| \;.
\end{equation}
Here $\theta$ and $x$ are inputs to the network (and its inverse) and $\phi$ are internal network parameters. Again, learning proceeds via gradient ascent on $\nabla_\phi \sum_i \log p(x_i|\theta_i)$ with data pairs $(x_i, \theta_i)$.
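
As a rough sketch of the conditional case, the example below assumes the simplest possible family of bijections, $f_{\phi;\theta}(x) = (x - (a\theta + b))\,e^{-s}$ with $\phi = (a, b, s)$ and a standard normal $p(z)$. The "simulator" that produces the training pairs is a hypothetical stand-in (a linear map plus Gaussian noise), and `log_prob`, `sample` and `fit` are illustrative names, not part of any library.

```python
import math
import random

def log_prob(x, theta, a, b, s):
    """log p(x|theta) for the affine conditional flow; log|df/dx| = -s."""
    z = (x - (a * theta + b)) * math.exp(-s)
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi) - s

def sample(theta, a, b, s, rng):
    """Conditional generation: invert the flow on a base sample z ~ p(z)."""
    return a * theta + b + math.exp(s) * rng.gauss(0.0, 1.0)

def fit(pairs, steps=2000, lr=0.02):
    """Gradient ascent on the mean of log p(x_i|theta_i) over phi = (a, b, s)."""
    a, b, s = 0.0, 0.0, 0.0
    n = len(pairs)
    for _ in range(steps):
        grad_a = grad_b = grad_s = 0.0
        inv_sigma = math.exp(-s)
        for x, theta in pairs:
            z = (x - (a * theta + b)) * inv_sigma
            grad_a += z * inv_sigma * theta  # d log p / d a
            grad_b += z * inv_sigma          # d log p / d b
            grad_s += z * z - 1.0            # d log p / d s
        a += lr * grad_a / n
        b += lr * grad_b / n
        s += lr * grad_s / n
    return a, b, s

# Hypothetical simulator: x = 1.5 * theta + 0.5 + 0.5 * noise.
rng = random.Random(1)
thetas = [rng.uniform(-2.0, 2.0) for _ in range(500)]
pairs = [(1.5 * t + 0.5 + 0.5 * rng.gauss(0.0, 1.0), t) for t in thetas]

a_hat, b_hat, s_hat = fit(pairs)
x_new = sample(1.0, a_hat, b_hat, s_hat, rng)  # draw from the learned p(x|theta=1)
```

In a realistic setting the affine map would be replaced by a network that takes $\theta$ as an additional input, and the gradients would come from automatic differentiation rather than the hand-written expressions used here.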

We observe that not only can this model be used as a conditional generative model $p(x|\theta)$, but it can also be used to perform asymptotically exact, amortized likelihood-free inference on $\theta$.
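
One way to read this observation: once the conditional model is trained, $p(x|\theta)$ can be evaluated for fixed observations as a function of $\theta$, and the same model serves any new observation without retraining. The sketch below assumes an already trained affine conditional flow with hypothetical parameters `A`, `B`, `S`, draws the observations from that same model so it is exactly right, and estimates $\theta$ by scanning the log-likelihood over a grid. It is an illustration under these assumptions, not the authors' procedure.

```python
import math
import random

# Hypothetical "already trained" parameters phi = (A, B, S) of a conditional
# affine flow f_{phi;theta}(x) = (x - (A * theta + B)) * exp(-S).
A, B, S = 1.5, 0.5, math.log(0.5)

def log_prob(x, theta):
    """log p(x|theta) from the change of variables formula."""
    z = (x - (A * theta + B)) * math.exp(-S)
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi) - S

def log_likelihood(xs, theta):
    """Joint log-likelihood of independent observations at a given theta."""
    return sum(log_prob(x, theta) for x in xs)

# Observations generated at an unknown-to-the-fit value of theta.
rng = random.Random(2)
theta_true = 0.8
x_obs = [A * theta_true + B + 0.5 * rng.gauss(0.0, 1.0) for _ in range(200)]

# Amortized inference: no retraining, only evaluations of the learned density.
grid = [-2.0 + 0.01 * k for k in range(401)]
theta_hat = max(grid, key=lambda t: log_likelihood(x_obs, t))
```

For this affine model the maximizer is available in closed form, $(\bar{x} - B)/A$, so the grid estimate can be checked against it up to the grid spacing.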

This is particularly interesting when $\theta$ is identified with the parameters of an intractable, non-differentiable computer simulation or the conditions of some real-world data collection process.

I wish we had written $p_X(x|\theta)$ and $p_Z(f(x))$ for clarity. Can't change it now.