How optimize a loss function with input the result of an ODE solver: $\textbf{L}$(ODESolve$(\color{#996699}{z}(t_0),f,t_0,t_1,\color{#ecad60}{\theta}))$?
To optimize
$\textbf{L}$, we require gradients with respect to
$\theta$:
- Determine how the gradient of the loss (the adjoint) depends on the hidden state $z$(t) at each instant:
$$\color{#669900}{\textbf{a}}(t)=\frac{\partial \color{#6699CC}{L}}{\partial \color{#996699}{\textbf{z}}(t)}$$
- Compute the adjoint dynamics by solving a another ODE:
$$ \frac{d\color{#669900}{\textbf{a}}(t)}{dt}=\color{#669900}{\textbf{a}}(t)^{T}\frac{\partial f(\color{#996699}{\textbf{z}}(t),t,\color{#ecad60}{\theta})}{\partial \color{#996699}{\textbf{z}}}
$$
- Compute the gradients with respect to the parameters $\theta$ evaluating a third integral:
$$ \frac{d\color{#6699CC}{L}}{d\color{#ecad60}{\theta}}=\int_{t_1}^{t_0}\color{#669900}{\textbf{a}}(t)^T \frac{\partial f (\color{#996699}{\textbf{z}}(t),t,\theta)}{\partial \color{#ecad60}{\theta}}dt $$
Hybrid Physical-Neural ODE
Without neural correction
With neural correction
Results
Netural network trained using single CAMELS simulation of $25^3$ ($h^{-1}$ Mpc)$^3$ volume and $64^3$ dark matter particles at the fiducial cosmology of $\Omega_m = 0.3$
Results: Robustness to changes in resolution and cosmological parameters
Netural network trained using single CAMELS simulation of $25^3$ ($h^{-1}$ Mpc)$^3$ volume and $64^3$ dark matter particles at the fiducial cosmology of $\Omega_m = 0.3$
Higher resolution
Lower resolution
Different Cosmology
Example use-case: Forecasting the power of Higher Order Weak Lensing Statistics with automatically differentiable simulations
Work in collaboration with:
F. Lanusse, C. Modi, B. Horowitz, J. Harnois-Déraps, J.L. Starck,
The LSST Dark Energy Science Collaboration (LSST DESC)
$\Longrightarrow$ Compare the constraining power of weak lensing statistics and
investigate the degeneracy in high dimensional cosmological parameter space.
Introducing FlowPM: Particle-Mesh Simulations in TensorFlow
Modi, Lanusse, Seljak (2020)

import tensorflow as tf
import flowpm
# Defines integration steps
stages = np.linspace(0.1, 1.0, 10, endpoint=True)
initial_conds = flowpm.linear_field(32, # size of the cube
100, # Physical size
ipklin, # Initial powerspectrum
batch_size=16)
# Sample particles and displace them by LPT
state = flowpm.lpt_init(initial_conds, a0=0.1)
# Evolve particles down to z=0
final_state = flowpm.nbody(state, stages, 32)
# Retrieve final density field
final_field = flowpm.cic_paint(tf.zeros_like(initial_conditions),
final_state[0])
Differentiable Lensing Lightcone: TensorFlow-based weak gravitational lensing package
We extend the FlowPM approach:
- Computing the time integration starting from a system of ODE.
- Tune the accuracy-speed by defining a tolerance of the ODE Solver
- Reduced memory usage: no need to store intermediate steps for backpropagation.
- Integrating the Hybrid Physical-Neural parameterisation to compensate for the small-scale approximations
- Implementing ray-tracing and simulating lensing lightcones in the Tensorflow framework
Validating simulations: Lensing $C_{\ell}$
- Predictions from DLL simulations ($128^3$ particles, box size of 205 Mpc/h ) against $\kappa$TNG ($2500^3$ particles, box size of 205 Mpc/h )
(Liu, et al. 2017)
- Predictions from MassiveNus simulations ($1024^3$ particles, box size of 512 Mpc/h) against Halofit.
Compare the information content
We tested the simulations to reproduce a LSST like setting
(Results presented on behalf of LSST DESC)
\[\begin{equation}
F_{\alpha, \beta} =\sum_{i,j} \frac{d\mu_i}{d\theta_{\alpha}}
C_{i,j}^{-1} \frac{d\mu_j}{d\theta_{\beta}}
\end{equation} \]
- Use Fisher matrix to estimate the information content extracted with a given statistic
- Derivative of summary statistics respect to the cosmological parameters.
- Fisher matrices are notoriously unstable
$\Longrightarrow$ they rely on evaluating gradients by finite differences.
- They do not scale well to large number of parameters.
Compare the information content
-
Use Fisher matrix to investigate the degeneracy between the cosmological parameters in high dimensional space and when systematics are included in the analysis.
Conclusion
Takeaway message
- Powerful ways to simulate fast cosmological simulations
- Data driven way of complementing a physical models
- correction scheme to compensate for the small-scales approximations preserving translation and rotation symmetries
- Automatically differentiable physical models for fast inference
- Likelihood-free inference approach
- Inference of the full posterior distribution by using the Hamiltonian Monte Carlo (HMC) method
GitHub repo:
DifferentiableUniverseInitiative/jaxpm-paper
DifferentiableUniverseInitiative/flowpm
LSSTDESC/DifferentiableHOS
Thank you !
the $\Lambda$CDM view of the Universe
Weak Gravitational lensing
The limits of traditional cosmological inference
HSC cosmic shear power spectrum
HSC Y1 constraints on $(S_8, \Omega_m)$
(Hikage et al. 2018)
- Measure the ellipticity $\epsilon = \epsilon_i + \gamma$ of all galaxies
$\Longrightarrow$ Noisy tracer of the weak lensing shear $\gamma$
- Compute summary statistics based on 2pt functions,
e.g. the power spectrum
- Run an MCMC to recover a posterior on model parameters, using an analytic likelihood
$$ p(\theta | x ) \propto \underbrace{p(x | \theta)}_{\mathrm{likelihood}} \ \underbrace{p(\theta)}_{\mathrm{prior}}$$
Main limitation:
The two-point statistics do not fully capture the non-Gaussian information, e.g. information encoded in the peaks of the matter distribution
$\Longrightarrow$ We are dismissing a significant fraction of the information!
How to perform inference over forward simulation models?
- Implicit Inference: Treat the simulator as a black-box with only the ability to sample from the joint distribution
$$(x, \theta) \sim p(x, \theta)$$
a.k.a.
- Simulation-Based Inference (SBI)
- Likelihood-free inference (LFI)
- Approximate Bayesian Computation (ABC)
- Explicit Inference: Treat the simulator as a probabilistic model and perform inference over the joint posterior
$$p(\theta, z | x) \propto p(x | z, \theta) p(z | \theta) p(\theta) $$
a.k.a.
- Bayesian Hierarchical Modeling (BHM)
$\Longrightarrow$ For a given simulation model, both methods should converge to the same posterior!
Mesh FlowPM: distributed, GPU-accelerated, and automatically differentiable simulations
- We developed a Mesh TensorFlow implementation that can scale on GPU clusters (horovod+NCCL).
- For a $2048^3$ simulation:
- Distributed on 256 NVIDIA V100 GPUs
- Runtime: 3 mins
- Don't hesitate to reach out if you have a use case for model parallelism!
![]()
Projections of final density field
Camels simulations
PM simulations
PM+NN correction
PM+PGD correction
Results: Robustness to changes in resolution and cosmological parameters
Netural network trained using single CAMELS simulation of $25^3$ ($h^{-1}$ Mpc)$^3$ volume and $64^3$ dark matter particles at the fiducial cosmology of $\Omega_m = 0.3$
Higher resolution
Different Cosmology!
Proof of Concept
Convergence map at source redshift z = 0.91 from DLL, PM only. Right panel: Same convergence map when the HPN correction is applied.
Validating simulations: HPN validation
Predictions of DLL simulations before and after using the Hybrid Physical-Neural against $\kappa$TNG
$C_{\ell}$
Fractional $C_{\ell}$
Validating simulations: Peak counts
Local maxima in the map
Non-Gaussian information
Trace overdense regions
Validating simulations: Peak counts
Istrotropic undecimated wavelet transform
$c_0=c_J+\sum_{j=1}^{J_{max}} w_j$
Multi-scale approach
Peak counts are local point-like features and have a sparse representation in the wavelet domain
Validating simulations: Peak counts
-
Fractional number of peaks of DLL simulations and $\kappa$TNG simulations for different sources redshift.