Automatically Differentiable Weak Lensing Simulations for Full-Field Cosmological Inference

Denise Lanzieri

slides at dlanzieri.github.io/Munich2023/

Cosmological N-Body Simulations

How do we simulate the Universe in a fast and differentiable way?

The Particle-Mesh scheme for N-body simulations

The idea: approximate gravitational forces by estimating densities on a grid.

The numerical scheme:
- From the particle positions estimate the density of particles on a mesh
- Apply a Fourier transform to obtain the over-density field $\delta_k$ in Fourier space.
- Interpolate back the force at every particle position

$\to$ Only a series of FFTs and interpolations.

Fill the gap in the accuracy-speed space

Fast (we don't solve the full N-body problem)

Not able to resolve structures with scales smaller than the mesh resolution
- $\to$ Overdensity structures less sharp than full N-body counterparts
- $\to$ Lack power on small scales

The correction idea : mimics the physics that is missing

Camels simulations

PM simulations

Hybrid Physical-Neural ODEs for Fast N-body Simulations

Fast realization of complex processes

$\Longrightarrow$ Take the effective physics description and combine it with a ML approach

Cosmological simulations are based on physical processes

$\Longrightarrow$ These impose symmetries and constraints

Augment the physical equations with a neural network

We compute the time integration from a system of ordinary differential equations (ODE) $$\left\{ \begin{array}{ll} \frac{d \color{#6699CC}{\mathbf{x}} }{d a} & = \frac{1}{a^3 E(a)} \color{#6699CC}{\mathbf{v}} \\ \frac{d \color{#6699CC}{\mathbf{v}}}{d a} & = \frac{1}{a^2 E(a)} F_\theta( \color{#6699CC}{\mathbf{x}} , a), \\ F_\theta( \color{#6699CC}{\mathbf{x}}, a) &= \frac{3 \Omega_m}{2} \nabla \left[ \color{#669900}{\phi_{PM}} (\color{#6699CC}{\mathbf{x}}) \right] \end{array} \right. $$

$\mathbf{x}$ and $\mathbf{v}$ define the position and the velocity of the particles
$\phi_{PM}$ is the gravitational potential in the mesh

$\to$ We can use this parametrisation to complement the physical ODE with neural networks.

$$F_\theta(\mathbf{x}, a) = \frac{3 \Omega_m}{2} \nabla \left[ \phi_{PM} (\mathbf{x}) \ast \mathcal{F}^{-1} (1 + \color{#996699}{f_\theta(a,|\mathbf{k}|)}) \right] $$

Correction integrated as a Fourier-based isotropic filter $f_{\theta}$ $\to$ incorporates translation and rotation symmetries

Learn the Neural Filter

$f_{\theta}(a)$ is defined as B-spline functions whose coefficients are the output of the Neural Network of parameters $\theta$.

Train and validation loss

$$\mathcal{L} = \sum_{i}^{snapshots} \lambda_1|| \color{#6699CC}{\mathbf{x}^{ref}_i} - \color{#6699CC}{\mathbf{x}_i}||_2^2 + \lambda_2 || \frac{\color{#996699}{p_i(k)}}{\color{#996699}{p_i^{ref}(k)}} -1 ||_2^2 $$

We adopt a loss function penalizing both the particle positions and the overall matter power spectrum at different snapshot times

We train and compare the model to the CAMELS simulations (Villaescusa-Navarro et al., 2021)

We use a single N-body simulation of $25^3$ ($h^{-1}$ Mpc)$^3$ volume, $64^3$ dark matter particles at the fiducial cosmology of $\Omega_m = 0.3$ and $\sigma_8 = 0.8$

Whole code implemented in the Python package Jax.

Backpropagation through the ODE solver

We are following the technique from Neural ODEs to backpropagate through an ODE solver (Neural Ordinary Differential Equations, Chen et al. 2018).

How optimize a loss function with input the result of an ODE solver: $\textbf{L}$(ODESolve$(\color{#996699}{z}(t_0),f,t_0,t_1,\color{#ecad60}{\theta}))$?

To optimize $\textbf{L}$, we require gradients with respect to $\theta$:

Determine how the gradient of the loss (the adjoint) depends on the hidden state $z$(t) at each instant: $$\color{#669900}{\textbf{a}}(t)=\frac{\partial \color{#6699CC}{L}}{\partial \color{#996699}{\textbf{z}}(t)}$$
Compute the adjoint dynamics by solving a another ODE: $$ \frac{d\color{#669900}{\textbf{a}}(t)}{dt}=\color{#669900}{\textbf{a}}(t)^{T}\frac{\partial f(\color{#996699}{\textbf{z}}(t),t,\color{#ecad60}{\theta})}{\partial \color{#996699}{\textbf{z}}} $$
Compute the gradients with respect to the parameters $\theta$ evaluating a third integral: $$ \frac{d\color{#6699CC}{L}}{d\color{#ecad60}{\theta}}=\int_{t_1}^{t_0}\color{#669900}{\textbf{a}}(t)^T \frac{\partial f (\color{#996699}{\textbf{z}}(t),t,\theta)}{\partial \color{#ecad60}{\theta}}dt $$

Hybrid Physical-Neural ODE

Without neural correction

With neural correction

Results

Netural network trained using single CAMELS simulation of $25^3$ ($h^{-1}$ Mpc)$^3$ volume and $64^3$ dark matter particles at the fiducial cosmology of $\Omega_m = 0.3$

Results: Robustness to changes in resolution and cosmological parameters

Netural network trained using single CAMELS simulation of $25^3$ ($h^{-1}$ Mpc)$^3$ volume and $64^3$ dark matter particles at the fiducial cosmology of $\Omega_m = 0.3$

Higher resolution

Lower resolution

Different Cosmology

Example use-case: Forecasting the power of Higher Order Weak Lensing Statistics with automatically differentiable simulations

Work in collaboration with:
F. Lanusse, C. Modi, B. Horowitz, J. Harnois-Déraps, J.L. Starck,
The LSST Dark Energy Science Collaboration (LSST DESC)

$\Longrightarrow$ Compare the constraining power of weak lensing statistics and investigate the degeneracy in high dimensional cosmological parameter space.

Introducing FlowPM: Particle-Mesh Simulations in TensorFlow

Modi, Lanusse, Seljak (2020)

https://github.com/DifferentiableUniverseInitiative/flowpm


										import tensorflow as tf
										import flowpm
										# Defines integration steps
										stages = np.linspace(0.1, 1.0, 10, endpoint=True)

										initial_conds = flowpm.linear_field(32,       # size of the cube
																			100,       # Physical size
																			ipklin,    # Initial powerspectrum
																			batch_size=16)

										# Sample particles and displace them by LPT
										state = flowpm.lpt_init(initial_conds, a0=0.1)

										# Evolve particles down to z=0
										final_state = flowpm.nbody(state, stages, 32)

										# Retrieve final density field
										final_field = flowpm.cic_paint(tf.zeros_like(initial_conditions),
																		final_state[0])

Differentiable Lensing Lightcone: TensorFlow-based weak gravitational lensing package

Computing the time integration starting from a system of ODE.
- Tune the accuracy-speed by defining a tolerance of the ODE Solver
- Reduced memory usage: no need to store intermediate steps for backpropagation.

Integrating the Hybrid Physical-Neural parameterisation to compensate for the small-scale approximations
Implementing ray-tracing and simulating lensing lightcones in the Tensorflow framework

Validating simulations: Lensing $C_{\ell}$

Predictions from DLL simulations ($128^3$ particles, box size of 205 Mpc/h ) against $\kappa$TNG ($2500^3$ particles, box size of 205 Mpc/h )

(Liu, et al. 2017)

Predictions from MassiveNus simulations ($1024^3$ particles, box size of 512 Mpc/h) against Halofit.

Compare the information content

We tested the simulations to reproduce a LSST like setting
(Results presented on behalf of LSST DESC)

\[\begin{equation} F_{\alpha, \beta} =\sum_{i,j} \frac{d\mu_i}{d\theta_{\alpha}} C_{i,j}^{-1} \frac{d\mu_j}{d\theta_{\beta}} \end{equation} \]

Use Fisher matrix to estimate the information content extracted with a given statistic

Derivative of summary statistics respect to the cosmological parameters.

Fisher matrices are notoriously unstable
$\Longrightarrow$ they rely on evaluating gradients by finite differences.

They do not scale well to large number of parameters.

Compare the information content

Use Fisher matrix to investigate the degeneracy between the cosmological parameters in high dimensional space and when systematics are included in the analysis.

Conclusion

Takeaway message

Powerful ways to simulate fast cosmological simulations
Data driven way of complementing a physical models
- correction scheme to compensate for the small-scales approximations preserving translation and rotation symmetries

Automatically differentiable physical models for fast inference
- Likelihood-free inference approach
- Inference of the full posterior distribution by using the Hamiltonian Monte Carlo (HMC) method

GitHub repo:
DifferentiableUniverseInitiative/jaxpm-paper
DifferentiableUniverseInitiative/flowpm
LSSTDESC/DifferentiableHOS
Thank you !

APPENDIX

the $\Lambda$CDM view of the Universe

Weak Gravitational lensing

The limits of traditional cosmological inference

HSC cosmic shear power spectrum

HSC Y1 constraints on $(S_8, \Omega_m)$

(Hikage et al. 2018)

Measure the ellipticity $\epsilon = \epsilon_i + \gamma$ of all galaxies
$\Longrightarrow$ Noisy tracer of the weak lensing shear $\gamma$

Compute summary statistics based on 2pt functions,
e.g. the power spectrum

Run an MCMC to recover a posterior on model parameters, using an analytic likelihood $$ p(\theta | x ) \propto \underbrace{p(x | \theta)}_{\mathrm{likelihood}} \ \underbrace{p(\theta)}_{\mathrm{prior}}$$

Main limitation:

The two-point statistics do not fully capture the non-Gaussian information, e.g. information encoded in the peaks of the matter distribution

$\Longrightarrow$ We are dismissing a significant fraction of the information!

How to perform inference over forward simulation models?

Implicit Inference: Treat the simulator as a black-box with only the ability to sample from the joint distribution $$(x, \theta) \sim p(x, \theta)$$ a.k.a.
- Simulation-Based Inference (SBI)
- Likelihood-free inference (LFI)
- Approximate Bayesian Computation (ABC)

Explicit Inference: Treat the simulator as a probabilistic model and perform inference over the joint posterior $$p(\theta, z | x) \propto p(x | z, \theta) p(z | \theta) p(\theta) $$ a.k.a.
- Bayesian Hierarchical Modeling (BHM)

$\Longrightarrow$ For a given simulation model, both methods should converge to the same posterior!

Mesh FlowPM: distributed, GPU-accelerated, and automatically differentiable simulations

We developed a Mesh TensorFlow implementation that can scale on GPU clusters (horovod+NCCL).

For a $2048^3$ simulation:
- Distributed on 256 NVIDIA V100 GPUs
- Runtime: 3 mins

Don't hesitate to reach out if you have a use case for model parallelism!

https://github.com/DifferentiableUniverseInitiative/mesh

Projections of final density field

Camels simulations

PM simulations

PM+NN correction

PM+PGD correction

Results: Robustness to changes in resolution and cosmological parameters

Netural network trained using single CAMELS simulation of $25^3$ ($h^{-1}$ Mpc)$^3$ volume and $64^3$ dark matter particles at the fiducial cosmology of $\Omega_m = 0.3$

Higher resolution

Different Cosmology!

Proof of Concept

Convergence map at source redshift z = 0.91 from DLL, PM only. Right panel: Same convergence map when the HPN correction is applied.

Validating simulations: HPN validation

Predictions of DLL simulations before and after using the Hybrid Physical-Neural against $\kappa$TNG

$C_{\ell}$

Fractional $C_{\ell}$

Validating simulations: Peak counts

Local maxima in the map

Non-Gaussian information

Trace overdense regions

Validating simulations: Peak counts

Istrotropic undecimated wavelet transform

$c_0=c_J+\sum_{j=1}^{J_{max}} w_j$

Multi-scale approach

Peak counts are local point-like features and have a sparse representation in the wavelet domain

Validating simulations: Peak counts

Fractional number of peaks of DLL simulations and $\kappa$TNG simulations for different sources redshift.