Forecasting the power of Higher Order Weak Lensing Statistics with automatically differentiable simulations

Denise Lanzieri

slides at github.com/dlanzieri/talks/LSST_France_Nov2021

the $\Lambda$CDM view of the Universe

Weak Gravitational lensing

Traditional cosmological inference

How do we make the most of the available data?

HSC cosmic shear power spectrum

HSC Y1 constraints on $(S_8, \Omega_m)$

(Hikage,..., Lanusse, et al. 2018)

Data reduction

Compute summary statistics based on 2pt functions,
e.g. the power spectrum

Run an MCMC to recover a posterior on model parameters, using an analytic likelihood $$ p(\theta | x ) \propto \underbrace{p(x | \theta)}_{\mathrm{likelihood}} \ \underbrace{p(\theta)}_{\mathrm{prior}}$$

Main limitation: The lensing convergence field is inherently and significantly non-Gaussian

The two-point statistics do not fully capture the non-Gaussian information encoded in the peaks of the matter distribution

2 Point Statistics: A Suboptimal Measure

How to maximize the information gain?

(Ajani, et al. 2020)

Approaches based on measuring high-order correlations to access the non-Gaussian information
e.g. the Peaks count

Simulation-based approaches

Main limitation: Gradient-based

Costly as they require a large number of simulations

Intractable for more than 3 or 4 cosmological parameters. 

Numerical Differentiation

\[\begin{equation} \left.\frac{df(x)}{dx}\right|_{x_1} \approx \frac{f(x_1+h)-f(x_1)}{h} \end{equation} \]

Flaws

It’s numerically very unstable

It’s very expensive in term of simulation time

Computes an approximation

Different approach :

Automatic Differentiation

Automatic Differentiation and Gradients in TensorFlow

Automatic differentiation allows you to compute analytic derivatives of arbitraty expressions:
If I form the expression $y = a * x + b$, it is separated in fundamental ops: $$ y = u + b \qquad u = a * x $$ then gradients can be obtained by the chain rule: $$\frac{\partial y}{\partial x} = \frac{\partial y}{\partial u} \frac{ \partial u}{\partial x} = 1 \times a = a$$

To differentiate automatically, TensorFlow remember what operations happen in what order during the forward pass and traverses this list of operations in reverse order to compute gradients.

Advantages

Derivative as functions

No numerical approximation

High speed

Cosmological N-Body Simulations

How do we simulate the Universe in a fast and differentiable way?

The Particle-Mesh scheme for N-body simulations

The idea: approximate gravitational forces by estimating densities on a grid.

The numerical scheme:
- Estimate the density of particles on a mesh
  $\Longrightarrow$ compute gravitational forces by FFT
- Interpolate forces at particle positions
- Update particle velocity and positions, and iterate

Fast and simple, at the cost of approximating short range interactions.

$\Longrightarrow$ Only a series of FFTs and interpolations.

Introducing FlowPM: Particle-Mesh Simulations in TensorFlow

https://github.com/DifferentiableUniverseInitiative/flowpm https://arxiv.org/pdf/2010.11847.pdf


											import tensorflow as tf
											import flowpm
											# Defines integration steps
											stages = np.linspace(0.1, 1.0, 10, endpoint=True)

											initial_conds = flowpm.linear_field(32,        # size of the cube
											100,       # Physical size
											ipklin,    # Initial powerspectrum
											batch_size=16)

											# Sample particles and displace them by LPT
											state = flowpm.lpt_init(initial_conds, a0=0.1)

											# Evolve particles down to z=0
											final_state = flowpm.nbody(state, stages, 32)

											# Retrieve final density field
											final_field = flowpm.cic_paint(tf.zeros_like(initial_conditions),
											final_state[0])

											with tf.Session() as sess:
											sim = sess.run(final_field)

Seamless interfacing with deep learning components

Potential Gradient Descent (PGD)

Fast (limited number of time steps while enforcing the correct linear growth)

Cannot give accurate halo matter profiles or matter power spectrum

Halo virialization

(Biwei Dai et al. 2018)

Mocking the weak lensing universe: The Born Approximation

Numerical simulation of WL features rely on ray-tracing through the output of N-body simulations

Knowledge of the Gravitational potential and accurate solvers for light ray trajectories is computationally expensive

Born approximation , only requiring knowledge of the density field, can be implemented more efficiently and at a lower computational cost

\[\begin{equation} \kappa_{born}(\boldsymbol{\theta},\chi_s)= \frac{3H_0^2 \Omega_m}{2c^2} \int_0^{\chi_s} d\chi \frac{\chi}{a(\chi)} W(\chi,\chi_s) \delta(\chi \boldsymbol{\theta},\chi). \end{equation} \]

Proof of Concept

Convergence map at z=1.0, based on a 3D simulation of $128^3$ particles for side. The 2D lensing map has an angular extent of $5^{\circ}.$
Angular Power Spectrum $C_\ell$

Proof of Concept: Peak counts

Local maxima in the map

Non-Gaussian information

Trace overdense regions

Proof of Concept: $l_1$norm

We take the sum of all wavelet coefficients of the original image $\kappa$ map in a given bin $i$ defined by two values, $B_i$ and $B_{i+1}$,

Non-Gaussian information

Information encoded in all pixels

Avoids the problem of defining peaks and voids

A first application: Fisher forecasts

We are testing the simulations to reproduce a LSST Y1-like setting

(Preliminary results, presented on behalf of LSST DESC)

\[\begin{equation} F_{\alpha, \beta} =\sum_{i,j} \frac{d\mu_i}{d\theta_{\alpha}} C_{i,j}^{-1} \frac{d\mu_j}{d\theta_{\beta}} \end{equation} \]

Use Fisher matrix to estimate the information content extracted with a given statistic

Derivative of summary statistics respect to the cosmological parameters.
e.g. the $\Omega_c$, $\sigma_8$

Fisher matrices are notoriously unstable
$\Longrightarrow$ they rely on evaluating gradients by finite differences.

They do not scale well to large number of parameters.

A first application: Fisher forecasts

Use Fisher matrix to investigate the constraining power of different statistics

A first application: Fisher forecasts

Use Fisher matrix to estimate the information content extracted with a given statistic for more cosmological parameters simultaneously

Fisher constraints from the full peak count statistics

Conclusion

What can be gained by simulating the Universe in a fast and differentiable way?

Investigate the constraining power of various map-based higher order weak lensing statistics and control the systematics

Differentiable physical models for fast inference ( made efficient by very fast lensing lightcone and having access to gradient)

Even analytic cosmological computations can benefit from differentiability.

Thank you !