Forecasting the power of higher order weak lensing statistics with automatically differentiable simulations

Denise Lanzieri

the $\Lambda$CDM view of the Universe

Weak Gravitational lensing

Weak Gravitational Lensing

The lens equation relates the unlensed angular sky position $\beta$ of a source to the lensed position $\theta$ via the deflection angle $\alpha$ \begin{equation} \vec{\beta}=\vec{\theta}-\vec{\alpha}(\vec{\theta}) \end{equation}

The deflection angle is the gradient of a scalar potential, which is an integral over the light travel path \begin{equation} \alpha_{i}=\partial_i \frac{2}{c^2} \int_0^{\chi}d\chi'\frac{\chi-\chi'}{\chi\chi'}\Phi(\chi'\vec{\theta},\chi') \end{equation}

In the weak lensing approximation, The image distortion is described by the Jacobian matrix: \begin{equation} A_{ij}=\frac{\partial \beta_i}{\partial \theta_j} =\delta_{ij}-\frac{\partial \alpha_i}{\partial \theta_j} \end{equation}

Weak Gravitational Lensing

The Jacobian matrix is parametrized in terms of the scalar convergence, $\kappa$, and the two-component spin-two shear $\gamma$$=(\gamma_1,\gamma_2)$, \begin{equation} A= \begin{pmatrix} 1-\kappa-\gamma_1 & -\gamma_2 \\ -\gamma_2 &1-\kappa+\gamma_1 \end{pmatrix} \end{equation}

$\kappa$ is an isotropic increase or decrease of the observed size of a source image

$\gamma$ is a anisotropic stretching, turning a circular into an elliptical light distribution.

Weak lensing power spectrum

The lensing convergence is related to the projection of the 3D cosmological mass density perturbations along the line-of-sight.

\begin{equation} \kappa(\boldsymbol{\theta})= \frac{3H_0^2 \Omega_m}{2c^2} \int_0^{\inf} d\chi \frac{\chi}{a(\chi)} W(\chi,\chi_s) \delta(\chi \boldsymbol{\theta},\chi). \end{equation}
The cosmological information contained in cosmic shear data on large linear scales is well captured by two-point statistics in the form of the lensing power spectrum $C_{\ell}$ (or the shear two-point correlation functions):

\begin{equation} \left \langle \tilde{\kappa}(\ell)\tilde{\kappa}^{*}(\ell') \right \rangle =(2 \pi)^2\delta_{D}(\ell-\ell')P_{k}(\ell) \end{equation}

the Rubin Observatory Legacy Survey of Space and Time

1000 images each night, 15 TB/night for 10 years

18,000 square degrees, observed once every few days

Tens of billions of objects, each one observed $\sim1000$ times

$\Longrightarrow$ Incredible potential for discovery, along with unprecedented challenges.

The challenges of modern surveys

$\Longrightarrow$ Modern surveys will provide large volumes of high quality data

A Blessing

Unprecedented statistical power

Great potential for new discoveries

A Curse

Existing methods are reaching their limits at every step of the science analysis

Control of systematic uncertainties becomes paramount

LSST forecast on dark energy parameters

$\Longrightarrow$ Dire need for novel analysis techniques to fully realize the potential of modern surveys.

Traditional cosmological inference

How do we make the most of the available data?

HSC cosmic shear power spectrum

HSC Y1 constraints on $(S_8, \Omega_m)$

(Hikage,..., Lanusse, et al. 2018)

Data reduction

Compute summary statistics based on 2pt functions,
e.g. the power spectrum

Run an MCMC to recover a posterior on model parameters, using an analytic likelihood $$ p(\theta | x ) \propto \underbrace{p(x | \theta)}_{\mathrm{likelihood}} \ \underbrace{p(\theta)}_{\mathrm{prior}}$$

Main limitation: The lensing convergence field is inherently and significantly non-Gaussian

The two-point statistics do not fully capture the non-Gaussian information encoded in the peaks of the matter distribution

$\Longrightarrow$ We are dismissing most of the information!

2 Point Statistics: A Suboptimal Measure

How to maximize the information gain?

(Ajani, et al. 2020)

Approaches based on measuring high-order correlations to access the non-Gaussian information
e.g. the Peaks count

Simulation-based approaches

Main limitation: Gradient-based

Costly as they require a large number of simulations

Intractable for more than 3 or 4 cosmological parameters. 

Numerical Differentiation

\[\begin{equation} \left.\frac{df(x)}{dx}\right|_{x_1} \approx \frac{f(x_1+h)-f(x_1)}{h} \end{equation} \]

Flaws

It’s numerically very unstable

It’s very expensive in term of simulation time

Computes an approximation

Different approach :

Automatic Differentiation

Automatic Differentiation and Gradients in TensorFlow

Automatic differentiation allows you to compute analytic derivatives of arbitraty expressions:
If I form the expression $y = a * x + b$, it is separated in fundamental ops: $$ y = u + b \qquad u = a * x $$ then gradients can be obtained by the chain rule: $$\frac{\partial y}{\partial x} = \frac{\partial y}{\partial u} \frac{ \partial u}{\partial x} = 1 \times a = a$$

To differentiate automatically, TensorFlow remember what operations happen in what order during the forward pass and traverses this list of operations in reverse order to compute gradients.

Advantages

Derivative as functions

No numerical approximation

High speed

Cosmological N-Body Simulations

How do we simulate the Universe in a fast and differentiable way?

The Particle-Mesh scheme for N-body simulations

The idea: approximate gravitational forces by estimating densities on a grid.

The numerical scheme:
- Estimate the density of particles on a mesh
  $\Longrightarrow$ compute gravitational forces by FFT
- Interpolate forces at particle positions
- Update particle velocity and positions, and iterate

Fast and simple, at the cost of approximating short range interactions.

$\Longrightarrow$ Only a series of FFTs and interpolations.

Introducing FlowPM: Particle-Mesh Simulations in TensorFlow

https://github.com/DifferentiableUniverseInitiative/flowpm


											import tensorflow as tf
											import flowpm
											# Defines integration steps
											stages = np.linspace(0.1, 1.0, 10, endpoint=True)

											initial_conds = flowpm.linear_field(32,        # size of the cube
											100,       # Physical size
											ipklin,    # Initial powerspectrum
											batch_size=16)

											# Sample particles and displace them by LPT
											state = flowpm.lpt_init(initial_conds, a0=0.1)

											# Evolve particles down to z=0
											final_state = flowpm.nbody(state, stages, 32)

											# Retrieve final density field
											final_field = flowpm.cic_paint(tf.zeros_like(initial_conditions),
											final_state[0])

											with tf.Session() as sess:
											sim = sess.run(final_field)

Seamless interfacing with deep learning components

Potential Gradient Descent (PGD)

Fast (limited number of time steps while enforcing the correct linear growth)

Cannot give accurate halo matter profiles or matter power spectrum

Halo virialization

(Biwei Dai et al. 2018)

Potential Gradient Descent (PGD)

Additional displacements to sharpen the halos

The direction of the displacements points towards the halo center (local potential minimum).

The gravitational force: \begin{equation} \mathbf{F}=-\nabla\Phi \end{equation}
The PGD correction displacement: \begin{equation} \mathbf{S}=-\alpha\nabla \hat{O}_{h}\hat{O}_{l}\Phi \end{equation} High pass filter prevents the large scale growth, low pass filter reduces the numerical effect

(Biwei Dai et al. 2018)

Potential Gradient Descent (PGD)

Result on a density map (z=0.03) with PGD turned off (left) and turned on (right):

Structures are not modified but simply sharpened.

Mocking the weak lensing universe: The Born Approximation

Numerical simulation of WL features rely on ray-tracing through the output of N-body simulations

Knowledge of the Gravitational potential and accurate solvers for light ray trajectories is computationally expensive

Born approximation , only requiring knowledge of the density field, can be implemented more efficiently and at a lower computational cost

\[\begin{equation} \kappa_{born}(\boldsymbol{\theta},\chi_s)= \frac{3H_0^2 \Omega_m}{2c^2} \int_0^{\chi_s} d\chi \frac{\chi}{a(\chi)} W(\chi,\chi_s) \delta(\chi \boldsymbol{\theta},\chi). \end{equation} \]

Proof of Concept

We are testing the simulations to reproduce a DESC Y1-like setting

Convergence map at z=1.0, based on a 3D simulation of $128^3$ particles for side. The 2D lensing map has an angular extent of $5^{\circ}.$
Angular Power Spectrum $C_\ell$

Proof of Concept: Peak counts

Local maxima in the map

Non-Gaussian information

Trace overdense regions

Proof of Concept: $l_1$norm

We take the sum of all wavelet coefficients of the original image $\kappa$ map in a given bin $i$ defined by two values, $B_i$ and $B_{i+1}$,

Non-Gaussian information

Information encoded in all pixels

Avoids the problem of defining peaks and voids

A first application: Fisher forecasts

\[\begin{equation} F_{\alpha, \beta} =\sum_{i,j} \frac{d\mu_i}{d\theta_{\alpha}} C_{i,j}^{-1} \frac{d\mu_j}{d\theta_{\beta}} \end{equation} \]

Derivative of summary statistics respect to the cosmological parameters.
e.g. the $\Omega_c$, $\sigma_8$

The main reason for using Fisher matrices is that full MCMCs are extremely slow, and get slower with dimensionality.

Fisher matrices are notoriously unstable
$\Longrightarrow$ they rely on evaluating gradients by finite differences.

They do not scale well to large number of parameters.

A first application: Fisher forecasts

Use Fisher matrix to investigate the constraining power of different statistics

A first application: Fisher forecasts

Use Fisher matrix to estimate the information content extracted with a given statistic for more cosmological parameters simultaneously

Intrinsic alignments of galaxies

$$ \underbrace{\epsilon}_{\tiny \mbox{observed ellipticity}} = \underbrace{\epsilon_i}_{\tiny \mbox{intrinsic ellipticity}} + \underbrace{\gamma}_{\tiny \mbox{lensing}}$$

assuming $< \epsilon_i \epsilon_i^\prime > = 0$

Harnois-Déraps et al. (2021)

Infusion of intrinsic alignments

Intrinsic alignments: Validation and next steps

Validate our infusion model by comparing two-point statistics against predictions from the IA model and prove to recover good match

Measure the impact of the IA on three different statistics: angular power spectrum, peaks, $l_1$norm

Harnois-Déraps et al. (2021)

Summary

We implement fast and accurate FlowPM N-body simulations based on the TensorFlow framework to model the Large-Scale Structure in a fast and differentiable way.

We simulate lensing lightcones implementing gravitational lensing ray-tracing in this framework

We take derivatives of the summary statistic resulting on the gravitational lensing maps with respect to cosmological parameters (or any systematics included in the simulations)

We use these derivatives to measure the Fisher information content of lensing summary statistics on cosmological parameters

Conclusion

What can be gained by simulating the Universe in a fast and differentiable way?

Differentiable physical models for fast inference ( made efficient by very fast lensing lightcone and having access to gradient)
Even analytic cosmological computations can benefit from differentiability.

Investigate the constraining power of various map-based higher order weak lensing statistics and control the systematics

Thank you !