**j-ISBA**

## Abstracts for the JB³ webinars with the winners of the Blackwell-Rosenbluth Award

### Monday, November 28, 2022 at 1pm UTC

#### Leah South

Improving Estimates of Posterior Expectations with Stein-Based Control Variates and the ZVCV Package

This talk is about improving estimates of posterior expectations using the derivatives of the log posterior, or unbiased estimates of these derivatives. I will first give an overview of existing methods in the field, following [1], before introducing the semi-exact control functionals (SECF) method [2]. SECF is based on control functionals and Sard’s approach to numerical integration. The use of Sard’s approach ensures that our control functionals are exact on all polynomials up to a fixed degree in the Bernstein-von Mises limit. SECF is also bias-correcting, in the sense that, under some conditions, it can remove the asymptotic bias of biased MCMC samplers. I will use several Bayesian inference examples to illustrate the potential for reducing mean squared error. I will also demonstrate how to implement SECF and other variance reduction methods in my ZVCV package on CRAN.

[1] South, L. F., Riabiz, M., Teymur, O. and Oates, C. J. (2022). Postprocessing of MCMC. Annual Review of Statistics and Its Application, 9, 529-555.

[2] South, L. F., Karvonen, T., Nemeth, C., Girolami, M. and Oates, C. (2022). Semi-Exact Control Functionals From Sard's Method. Biometrika, 109(2), 351-367.
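As a rough illustration of the basic idea behind Stein-based control variates, here is a first-order zero-variance control variate in one dimension. It is not SECF itself, and it is not the interface of the ZVCV package (an R package on CRAN). The sketch assumes that the gradient of the log posterior is available in closed form and that the posterior tails decay fast enough for that gradient to have expectation zero. The function name `zv_control_variate_estimate` is hypothetical.

```python
import random
import statistics


def zv_control_variate_estimate(samples, f, grad_log_p):
    """First-order zero-variance control variate estimate of E[f(X)] in 1-D.

    The control variate g(x) = d/dx log p(x) has expectation zero under p
    (given suitable tail decay), so mean(f) - beta * mean(g) still targets
    E[f(X)]. Here beta is chosen by least squares, beta = Cov(f, g) / Var(g),
    which minimises the sample variance of f - beta * g.
    """
    fx = [f(x) for x in samples]
    gx = [grad_log_p(x) for x in samples]
    f_bar = statistics.fmean(fx)
    g_bar = statistics.fmean(gx)
    cov_fg = sum((a - f_bar) * (b - g_bar) for a, b in zip(fx, gx))
    var_g = sum((b - g_bar) ** 2 for b in gx)
    beta = cov_fg / var_g
    return f_bar - beta * g_bar


# Example: Gaussian "posterior" N(mu, sigma^2), estimating E[X] = mu.
mu, sigma = 2.0, 1.5
rng = random.Random(0)
xs = [rng.gauss(mu, sigma) for _ in range(200)]

plain = statistics.fmean(xs)
cv = zv_control_variate_estimate(
    xs, f=lambda x: x, grad_log_p=lambda x: -(x - mu) / sigma**2
)
print(f"plain Monte Carlo: {plain:.6f}, control variate: {cv:.6f}")
```

For a Gaussian target the score is linear in x. The control variate therefore cancels the Monte Carlo error in the estimate of the mean exactly, up to rounding. This is a simple analogue of the polynomial exactness described in the abstract.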

#### Jeremy Heng

Statistical inference for individual-based models of disease transmission

Individual-based models of disease transmission involve stochastic rules that specify how individuals in a population infect one another, recover, or are removed from the population. To facilitate statistical inference, common yet stringent assumptions stipulate that individuals are interchangeable and that all pairwise contacts are equally likely. In this talk, I will discuss two computationally tractable inference strategies for when these modeling assumptions are relaxed.
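To make the kind of model concrete, here is a minimal forward simulation of an individual-based SIS (susceptible-infected-susceptible) model in which both assumptions above are relaxed: each agent has its own infection and recovery rates, and contacts follow a network rather than being equally likely. The specific transition probabilities, the contact network, and the function name `simulate_sis` are illustrative assumptions. This is not the inference strategies from the talk.

```python
import math
import random


def simulate_sis(neighbours, lam, rho, x0, steps, seed=0):
    """Simulate a discrete-time, individual-based SIS epidemic.

    neighbours[n] -- agents that agent n has contact with (hypothetical network)
    lam[n], rho[n] -- agent-specific infection and recovery rates
    x0[n]          -- 1 if agent n starts infected, else 0
    Returns the states of all agents at times 0, 1, ..., steps.
    """
    rng = random.Random(seed)
    x = list(x0)
    path = [list(x)]
    for _ in range(steps):
        new_x = []
        for n, state in enumerate(x):
            if state == 0:
                # Infection pressure: fraction of n's contacts currently infected.
                nbrs = neighbours[n]
                frac = sum(x[m] for m in nbrs) / len(nbrs) if nbrs else 0.0
                p_infected_next = 1.0 - math.exp(-lam[n] * frac)
            else:
                # Stays infected unless it recovers, which happens w.p. 1 - exp(-rho[n]).
                p_infected_next = math.exp(-rho[n])
            new_x.append(1 if rng.random() < p_infected_next else 0)
        x = new_x
        path.append(list(x))
    return path


# Five agents on a ring, with heterogeneous infection rates.
ring = [[(n - 1) % 5, (n + 1) % 5] for n in range(5)]
path = simulate_sis(ring, lam=[0.5, 1.0, 2.0, 1.0, 0.5], rho=[0.3] * 5,
                    x0=[1, 0, 0, 0, 0], steps=10)
for t, states in enumerate(path):
    print(t, states)
```

Inference would then target the agent-level parameters `lam` and `rho` given (possibly partial) observations of such paths. The cost of that inference grows once agents are no longer interchangeable.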

#### Swapnil Mishra

𝜋VAE: deep generative learning for MCMC inference on stochastic processes

Stochastic processes provide a mathematically elegant way to model complex data. In theory, they provide flexible priors over function classes that can encode a wide range of interesting assumptions. In practice, however, efficient inference by optimisation or marginalisation is difficult, a problem further exacerbated by big data and high-dimensional input spaces. We propose a novel variational autoencoder (VAE) called the prior encoding variational autoencoder (𝜋VAE). 𝜋VAE is a new continuous stochastic process. We use 𝜋VAE to learn low-dimensional embeddings of function classes by combining a trainable feature mapping with a generative model using a VAE. We show that our framework can accurately learn not only expressive function classes such as Gaussian processes, but also properties of functions such as their integrals. For popular tasks such as spatial interpolation, 𝜋VAE achieves state-of-the-art performance in terms of both accuracy and computational efficiency. Perhaps most usefully, we demonstrate an elegant and scalable means of performing fully Bayesian inference for stochastic processes within probabilistic programming languages such as Stan.

### Tuesday, November 29, 2022 at 1pm UTC

#### Sharmistha Guha

A Bayesian Approach for Network Classification

We present a novel Bayesian binary classification framework for networks with labeled nodes. Our approach is motivated by applications in brain connectome studies, where the overarching goal is to identify both regions of interest (ROIs) in the brain and connections between ROIs that influence how study subjects are classified. We propose a binary logistic regression framework with the network as the predictor, and model the associated network coefficient using a new class of global-local network shrinkage priors. We perform a theoretical analysis of one member of this class (which we call the Network Lasso Prior) and show asymptotically correct classification of networks even when the number of network edges grows faster than the sample size. Our approach is implemented using an efficient Markov chain Monte Carlo algorithm and is evaluated empirically through simulation studies and the analysis of a real brain connectome dataset.
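The idea of "the network as the predictor" can be sketched as a logistic likelihood whose linear predictor sums edge weights times edge coefficients. This assumes an undirected network, so only node pairs i < j are counted, and an intercept `mu`. The function names are hypothetical. The Network Lasso Prior and the MCMC sampler are not shown.

```python
import math


def linear_predictor(A, B, mu):
    """mu + sum over node pairs i < j of A[i][j] * B[i][j] (undirected network)."""
    V = len(A)
    return mu + sum(A[i][j] * B[i][j] for i in range(V) for j in range(i + 1, V))


def log1pexp(eta):
    """Numerically stable log(1 + exp(eta))."""
    if eta > 0:
        return eta + math.log1p(math.exp(-eta))
    return math.log1p(math.exp(eta))


def log_likelihood(networks, labels, B, mu):
    """Bernoulli log-likelihood of binary labels y in {0, 1} under a logit link."""
    total = 0.0
    for A, y in zip(networks, labels):
        eta = linear_predictor(A, B, mu)
        total += y * eta - log1pexp(eta)  # log P(y | eta)
    return total


# Three-node example: A is one subject's network, B the edge coefficients.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
B = [[0, 2, 3],
     [2, 0, -1],
     [3, -1, 0]]
print(linear_predictor(A, B, mu=0.5))  # 0.5 + 1*2 + 0*3 + 1*(-1) = 1.5
print(log_likelihood([A, A], [1, 0], B, mu=0.5))
```

A shrinkage prior on `B` that borrows strength across edges sharing a node is what lets such a model single out influential ROIs rather than treating every edge independently.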

#### Simon Mak

Cost-efficient Bayesian inference with expensive scientific experiments

Data science is at a critical and defining crossroads. On one hand, with remarkable breakthroughs in mathematical modeling and experimental technology, reliable data is now obtainable for complex scientific systems that were previously unobservable. On the other hand, generating such high-fidelity data requires costly experiments and/or simulations, which greatly limits the amount of data available for scientific discovery, a critical bottleneck in modern scientific studies. My research aims to bridge this gap by developing Bayesian methods (supported by theory and algorithms) that embed scientific knowledge as prior information. Fusing “data” and “science” within a Bayesian framework allows for principled integration of scientific prior knowledge, which in turn enables more accurate and precise scientific findings under a limited experimental budget. I will present a suite of recent Bayesian methods developed by our group that tackle this integration, motivated by ongoing collaborations in high-energy physics, aerospace engineering and bioengineering.

#### Akihiko Nishimura

“Large n & large p” Bayesian sparse regression for analyzing a network of observational health databases

The growing availability of large observational health databases presents opportunities to generate clinical evidence to better tailor treatments to individual patients. Even with the large cohort sizes found in these databases, however, the low incidence of major health outcomes makes it difficult to identify sources of treatment effect heterogeneity among a large number of clinical covariates. Sparse regression provides a potential solution. The Bayesian approach is particularly attractive in our setting, where the signals are weak and heterogeneity across databases is substantial. Applications of Bayesian sparse regression to large-scale data sets, however, have been hampered by the lack of scalable computational techniques. We deploy advanced numerical linear algebra techniques to tackle the critical bottleneck in computing posteriors under Bayesian sparse regression. We apply our algorithm to a large-scale observational study with n = 72,489 patients and p = 22,175 clinical covariates, designed to assess the relative risk of adverse events from two alternative blood anticoagulants. Our algorithm achieves an order-of-magnitude speed-up in posterior inference, in our case cutting the computation time from two weeks to less than a day. This computational innovation opens up opportunities to carry out more ambitious hierarchical analyses of a network of healthcare databases. A software package is available, with more features under active development.
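One generic numerical linear algebra technique of this kind is the conjugate gradient (CG) method. The abstract does not say which techniques the talk uses, so plain, unpreconditioned CG is shown here as an assumption. The sketch assumes a Gibbs step in which the regression coefficients are conditionally Gaussian with precision matrix Φ = XᵀΩX + diag(1/prior_var). Here Ω is a diagonal weight matrix and `prior_var` holds the conditional prior variances from a shrinkage prior. CG solves Φβ = b using only products with Φ, so the p × p matrix is never formed or factorised. All names are hypothetical. Drawing a posterior sample, rather than just solving the system, would additionally need a randomised right-hand side, which is not shown.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def make_precision_matvec(X, omega, prior_var):
    """Return v -> (X^T diag(omega) X + diag(1 / prior_var)) v.

    The p x p precision matrix is never formed; each product costs two
    passes over X.
    """
    n, p = len(X), len(X[0])

    def matvec(v):
        Xv = [dot(row, v) for row in X]                # X v, length n
        w = [om * xv for om, xv in zip(omega, Xv)]      # diag(omega) X v
        Xtw = [sum(X[i][j] * w[i] for i in range(n)) for j in range(p)]
        return [Xtw[j] + v[j] / prior_var[j] for j in range(p)]

    return matvec


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=1000):
    """Solve Phi x = b for symmetric positive-definite Phi via matvec only."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - Phi x, with x = 0
    d = list(r)            # search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if math.sqrt(rs_old) < tol:
            break
        Ad = matvec(d)
        alpha = rs_old / dot(d, Ad)
        x = [xi + alpha * di for xi, di in zip(x, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        rs_new = dot(r, r)
        d = [ri + (rs_new / rs_old) * di for ri, di in zip(r, d)]
        rs_old = rs_new
    return x


# Toy problem: n = 3 observations, p = 2 covariates.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
matvec = make_precision_matvec(X, omega=[1.0, 1.0, 1.0], prior_var=[1.0, 1.0])
beta = conjugate_gradient(matvec, b=[1.0, 2.0])
print(beta)  # approximately [-31/116, 28/116]
```

Here Φ = [[36, 44], [44, 57]], so the exact solution is (-31/116, 28/116). In realistic problems the number of CG iterations needed depends strongly on the conditioning of Φ, which is why preconditioning matters in practice.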