Publications, preprints & working papers

Towards Adaptive Self-Normalized Importance Samplers; Branchini, Nicola and Elvira, Víctor. In: 2025 IEEE Statistical Signal Processing Workshop

TL;DR: to estimate µ = E_p[f(θ)] when p’s normalizing constant is unknown, instead of running MCMC on p(θ) or even p(θ)|f(θ)|, or learning a parametric q(θ), we try MCMC directly on p(θ)|f(θ) - µ|, which is the proposal that minimizes the asymptotic variance of self-normalized importance sampling (SNIS). Note: we cannot run MCMC straightforwardly, because p(θ)|f(θ) - µ| cannot be evaluated: it contains µ, the quantity of interest! So we propose a simple iterative scheme that works: start from an initial estimate µ₀; run a chain on the approximation p(θ)|f(θ) - µ₀|; estimate µ again with SNIS; and keep iterating. I’m quite excited about extending this work.
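
The loop described above can be sketched in a few lines. This is only a toy illustration under my own assumptions, not the paper's implementation: the target is a standard normal known up to its constant, f(θ) = θ² (so the true µ is 1), the chain is a plain random-walk Metropolis sampler, and all function names, step sizes and chain lengths are hypothetical choices. The SNIS weights are p̃/q̃ = 1/|f(θ) - µ₀| because the chain targets p̃(θ)|f(θ) - µ₀|.

```python
import numpy as np

def log_p_tilde(theta):
    # Unnormalized log-density of the toy target (standard normal, constant dropped).
    return -0.5 * theta**2

def f(theta):
    # Toy integrand; under N(0, 1) the true value is mu = E_p[theta^2] = 1.
    return theta**2

def rw_metropolis(log_target, n_steps, init, step, rng):
    # Plain random-walk Metropolis (an assumed sampler); returns the chain.
    chain = np.empty(n_steps)
    x, lx = init, log_target(init)
    for t in range(n_steps):
        prop = x + step * rng.standard_normal()
        lp = log_target(prop)
        if np.log(rng.uniform()) < lp - lx:
            x, lx = prop, lp
        chain[t] = x
    return chain

def iterate_snis(mu0, n_iters=5, n_steps=10_000, burn_in=500, seed=0):
    rng = np.random.default_rng(seed)
    mu, x, history = mu0, 3.0, [mu0]
    for _ in range(n_iters):
        # Chain on the current approximation p_tilde(theta) * |f(theta) - mu|.
        def log_target(theta, mu=mu):
            return log_p_tilde(theta) + np.log(np.abs(f(theta) - mu))
        chain = rw_metropolis(log_target, n_steps, x, 1.5, rng)
        x = chain[-1]                      # warm-start the next chain
        kept = chain[burn_in:]
        w = 1.0 / np.abs(f(kept) - mu)     # SNIS weights p_tilde / q_tilde
        mu = float(np.sum(w * f(kept)) / np.sum(w))
        history.append(mu)
    return history

history = iterate_snis(mu0=3.0)
print(history)
```

Each pass is a consistent SNIS estimate whatever the current µ₀ is; the point of iterating is that the chain's target moves closer to the variance-minimizing one as the estimate improves.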

Scalable Expectation Estimation with Subtractive Mixture Models (preprint); Zellinger, Lena^♦ and Branchini, Nicola^♦ and Elvira, Víctor and Vergari, Antonio. (♦equal contribution.)

Importance sampling (IS) with mixture models (MMs) is all over the place (even where you don’t see it). Subtractive mixture models (SMMs), i.e. MMs with negative weights, are super cool and can model complex distributions more efficiently. It’d be great to use them for IS, but sampling from them is a pain: it requires costly autoregressive inverse transform sampling. We propose an estimator that exploits the fact that an SMM is a difference of two regular MMs, so that we can do IS and scale to higher dimensions.

The role of tail dependence in estimating posterior expectations; Branchini, Nicola and Elvira, Víctor. In NeurIPS 2024 Workshop on Bayesian Decision-making and Uncertainty.

Generalized self-normalized importance sampling (preprint); Branchini, Nicola and Elvira, Víctor. Video from SMC 2024; Xi’an’s comments in his blog.

The self-normalized IS (SNIS) estimator is widely used to estimate expectations under distributions with intractable normalizing constants, for example in Bayesian leave-one-out cross-validation or likelihood-free inference. In this paper, we propose a framework to understand when SNIS works and when it does not, together with a generalization that overcomes its limitations and has connections to continuous optimal transport. See the paper abstract for more info.
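
For readers who have not met it, here is a minimal sketch of the standard SNIS estimator (not the generalization from the paper). The toy target, the Gaussian proposal and the helper name `snis` are my own assumptions for illustration. Because the weights are normalized by their sum, the unknown constants of both the target and the proposal cancel.

```python
import numpy as np

def snis(f, log_p_tilde, sample_q, log_q, n, rng):
    # Draw from the proposal, weight by p_tilde / q, and self-normalize.
    theta = sample_q(n, rng)
    log_w = log_p_tilde(theta) - log_q(theta)
    w = np.exp(log_w - log_w.max())   # subtracting the max only rescales the weights
    return np.sum(w * f(theta)) / np.sum(w)

rng = np.random.default_rng(0)
scale = 2.0  # proposal N(0, 2^2), wider than the N(0, 1) target (assumed toy setup)
estimate = snis(
    f=lambda t: t**2,                          # true value E_p[theta^2] = 1
    log_p_tilde=lambda t: -0.5 * t**2,         # target known up to a constant
    sample_q=lambda n, rng: scale * rng.standard_normal(n),
    log_q=lambda t: -0.5 * (t / scale) ** 2,   # constant dropped; it cancels
    n=100_000,
    rng=rng,
)
print(estimate)
```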

Adaptive importance sampling for heavy-tailed distributions via α-divergence minimization; Guilmeau, Thomas^♦ and Branchini, Nicola^♦ and Chouzenoux, Emilie and Elvira, Víctor. In 27th Conference on Artificial Intelligence and Statistics (AISTATS), Proceedings of Machine Learning Research, 2024. (♦equal contribution.)

Many adaptive IS (AIS) methods, and some VI methods, are based on matching the moments of a target distribution. When the target has heavy tails, these moments can be undefined, or their estimates can have high variance. We propose an AIS method that overcomes this by matching the moments of a modified, lighter-tailed target: the original target raised to a power α. Despite this modification, the procedure actually minimizes the α-divergence between the proposal and the true target. Note: many previous works propose AIS methods with heavy-tailed proposals, but these are not necessarily suitable for heavy-tailed targets.
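
To see why raising a heavy-tailed target to a power helps, here is a toy check. It is not the paper's algorithm; the Cauchy target, the choice α = 2.5 and the function name are assumptions of mine. A standard Cauchy π(θ) ∝ 1/(1 + θ²) has no variance, but π^α ∝ (1 + θ²)^(-α) has a finite second moment whenever α > 3/2, and that moment can be estimated by SNIS with the Cauchy itself as proposal.

```python
import numpy as np

def tempered_second_moment(alpha, n, rng):
    # Proposal: the standard Cauchy itself (an assumed, convenient choice).
    theta = rng.standard_cauchy(n)
    log_pi = -np.log1p(theta**2)            # log Cauchy density, constant dropped
    log_w = alpha * log_pi - log_pi         # log of pi^alpha / q up to a constant
    w = np.exp(log_w - log_w.max())
    # Self-normalized estimate of the second moment of pi^alpha.
    return np.sum(w * theta**2) / np.sum(w)

rng = np.random.default_rng(0)
m2 = tempered_second_moment(alpha=2.5, n=200_000, rng=rng)
print(m2)
```

For α = 2.5 the exact second moment is 1/2 (a ratio of Beta functions, B(3/2, 1) / B(1/2, 2)), so the estimate should sit close to that value.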

Variational Resampling; Kviman, Oskar and Branchini, Nicola and Elvira, Víctor and Lagergren, Jens. In 27th Conference on Artificial Intelligence and Statistics (AISTATS), Proceedings of Machine Learning Research, 2024.

A very neat idea stemming from Oskar’s Master’s thesis (he’s impressive, isn’t he?). When we resample in particle filters (PFs), we would usually like the resulting equally-weighted distribution of the resampled particles to be “close”, in some sense, to the distribution before resampling (which was unequally weighted, in general). Usually, resampling schemes enforce this by requiring that the number of times a particle gets replicated is, on average, proportional to its weight in the pre-resampling distribution. What we do here instead is to optimize the number of times a particle gets replicated so as to directly minimize a divergence between the post-resampling distribution and the pre-resampling distribution! With a very smart algorithm, again entirely due to Oskar.
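
As a point of reference, here is a sketch of one standard scheme of the "usual" kind, systematic resampling, where the expected replication count of particle i is N times its weight. This is the classical baseline, not the variational method of the paper; the function name and the random test weights are my own.

```python
import numpy as np

def systematic_resample_counts(weights, rng):
    # One uniform offset, then N evenly spaced positions in [0, 1).
    n = len(weights)
    positions = (rng.uniform() + np.arange(n)) / n
    cdf = np.cumsum(weights)
    cdf[-1] = 1.0                      # guard against floating-point round-off
    idx = np.searchsorted(cdf, positions)
    # counts[i] = number of copies of particle i after resampling.
    return np.bincount(idx, minlength=n)

rng = np.random.default_rng(0)
w = rng.dirichlet(np.ones(50))         # normalized pre-resampling weights
counts = systematic_resample_counts(w, rng)
print(counts.sum(), np.max(np.abs(counts - 50 * w)))
```

With this scheme each count is within one of N times the weight, which is one concrete sense in which the post-resampling distribution stays close to the pre-resampling one.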

Causal optimal transport of abstractions; Felekis, Yorgos and Zennaro, Fabio and Branchini, Nicola and Damoulas, Theodoros. In 3rd Conference on Causal Learning and Reasoning (CLeaR 2024).

The task of causal abstraction involves finding a mapping (a measurable transport map) between structural causal models (SCMs) and their corresponding “abstracted versions”, which can be simplified or coarser SCMs (fewer variables or different functional relationships). We consider the problem of learning causal abstractions from data. We propose a framework that does so without specifying parametric relationships for the SCM functions. The method involves a multimarginal OT problem (roughly, with as many marginals as there are interventions considered; not exactly, but it gives the idea) with soft constraints and a cost function encoding knowledge of the underlying causal DAGs. Nicely, the soft constraints have a do-calculus interpretation.

An adaptive mixture view of particle filters; Branchini, Nicola and Elvira, Víctor. FoDS (Foundations of Data Science).

Coming soon!

Causal Entropy Optimization; Branchini, Nicola and Aglietti, Virginia and Dhir, Neil and Damoulas, Theodoros. In 26th Conference on Artificial Intelligence and Statistics (AISTATS), Proceedings of Machine Learning Research, 2023.

In this paper, we studied the problem of “causal global optimization”: finding the optimal intervention, that is, the minimizer over several causal effects (we consider possibly intervening on many different subsets of variables). When the underlying causal graph is not known, the first step is to study what happens if we assume that any one of the possible graphs is the true one, and run “CBO” (causal Bayesian optimization) as normal. We studied the effect of this kind of incorrect causal assumption for optimization purposes. Further, since in many cases the underlying function can be optimized efficiently even if the graph is not fully known, we designed an acquisition function that automatically trades off optimization of the effect and structure learning.

Optimized Auxiliary Particle Filters: adapting mixture proposals via convex optimization; Branchini, Nicola and Elvira, Víctor. In 37th Conference on Uncertainty in Artificial Intelligence (UAI), Proceedings of Machine Learning Research, 2021.

In this paper we wanted to improve on the Auxiliary Particle Filter (APF), which is designed for estimating the likelihood in sequential latent variable models with very informative observations. This algorithm, however, still has severe drawbacks; among others, the resampling weights are chosen independently, i.e. each particle chooses its own without “knowing” what the others are doing. We devise a new way to optimize these resampling weights by viewing them as the mixture weights of an importance sampling mixture proposal. It turns out that choosing the mixture weights so as to minimize the resulting empirical variance of the importance weights leads to a convex optimization problem.
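
One standard way to see why convexity is plausible here (a sketch in notation I am assuming, with a normalized target π and mixture components q_m; the paper's empirical objective may differ in its details):

```latex
\[
q_\lambda(x) = \sum_{m=1}^{M} \lambda_m\, q_m(x),
\qquad \lambda_m \ge 0, \quad \sum_{m=1}^{M} \lambda_m = 1,
\]
\[
\operatorname{Var}_{q_\lambda}\!\left[ \frac{\pi(x)}{q_\lambda(x)} \right]
= \int \frac{\pi(x)^2}{q_\lambda(x)}\, dx \;-\; 1 .
\]
```

For each fixed x, q_λ(x) is linear in λ and t ↦ 1/t is convex for t > 0, so π(x)²/q_λ(x) is convex in λ; integrating over x preserves convexity, and the constraint set on λ is a simplex.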

Video and slides from UAI

$\mathbb{P}$robably Approximately Wrong

An infrequent blog, by Nicola Branchini
