Dear Colleagues,

We would like to invite you to the following SPASS seminar:

Edge of Stochastic Stability: SGD does not train neural networks as you expect it

by Pierfrancesco Beneventano (MIT)

on Tuesday 13.1.2026 at 14:00 CET in Saletta Riunioni, Dipartimento di Matematica, UNIPI.

A link for online participants will be shared on the website 
https://sites.google.com/unipi.it/spass

Best,

M. Romito on behalf of the organizers

-------------------

Abstract: Recent findings show that when neural networks are trained with full-batch (deterministic) gradient descent with step size η, the largest eigenvalue λ of the Hessian of the loss consistently stabilizes around 2/η. These results are surprising and carry significant implications for convergence and generalization. This behavior, however, does not carry over to mini-batch optimization algorithms, which limits the broader applicability of these findings. We show that mini-batch Stochastic Gradient Descent (SGD) trains in a different regime, which we term Edge of Stochastic Stability (EoSS). In this regime, what stabilizes at 2/η is Batch Sharpness: the expected directional curvature of the mini-batch Hessians along their corresponding stochastic gradients. As a consequence, λ, which is generally smaller than Batch Sharpness, is suppressed, in line with the long-standing empirical observation that smaller batches and larger step sizes favor flatter minima. We further discuss implications for the mathematical modeling of SGD trajectories.
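
Note: as a rough guide for attendees, one natural way to write down the Batch Sharpness described above is sketched here; the precise normalization and expectation used in the talk may differ. Writing L_B for the loss on a mini-batch B, g_B = ∇L_B(θ) for the corresponding stochastic gradient, and H_B = ∇²L_B(θ) for the mini-batch Hessian, the quantity in question is

    Batch Sharpness(θ) ≈ E_B [ g_Bᵀ H_B g_B / ‖g_B‖² ],

i.e. the expected curvature of each mini-batch loss in the direction of its own gradient. At EoSS it is this quantity, rather than λ = λ_max of the full-batch Hessian, that hovers around 2/η.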