Dear Colleagues,
We would like to invite you to the following SPASS seminar:
Edge of Stochastic Stability: SGD does not train neural
networks as you expect it
by Pierfrancesco Beneventano (MIT)
on Tuesday 13.1.2026 at 14:00 CET in Saletta Riunioni,
Dipartimento di Matematica, UNIPI.
A link for online participants will be shared on the website
https://sites.google.com/unipi.it/spass
Best,
M. Romito on behalf of the organizers
-------------------
Abstract: Recent findings demonstrate that when training neural
networks using full-batch (deterministic) gradient descent with
step size η, the largest eigenvalue λ of the Hessian consistently
stabilizes around 2/η. These results are surprising and carry
significant implications for convergence and generalization. This,
however, is not the case for mini-batch optimization algorithms,
limiting the broader applicability of these findings. We show that
mini-batch Stochastic Gradient Descent
(SGD) trains in a different regime, which we term Edge of
Stochastic Stability (EoSS). In this regime, what stabilizes at
2/η is Batch Sharpness: the expected directional curvature of
mini-batch Hessians along their corresponding stochastic
gradients. As a consequence, λ---which is generally smaller than
Batch Sharpness---is suppressed, aligning with the long-standing
empirical observation that smaller batches and larger step sizes
favor flatter minima. We further discuss implications for
mathematical modeling of SGD trajectories.
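
For reference, a plausible formalization of Batch Sharpness, read off
from the wording above (the talk will give the precise definition; the
symbols L_B, H_B and θ below are notation introduced here, not taken
from the abstract), is the expected Rayleigh-quotient curvature of the
mini-batch Hessian along the corresponding mini-batch gradient:

  \[
    \text{Batch Sharpness}
      = \mathbb{E}_{B}\!\left[
          \frac{\nabla L_B(\theta)^{\top} H_B(\theta)\, \nabla L_B(\theta)}
               {\lVert \nabla L_B(\theta) \rVert^{2}}
        \right],
    \qquad
    \text{EoSS:}\ \ \text{Batch Sharpness} \approx \frac{2}{\eta}.
  \]

In this notation, full-batch Edge of Stability corresponds to
\(\lambda_{\max}(H) \approx 2/\eta\), while EoSS replaces
\(\lambda_{\max}\) with the quantity above, so that \(\lambda\) itself
may settle below \(2/\eta\).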