------------------------------------------------
Speaker:
Matteo Sesia (University of Southern California)
Title: Testing for outliers with conformal p-values
Abstract:
This talk discusses the construction of provably valid frequentist p-values for nonparametric outlier detection, taking a multiple-testing perspective. The goal is to test whether new independent samples belong to the same distribution as a reference data set or are outliers. We study a solution based on conformal inference, a broadly applicable framework that yields p-values that are marginally valid in finite samples but mutually dependent across test points. We prove these p-values are positively dependent and enable exact false discovery rate control, although in a relatively weak marginal sense. We then introduce a new method to compute p-values that are both valid conditionally on the training data and independent of each other for different test points; this paves the way for stronger type-I error guarantees. Finally, we discuss how to further boost power by leveraging a separate data set of known outliers with an approach inspired by weighted hypothesis testing. The practical relevance of our results is demonstrated by numerical experiments on real and simulated data.
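
For readers new to the topic, the marginal conformal p-values mentioned in the abstract can be illustrated with a minimal sketch. It assumes the standard split-conformal construction: a held-out calibration set of reference (inlier) scores, where larger scores mean "more outlying". The function names, the toy score (absolute value), and the use of the Benjamini-Hochberg procedure for false discovery rate control are illustrative assumptions, not necessarily the exact procedure presented in the talk. The conditionally valid and weighted variants discussed in the abstract are not shown.

```python
import numpy as np


def conformal_pvalues(cal_scores, test_scores):
    """Marginal conformal p-values (hypothetical helper, larger score = more outlying).

    p_j = (1 + #{i : cal_scores[i] >= test_scores[j]}) / (n + 1)
    """
    cal = np.sort(np.asarray(cal_scores, dtype=float))
    test = np.asarray(test_scores, dtype=float)
    n = cal.size
    # searchsorted(side="left") counts calibration scores strictly below each
    # test score, so n minus that count is the number that are >= it.
    n_ge = n - np.searchsorted(cal, test, side="left")
    return (1.0 + n_ge) / (n + 1.0)


def benjamini_hochberg(pvals, alpha=0.1):
    """Benjamini-Hochberg step-up procedure; returns a boolean rejection mask."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])  # largest sorted index passing the threshold
        reject[order[: k + 1]] = True
    return reject


# Toy usage with synthetic data (assumed setup, for illustration only).
rng = np.random.default_rng(0)
calibration = np.abs(rng.normal(size=1000))          # scores of held-out inliers
test_points = np.abs(np.concatenate([
    rng.normal(size=90),                              # inliers
    rng.normal(loc=4.0, size=10),                     # outliers
]))
pvals = conformal_pvalues(calibration, test_points)
flagged = benjamini_hochberg(pvals, alpha=0.1)
print("flagged as outliers:", int(flagged.sum()))
```

Because every test point is compared against the same calibration set, the resulting p-values are dependent across test points, which is the issue the abstract addresses.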