Unlocking Tuning-free Generalization: Minimizing the PAC-Bayes Bound with Trainable Priors

Abstract

It is widely recognized that the generalization ability of neural networks can be greatly enhanced through careful tuning of the training procedure. The current state-of-the-art approach uses stochastic gradient descent (SGD) or Adam together with additional regularization techniques such as weight decay, dropout, or noise injection. Optimal generalization, however, is reached only by extensively tuning a multitude of hyper-parameters, which is time-consuming and requires an additional validation dataset. To address this issue, we present a nearly tuning-free PAC-Bayes training framework that requires no extra regularization. This framework achieves test performance comparable to that of SGD/Adam, even when the latter are tuned through a full grid search and supplemented with additional regularization terms.
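For context, a representative bound of the kind such a framework minimizes is the McAllester-style PAC-Bayes bound (with Maurer's refinement); the notation below is the standard one and the exact bound optimized in the paper may differ. With probability at least $1-\delta$ over an i.i.d. sample of size $n$, for a prior $P$ chosen independently of the sample and every posterior $Q$ over the network weights,

\[
L(Q) \;\le\; \widehat{L}(Q) \;+\; \sqrt{\frac{\mathrm{KL}(Q \,\|\, P) + \ln\frac{2\sqrt{n}}{\delta}}{2n}},
\]

where $L(Q)$ and $\widehat{L}(Q)$ are the population and empirical risks under $Q$, and $\mathrm{KL}(Q\,\|\,P)$ is the Kullback-Leibler divergence between posterior and prior. PAC-Bayes training minimizes the right-hand side over $Q$; making the prior $P$ itself trainable, as in the title, introduces a data dependence that a classical bound does not allow, and the abstract does not detail how the framework accounts for it.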

Xitong Zhang