AutoDropout: Learning Dropout Patterns to Regularize Deep Networks

Review of paper by Hieu Pham (Google Research; Carnegie Mellon University) and Quoc V. Le (Google Research), 2021

Existing Dropout regularization variants for deep neural networks (e.g., regular Dropout, SpatialDropout, DropBlock) apply randomized patterns whose structure is governed by a few fixed parameters. To improve on them, the authors develop a reinforcement learning approach that finds better Dropout patterns for various network architectures.


Every Model Learned by Gradient Descent Is Approximately a Kernel Machine

Review of paper by Pedro Domingos, University of Washington, 2020

In this paper, the author shows that neural networks trained by first-order gradient descent with a small learning rate can be represented as kernel machines: they effectively memorize the training points and use a similarity kernel over them to make predictions, in the same way as SVMs and other kernel methods. This insight should lead to a better general understanding of how deep neural networks operate and, hopefully, will help improve future algorithms.
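For orientation, the general shape of such a representation can be written down. This is a sketch based on the standard kernel-machine form and on the paper's "path kernel", not a restatement of its exact theorem. The symbols $a_i$, $b$, $w$, and $c(t)$ are notation assumed here: per-example coefficients, an offset, the model weights, and the path the weights trace during training.

```latex
% Kernel machine: the prediction is a weighted sum of similarities to the training points x_i
y(x) = \sum_{i=1}^{m} a_i \, K(x, x_i) + b

% Path kernel: similarity of two inputs measured by how aligned their output
% gradients are, accumulated along the gradient-descent trajectory c(t)
K^{p}(x, x') = \int_{c(t)} \nabla_w y(x) \cdot \nabla_w y(x') \, dt
```

Two inputs count as similar under this kernel when changing the weights moves their outputs in the same direction throughout training.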


Scaling *down* Deep Learning

Review of paper by Sam Greydanus, Oregon State University and the ML Collective, 2020

Inspired by the widespread use of standard MNIST as a playground dataset for deep learning, the author has developed MNIST-1D, a new dataset that is even smaller (each sample is a one-dimensional sequence of just 40 numbers) yet harder to predict on. It shows clearer differences in performance across network architectures and is better suited to exploring topics such as “lottery tickets” and the double descent phenomenon.


Gradient Starvation: A Learning Proclivity in Neural Networks

Review of paper by Mohammad Pezeshki (Mila; Université de Montréal), Sekou-Oumar Kaba (Mila; McGill University), Yoshua Bengio (Mila; Université de Montréal), et al., 2020

In this paper, the authors examine in detail the phenomenon of gradient starvation, which was originally introduced by the same research group in 2018, for neural networks trained with the common cross-entropy loss. Gradient starvation occurs when the presence of easy-to-learn features in a dataset prevents the learning of other equally informative features, which may lead to a lack of robustness in the trained models that rely only on these few features. The authors propose a new Spectral Decoupling regularization method to combat this problem.
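A minimal sketch of the idea behind Spectral Decoupling, assuming it amounts to adding an L2 penalty on the network's output logits (rather than on its weights) to the cross-entropy loss. The function names, the coefficient name `lam`, and the averaging over the batch are illustrative assumptions, not the authors' code.

```python
import numpy as np


def cross_entropy(logits, labels):
    """Mean cross-entropy from raw logits, using a numerically stable log-softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(len(labels)), labels].mean()


def spectral_decoupling_loss(logits, labels, lam=0.01):
    """Cross-entropy plus (lam / 2) * squared L2 norm of the logits.

    Assumption: the penalty is averaged over the batch; `lam` is a
    hypothetical name for the regularization coefficient.
    """
    penalty = 0.5 * lam * np.mean(np.sum(logits ** 2, axis=1))
    return cross_entropy(logits, labels) + penalty


# Usage: confident (large-magnitude) logits are penalized even when correct.
logits = np.array([[2.0, 0.0], [0.0, 3.0]])
labels = np.array([0, 1])
print(spectral_decoupling_loss(logits, labels, lam=0.1))
```

Because the penalty acts on the logits, it discourages the model from becoming arbitrarily confident on the basis of a single dominant feature, which is the failure mode the summary above describes.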


AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients

Review of paper by Juntang Zhuang (Yale University), Tommy Tang (University of Illinois at Urbana-Champaign), Yifan Ding (University of Central Florida), et al., 2020

This paper develops a new neural network optimizer that aims to combine the fast convergence and stability of adaptive methods such as Adam and the generalization power of SGD.
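To make the "belief" idea concrete, here is a minimal NumPy sketch of a single AdaBelief-style update. It assumes the Adam-like formulation in which the second-moment estimate tracks the squared deviation of the gradient from its running mean, `(grad - m) ** 2`, instead of `grad ** 2`. The function name and default hyperparameters are illustrative, and extras such as weight decay are omitted.

```python
import numpy as np


def adabelief_step(theta, grad, m, s, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One AdaBelief-style update (sketch). `t` is the 1-based step count."""
    # Running mean of the gradient, as in Adam.
    m = beta1 * m + (1 - beta1) * grad
    # Running variance of the gradient around that mean: the "belief" term.
    # A gradient close to its prediction m gives a small s and a larger step.
    s = beta2 * s + (1 - beta2) * (grad - m) ** 2 + eps
    # Bias correction for the zero-initialized running averages.
    m_hat = m / (1 - beta1 ** t)
    s_hat = s / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(s_hat) + eps)
    return theta, m, s


# Usage: a few steps on f(theta) = sum(theta ** 2), whose gradient is 2 * theta.
theta = np.array([3.0, -2.0])
m = np.zeros_like(theta)
s = np.zeros_like(theta)
for t in range(1, 21):
    theta, m, s = adabelief_step(theta, 2 * theta, m, s, t, lr=0.01)
print(theta)
```

The only difference from Adam in this sketch is the `(grad - m) ** 2` term: when observed gradients agree with the running mean, the denominator shrinks and the step grows; when they disagree, the step shrinks.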


Supervised Contrastive Learning

Review of paper by Prannay Khosla, Piotr Teterwak, Chen Wang, et al., Google Research, 2020

The authors adapt contrastive loss, which has recently been shown to be very effective at learning deep neural network representations in the self-supervised setting, to supervised learning, and achieve better results than those obtained with cross-entropy loss for ResNet-50 and ResNet-200.
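A minimal NumPy sketch of a supervised contrastive loss follows. It assumes the common formulation in which every other sample with the same label is a positive for a given anchor, and each positive's log-probability is normalized over all non-anchor samples in the batch. The function name, the temperature default, and the handling of anchors without positives (they are skipped) are illustrative choices, not the authors' implementation.

```python
import numpy as np


def supcon_loss(features, labels, temperature=0.1):
    """Supervised contrastive loss over one batch (sketch).

    features: (n, d) array of embeddings, n >= 2; labels: (n,) integer array.
    Assumes at least one label occurs more than once in the batch.
    """
    # L2-normalize so that dot products are cosine similarities.
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    # Exclude each anchor's similarity with itself from the softmax.
    sim = np.where(self_mask, -np.inf, sim)
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    # Positives: same label as the anchor, excluding the anchor itself.
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0  # anchors with no positive are skipped
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    per_anchor = -pos_log_prob[valid] / pos_count[valid]
    return per_anchor.mean()


# Usage: embeddings that cluster by label give a lower loss than mismatched labels.
feats = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
print(supcon_loss(feats, np.array([0, 0, 1, 1])))
print(supcon_loss(feats, np.array([0, 1, 0, 1])))
```

Compared with the self-supervised setting, where the only positive for an anchor is an augmented view of the same image, the labels here supply many positives per anchor.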

