Gradient Starvation: A Learning Proclivity in Neural Networks

By Mohammad Pezeshki, Sekou-Oumar Kaba, Yoshua Bengio, et al. (Mila; Université de Montréal; McGill University), 2020

In this paper, the authors take a detailed look at gradient starvation, a phenomenon first described by the same research group in 2018, in neural networks trained with the common cross-entropy loss. Gradient starvation occurs when easy-to-learn features in a dataset suppress the learning of other, equally informative features; a model that ends up relying on only those few easy features can lack robustness. To counteract this, the authors propose a new regularization method called Spectral Decoupling.
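
For readers who want the gist in code, the paper's Spectral Decoupling penalty amounts to adding an L2 term on the network's logits to the usual cross-entropy loss. Here is a minimal PyTorch sketch; the function name and the default coefficient are our own illustrative choices, and the paper tunes the coefficient per task:

```python
import torch
import torch.nn.functional as F

def spectral_decoupling_loss(logits, targets, lam=1e-2):
    """Cross-entropy plus the Spectral Decoupling penalty (lam/2)*||logits||^2.

    Penalizing the logits directly (rather than the weights, as in L2 weight
    decay) discourages a single easy-to-learn feature from dominating the
    output and starving the gradients of other predictive features.
    """
    ce = F.cross_entropy(logits, targets)
    sd = 0.5 * lam * (logits ** 2).mean()
    return ce + sd

# Usage: loss = spectral_decoupling_loss(model(x), y)
```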


Attention augmented differentiable forest for tabular data

By Yingshi Chen, Xiamen University, 2020

The author has developed a new “differentiable forest”-type neural network framework for predictions on tabular data. It bears some similarity to the recently suggested NODE architecture and adds squeeze-and-excitation-style “tree attention blocks” (TABs); on a number of benchmarks it outperforms gradient-boosted decision trees such as XGBoost, LightGBM, and CatBoost.
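
To make the general idea concrete, below is a hedged PyTorch sketch of a differentiable forest: an ensemble of soft decision trees whose outputs are reweighted by a squeeze-and-excitation block over trees, in the spirit of the TABs the paper describes. All class names, hyperparameters, and the exact wiring here are our own illustrative choices, not the author's implementation:

```python
import torch
import torch.nn as nn

class SoftTree(nn.Module):
    """A depth-d soft decision tree: each inner node is a sigmoid gate,
    and each leaf holds a learnable response vector."""
    def __init__(self, in_dim, out_dim, depth=3):
        super().__init__()
        self.depth = depth
        n_inner, n_leaf = 2 ** depth - 1, 2 ** depth
        self.gates = nn.Linear(in_dim, n_inner)             # routing decisions
        self.leaves = nn.Parameter(0.1 * torch.randn(n_leaf, out_dim))

    def forward(self, x):
        g = torch.sigmoid(self.gates(x))                    # (B, n_inner)
        prob = x.new_ones(x.size(0), 1)                     # P(reach root) = 1
        for d in range(self.depth):                         # descend level by level
            start = 2 ** d - 1
            gd = g[:, start:start + 2 ** d]                 # gates at depth d
            prob = torch.stack((prob * gd, prob * (1 - gd)), dim=2).flatten(1)
        return prob @ self.leaves                           # (B, out_dim)

class AttentionForest(nn.Module):
    """An ensemble of soft trees reweighted by squeeze-and-excitation-style
    attention over trees (an approximation of the paper's TAB idea)."""
    def __init__(self, in_dim, out_dim, n_trees=16, depth=3, reduction=4):
        super().__init__()
        self.trees = nn.ModuleList(
            SoftTree(in_dim, out_dim, depth) for _ in range(n_trees))
        self.se = nn.Sequential(                            # squeeze-and-excite
            nn.Linear(n_trees, n_trees // reduction), nn.ReLU(),
            nn.Linear(n_trees // reduction, n_trees), nn.Sigmoid())

    def forward(self, x):
        out = torch.stack([t(x) for t in self.trees], dim=1)  # (B, T, out_dim)
        weights = self.se(out.mean(dim=2))                  # (B, T) tree attention
        return (out * weights.unsqueeze(2)).mean(dim=1)     # weighted ensemble
```

Because every gate is a sigmoid rather than a hard split, the whole forest is differentiable end to end and can be trained with ordinary gradient descent.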


MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks

By Zhiqiang Shen and Marios Savvides, Carnegie Mellon University, 2020

The authors used a version of the recently suggested MEAL technique (knowledge distillation from an ensemble of large teacher networks into a smaller student network via adversarial learning) to raise the top-1 accuracy of a vanilla ResNet-50 on ImageNet, at 224×224 input size, to 80.67%, with no external training data and no network architecture modifications.
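
Stripped of the adversarial component, the core of the method is soft-label distillation against an averaged teacher ensemble. A minimal PyTorch sketch of just that distillation term follows; the function name is ours, and the full method additionally trains a small discriminator to tell student outputs from teacher outputs:

```python
import torch
import torch.nn.functional as F

def ensemble_distillation_loss(student_logits, teacher_logits_list):
    """KL divergence between the student's predictive distribution and the
    average of the teachers' soft labels (no one-hot ground-truth labels)."""
    with torch.no_grad():                       # teachers are frozen
        teacher_probs = torch.stack(
            [F.softmax(t, dim=1) for t in teacher_logits_list]).mean(dim=0)
    log_student = F.log_softmax(student_logits, dim=1)
    return F.kl_div(log_student, teacher_probs, reduction="batchmean")
```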

