AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients

Review of paper by Juntang Zhuang (Yale University), Tommy Tang (University of Illinois at Urbana-Champaign), Yifan Ding (University of Central Florida), et al., 2020

This paper develops a new neural network optimizer that aims to combine the fast convergence and training stability of adaptive methods such as Adam with the generalization ability of SGD.
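For orientation, the key change relative to Adam is that the second-moment estimate tracks the squared deviation of the gradient from its own exponential moving average (the "belief" in the observed gradient) rather than the squared gradient itself. A minimal NumPy sketch of one update step, with bias correction and weight decay omitted and all names purely illustrative, might look like this:

    import numpy as np

    def adabelief_step(param, grad, m, s, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        # m: EMA of gradients -- the "predicted" gradient
        m = beta1 * m + (1 - beta1) * grad
        # s: EMA of the squared deviation from that prediction;
        # Adam would use grad**2 here, AdaBelief uses (grad - m)**2
        s = beta2 * s + (1 - beta2) * (grad - m) ** 2 + eps
        # The step is small when the observed gradient deviates strongly from
        # the prediction (low "belief"), and large when it matches the prediction
        param = param - lr * m / (np.sqrt(s) + eps)
        return param, m, s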

CONTINUE READING >

Attention Augmented Differentiable Forest for Tabular Data

Review of paper by Yingshi Chen, Xiamen University, 2020

The author develops a new “differentiable forest”-type neural network framework for predictions on tabular data. It is similar in spirit to the recently proposed NODE architecture, but adds squeeze-and-excitation “tree attention blocks” (TABs) and is reported to outperform gradient-boosted decision trees (e.g. XGBoost, LightGBM, CatBoost) on a number of benchmarks.
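The exact architecture is described in the paper; as a rough, simplified illustration of what squeeze-and-excitation style attention over an ensemble of soft (differentiable) trees could look like, consider the following PyTorch sketch (class and dimension names are the reviewer's assumptions, not the author's code):

    import torch
    import torch.nn as nn

    class TreeAttentionBlock(nn.Module):
        # Hypothetical SE-style gating over per-tree responses
        def __init__(self, num_trees, reduction=4):
            super().__init__()
            hidden = max(num_trees // reduction, 1)
            self.gate = nn.Sequential(
                nn.Linear(num_trees, hidden),
                nn.ReLU(),
                nn.Linear(hidden, num_trees),
                nn.Sigmoid(),
            )

        def forward(self, tree_out):
            # tree_out: (batch, num_trees, response_dim) -- one response per soft tree
            squeezed = tree_out.mean(dim=-1)          # "squeeze": one scalar per tree
            weights = self.gate(squeezed)             # "excitation": per-tree gates in (0, 1)
            return tree_out * weights.unsqueeze(-1)   # rescale each tree's contribution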

CONTINUE READING >

MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks

Review of paper by Zhiqiang Shen and Marios Savvides, Carnegie Mellon University, 2020

The authors apply a version of the recently proposed MEAL technique (knowledge distillation from multiple large teacher networks into a smaller student network via adversarial learning) to raise the top-1 accuracy of a vanilla ResNet-50 on ImageNet with 224×224 inputs to 80.67%, without external training data or network architecture modifications.
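As background on the distillation part: in MEAL-style training the student is supervised by the teachers' averaged soft predictions (MEAL V2 drops the usual one-hot label loss entirely), plus an adversarial discriminator that tries to tell student and teacher outputs apart. A simplified sketch of the ensemble-distillation loss alone, with the discriminator omitted and all names illustrative, might be:

    import torch
    import torch.nn.functional as F

    def ensemble_distillation_loss(student_logits, teacher_logits_list):
        # Average the teachers' softmax outputs to form the soft target
        with torch.no_grad():
            teacher_probs = torch.stack(
                [F.softmax(t, dim=-1) for t in teacher_logits_list]
            ).mean(dim=0)
        # The student matches these soft targets via KL divergence;
        # no ground-truth one-hot labels are involved
        student_log_probs = F.log_softmax(student_logits, dim=-1)
        return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")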

CONTINUE READING >

Big Bird: Transformers for Longer Sequences

Review of paper by Manzil Zaheer, Guru Guruganesh, Avinava Dubey et al, Google Research, 2020

In this paper, the authors present a sparse Transformer attention mechanism with linear complexity that is mathematically proven to be Turing complete (and thus as expressive as the original quadratic full attention) and achieves new state-of-the-art results on many NLP tasks involving long sequences (e.g. question answering and summarization), as well as on genomics data.
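The linear complexity comes from a sparse attention pattern that combines sliding-window (local), global, and random connections. As a rough illustration (not the authors' blocked implementation), a boolean mask encoding which keys each query may attend to could be assembled like this, with all parameter names assumed for the example:

    import numpy as np

    def bigbird_style_mask(seq_len, window=3, num_global=2, num_random=2, seed=0):
        # mask[i, j] == True means query i may attend to key j;
        # the number of allowed pairs grows linearly with seq_len
        rng = np.random.default_rng(seed)
        mask = np.zeros((seq_len, seq_len), dtype=bool)
        for i in range(seq_len):
            # Sliding-window attention: each token sees its local neighborhood
            lo, hi = max(0, i - window), min(seq_len, i + window + 1)
            mask[i, lo:hi] = True
            # Random attention: a few randomly chosen keys per query
            mask[i, rng.choice(seq_len, size=num_random, replace=False)] = True
        # Global tokens attend everywhere and are attended to by every token
        mask[:num_global, :] = True
        mask[:, :num_global] = True
        return mask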

CONTINUE READING >
