Attention Augmented Differentiable Forest for Tabular Data
The author develops a new “differentiable forest”-style neural network framework for prediction on tabular data. The framework is similar in spirit to the recently proposed NODE architecture, employs squeeze-and-excitation “tree attention blocks” (TABs), and is reported to outperform gradient-boosted decision trees (e.g. XGBoost, LightGBM, CatBoost) on a number of benchmarks; a minimal sketch of the idea follows.
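The summary does not spell out the exact block design, but in the spirit of squeeze-and-excitation applied to an ensemble of differentiable trees, a minimal PyTorch sketch might look like the following. `SoftTree`, `TreeAttentionBlock`, and `AttentionForest` are hypothetical names, and the depth-1 soft trees are toy stand-ins for the deeper trees a real differentiable forest would use.

```python
import torch
import torch.nn as nn

class SoftTree(nn.Module):
    """A depth-1 'soft' decision stump: a sigmoid gate blends two leaf values.
    (Toy stand-in for the deeper differentiable trees used in practice.)"""
    def __init__(self, n_features):
        super().__init__()
        self.gate = nn.Linear(n_features, 1)
        self.leaves = nn.Parameter(torch.randn(2))

    def forward(self, x):                        # x: (batch, n_features)
        p = torch.sigmoid(self.gate(x))          # routing probability, (batch, 1)
        return p * self.leaves[0] + (1 - p) * self.leaves[1]

class TreeAttentionBlock(nn.Module):
    """Squeeze-and-excitation over the trees: squeeze the ensemble output into
    per-tree statistics, excite with a small MLP, and rescale each tree."""
    def __init__(self, n_trees, reduction=4):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(n_trees, n_trees // reduction),
            nn.ReLU(),
            nn.Linear(n_trees // reduction, n_trees),
            nn.Sigmoid(),
        )

    def forward(self, tree_out):                 # tree_out: (batch, n_trees)
        weights = self.mlp(tree_out)             # per-sample tree weights in (0, 1)
        return tree_out * weights

class AttentionForest(nn.Module):
    def __init__(self, n_features, n_trees=16):
        super().__init__()
        self.trees = nn.ModuleList(SoftTree(n_features) for _ in range(n_trees))
        self.tab = TreeAttentionBlock(n_trees)

    def forward(self, x):
        out = torch.cat([t(x) for t in self.trees], dim=1)   # (batch, n_trees)
        return self.tab(out).sum(dim=1)                      # attention-weighted ensemble

model = AttentionForest(n_features=10)
pred = model(torch.randn(32, 10))                # (32,) regression-style output
```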
MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks
The authors apply a variant of the recently proposed MEAL technique (knowledge distillation from an ensemble of large teacher networks into a smaller student network via adversarial learning) to push the top-1 accuracy of ResNet-50 on ImageNet to 80.67% at the standard 224×224 input size, without external training data or modifications to the network architecture.
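As a rough illustration of the distillation recipe (not the authors' code), the training step below averages the teachers' soft predictions, fits the student to them with a KL divergence, and adds a discriminator that the student must fool. The tiny stand-in networks and sizes are invented for brevity; the real setup uses large pretrained teachers and a ResNet-50 student.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

n_classes = 10                                   # toy size; ImageNet would be 1000

# Stand-ins for the real networks (large pretrained teachers, ResNet-50 student).
teachers = [nn.Linear(64, n_classes) for _ in range(3)]
student  = nn.Sequential(nn.Linear(64, 128), nn.ReLU(), nn.Linear(128, n_classes))

# Discriminator for the adversarial part: tries to tell teacher soft labels
# from student soft labels; the student is trained to fool it.
discriminator = nn.Sequential(nn.Linear(n_classes, 64), nn.ReLU(), nn.Linear(64, 1))

opt_s = torch.optim.Adam(student.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-4)

x = torch.randn(32, 64)                          # a batch of (already encoded) inputs

with torch.no_grad():                            # teachers stay frozen
    teacher_probs = torch.stack([F.softmax(t(x), dim=1) for t in teachers]).mean(0)

student_logits = student(x)
student_probs = F.softmax(student_logits, dim=1)

# 1) Distillation loss: KL between the averaged teacher soft labels and the student.
kd_loss = F.kl_div(F.log_softmax(student_logits, dim=1), teacher_probs,
                   reduction="batchmean")

# 2) Adversarial loss: the student tries to make its soft labels look "teacher-like".
adv_loss = F.binary_cross_entropy_with_logits(
    discriminator(student_probs), torch.ones(32, 1))

opt_s.zero_grad()
(kd_loss + adv_loss).backward()
opt_s.step()

# Discriminator update: real = teacher soft labels, fake = student soft labels.
d_loss = (F.binary_cross_entropy_with_logits(discriminator(teacher_probs),
                                              torch.ones(32, 1)) +
          F.binary_cross_entropy_with_logits(discriminator(student_probs.detach()),
                                              torch.zeros(32, 1)))
opt_d.zero_grad()
d_loss.backward()
opt_d.step()
```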
An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale
This paper shows that a standard Transformer applied directly to a sequence of 16×16 image patches (the Vision Transformer, ViT) can match or exceed state-of-the-art convolutional networks on image classification when pre-trained on sufficiently large datasets.
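A minimal sketch of the patch-as-token idea is below; the class name `TinyViT` and the small dimensions are toy choices for brevity, not the authors' configuration.

```python
import torch
import torch.nn as nn

class TinyViT(nn.Module):
    """Minimal Vision-Transformer-style classifier: split the image into 16x16
    patches, linearly embed each patch as a token, prepend a [CLS] token, add
    learned position embeddings, and run a standard Transformer encoder."""
    def __init__(self, image_size=224, patch=16, dim=192, depth=4, heads=3, n_classes=1000):
        super().__init__()
        n_patches = (image_size // patch) ** 2                     # 14 * 14 = 196
        self.to_patches = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        self.cls = nn.Parameter(torch.zeros(1, 1, dim))
        self.pos = nn.Parameter(torch.zeros(1, n_patches + 1, dim))
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)
        self.head = nn.Linear(dim, n_classes)

    def forward(self, x):                                          # x: (B, 3, 224, 224)
        tokens = self.to_patches(x).flatten(2).transpose(1, 2)     # (B, 196, dim)
        cls = self.cls.expand(x.shape[0], -1, -1)
        tokens = torch.cat([cls, tokens], dim=1) + self.pos        # (B, 197, dim)
        encoded = self.encoder(tokens)
        return self.head(encoded[:, 0])                            # classify from [CLS]

logits = TinyViT()(torch.randn(2, 3, 224, 224))                    # (2, 1000)
```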
Generative Language Modeling for Automated Theorem Proving
The authors use GPT-3-like language models to develop GPT-f, an automated prover for the Metamath formalization language that generates proofs of mathematical theorems.
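As a very rough sketch of how a language model can drive proof search (simplified relative to the paper), a best-first loop over open goals could look like the following; `propose_steps` and `apply_step` are hypothetical stand-ins for the language model and the formal proof checker.

```python
import heapq

def best_first_proof_search(goal, propose_steps, apply_step, max_expansions=100):
    """Best-first proof search: a language model proposes candidate proof steps
    for the current goal, each scored by its log-probability, and the most
    promising partial proofs are expanded first.

    propose_steps(goal) -> [(step, logprob), ...]           (language model)
    apply_step(goal, step) -> remaining subgoals as a list  (proof checker)
        [] means the goal is closed; None means the step was rejected.
    """
    # Each queue entry: (negated cumulative logprob, tiebreaker, open goals, proof so far)
    queue = [(0.0, 0, [goal], [])]
    counter = 1
    for _ in range(max_expansions):
        if not queue:
            break
        score, _, goals, proof = heapq.heappop(queue)
        if not goals:
            return proof                          # no open goals left: proof found
        current, rest = goals[0], goals[1:]
        for step, logprob in propose_steps(current):
            subgoals = apply_step(current, step)
            if subgoals is None:                  # checker rejected the step
                continue
            heapq.heappush(queue, (score - logprob, counter,
                                   subgoals + rest, proof + [step]))
            counter += 1
    return None                                   # search budget exhausted
```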
Big Bird: Transformers for Longer Sequences
In this paper, the authors present Big Bird, a sparse-attention Transformer whose complexity is linear in sequence length. The model is mathematically proven to be Turing complete (and thus as expressive as the original quadratic full-attention model) and achieves new state-of-the-art results on NLP tasks involving long sequences (e.g. question answering and summarization), as well as on genomics data.
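The sparse pattern combines sliding-window, global, and random attention. The toy sketch below builds such a mask densely, just to show which positions may attend to each other; the real implementation works on blocks of tokens and never materializes the full matrix, which is where the linear complexity comes from, and the parameter values here are illustrative only.

```python
import torch
import torch.nn.functional as F

def bigbird_style_mask(seq_len, window=3, n_global=2, n_random=2, seed=0):
    """Boolean (seq_len, seq_len) mask combining the three sparse patterns:
    a sliding window, a few global tokens, and random connections per row."""
    g = torch.Generator().manual_seed(seed)
    mask = torch.zeros(seq_len, seq_len, dtype=torch.bool)
    idx = torch.arange(seq_len)
    # 1) sliding window: each token attends to `window` neighbours on each side
    mask |= (idx[:, None] - idx[None, :]).abs() <= window
    # 2) global tokens: the first n_global tokens attend everywhere and are seen by all
    mask[:n_global, :] = True
    mask[:, :n_global] = True
    # 3) random attention: each token additionally attends to a few random positions
    rand = torch.randint(0, seq_len, (seq_len, n_random), generator=g)
    mask[idx[:, None], rand] = True
    return mask

def sparse_attention(q, k, v, mask):
    scores = q @ k.transpose(-2, -1) / q.shape[-1] ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))   # disallowed pairs get -inf
    return F.softmax(scores, dim=-1) @ v

q = k = v = torch.randn(1, 128, 64)                      # (batch, seq_len, head_dim)
out = sparse_attention(q, k, v, bigbird_style_mask(128)) # (1, 128, 64)
```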