The author has developed a new “differentiable forest”-type neural network framework for predictions on tabular data that has some similarity to the recently suggested NODE architecture and employs squeeze-and-excitation “tree attention blocks” (TABs) to show performance superior to gradient boosted decision trees (e.g. XGBoost, LightGBM, Catboost) on a number of benchmarks.
The authors used a version of the recently suggested MEAL technique (which involves knowledge distillation from multiple large teacher networks into a smaller student network via adversarial learning) to increase the top-1 accuracy of ResNet-50 on ImageNet with 224×224 input size to 80.67% without external training data or network architecture modifications.
In this paper, the authors present a Transformer attention model with linear complexity that is mathematically proven to be Turing complete (and thus as powerful as the original quadratic attention model) and achieves new state-of-the-art results on many NLP tasks involving long sequences (e.g. question answering and summarization), as well as genomics data.