Attention Augmented Differentiable Forest for Tabular Data

Review of paper by Yingshi Chen, Xiamen University, 2020

The author has developed a new “differentiable forest”-type neural network framework for predictions on tabular data. It has some similarity to the recently suggested NODE architecture and employs squeeze-and-excitation “tree attention blocks” (TABs). The author reports performance superior to gradient boosted decision trees (e.g. XGBoost, LightGBM, CatBoost) on a number of benchmarks.


Big Bird: Transformers for Longer Sequences

Review of paper by Manzil Zaheer, Guru Guruganesh, Avinava Dubey et al, Google Research, 2020

In this paper, the authors present a Transformer attention model with linear complexity that is mathematically proven to be Turing complete (and thus as powerful as the original quadratic attention model) and achieves new state-of-the-art results on many NLP tasks involving long sequences (e.g. question answering and summarization), as well as genomics data.
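
The summary above does not say how linear complexity is obtained. In the Big Bird paper, each query attends to a sliding window of neighbours, a few random keys, and a small set of global tokens. The following is a minimal sketch of such a mask, not the authors' implementation; the function names, parameters, and default sizes are assumptions chosen for illustration.

```python
import random


def sparse_attention_mask(seq_len, window=1, num_global=1, num_random=1, seed=0):
    """Return mask[i][j] == True where query i may attend to key j.

    Illustrative only: parameter names and defaults are assumptions.
    """
    rng = random.Random(seed)
    mask = [[False] * seq_len for _ in range(seq_len)]
    for i in range(seq_len):
        # Sliding window: each token sees neighbours within `window` positions.
        for j in range(max(0, i - window), min(seq_len, i + window + 1)):
            mask[i][j] = True
        # Random keys: a few extra connections per query.
        for j in rng.sample(range(seq_len), min(num_random, seq_len)):
            mask[i][j] = True
    # Global tokens: the first `num_global` positions attend to every
    # position and are attended to by every position.
    for g in range(min(num_global, seq_len)):
        for k in range(seq_len):
            mask[g][k] = True
            mask[k][g] = True
    return mask


def count_allowed(mask):
    """Number of (query, key) pairs the mask permits."""
    return sum(sum(row) for row in mask)


small = sparse_attention_mask(16)
large = sparse_attention_mask(64)
print(count_allowed(small), "of", 16 * 16, "pairs allowed")
print(count_allowed(large), "of", 64 * 64, "pairs allowed")
```

With fixed window, random, and global sizes, the number of allowed pairs per query stays bounded, so the total grows linearly with the sequence length rather than quadratically.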


Linformer: Self-Attention with Linear Complexity

Review of paper by Sinong Wang, Belinda Z. Li, Madian Khabsa et al, Facebook AI Research, 2020

This paper proposes an approximation of self-attention for Transformer architectures that has linear time and space complexity in the sequence length. On benchmark datasets, the resulting model performs similarly to RoBERTa, which is based on the original Transformer with its much less efficient quadratic attention.
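
As a rough illustration of the idea: Linformer projects the keys and values along the sequence dimension from length n down to a fixed size k before computing attention, so the score matrix is n × k instead of n × n. The sketch below uses plain Python lists and random projection matrices as stand-ins for the learned ones; all helper names and sizes are assumptions, not the paper's code.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def random_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def linformer_attention(q, k, v, e_proj, f_proj):
    """Attention with keys/values projected from n rows down to k_dim rows."""
    d = len(q[0])
    k_small = matmul(e_proj, k)  # (k_dim x n) @ (n x d) -> k_dim x d
    v_small = matmul(f_proj, v)  # (k_dim x n) @ (n x d) -> k_dim x d
    scores = matmul(q, transpose(k_small))  # n x k_dim, not n x n
    scores = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = softmax_rows(scores)
    return matmul(weights, v_small), weights


rng = random.Random(0)
n, d, k_dim = 8, 4, 2  # hypothetical sizes for the demo
q = random_matrix(n, d, rng)
k = random_matrix(n, d, rng)
v = random_matrix(n, d, rng)
e_proj = random_matrix(k_dim, n, rng)  # stand-in for a learned projection
f_proj = random_matrix(k_dim, n, rng)  # stand-in for a learned projection
out, weights = linformer_attention(q, k, v, e_proj, f_proj)
print(len(out), "x", len(out[0]), "output;", len(weights), "x", len(weights[0]), "weights")
```

Because k_dim is fixed independently of n, both the score matrix and the work to compute it scale linearly with the sequence length.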


End-to-End Object Detection with Transformers

Review of paper by Nicolas Carion, Francisco Massa, Gabriel Synnaeve et al, Facebook AI Research, 2020

This paper describes a completely automated end-to-end object detection system combining convolutional networks and Transformers. The new model shows competitive performance on par with Faster R-CNN and can be generalized to other tasks such as panoptic segmentation.


Synthesizer: Rethinking Self-Attention in Transformer Models

Review of paper by Yi Tay, Dara Bahri, Donald Metzler et al, Google Research, 2020

Contrary to the consensus that self-attention is largely responsible for the superior performance of Transformer models on various NLP tasks, this paper suggests that replacing the dot-product attention weights with random or simply synthesized alignment matrices is sufficient to achieve similar results with better efficiency.
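
To make the idea concrete, here is a minimal sketch of the "random" variant: the mixing weights come from a matrix of logits that does not depend on the input tokens at all, and only the values are taken from the input. The function names are hypothetical, and in the paper the logits can be trainable or kept fixed; this is an illustration, not the authors' code.

```python
import math


def softmax_rows(m):
    """Numerically stable softmax applied to each row."""
    out = []
    for row in m:
        peak = max(row)
        exps = [math.exp(x - peak) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def random_synthesizer(values, logits):
    """Mix value rows using weights computed from `logits` alone.

    No query-key dot products are used: `logits` is an n x n matrix
    that is independent of the input tokens.
    """
    weights = softmax_rows(logits)
    width = len(values[0])
    return [
        [sum(w * v_row[c] for w, v_row in zip(w_row, values)) for c in range(width)]
        for w_row in weights
    ]


values = [[1.0, 2.0], [3.0, 4.0]]
uniform_logits = [[0.0, 0.0], [0.0, 0.0]]  # equal logits give uniform mixing
mixed = random_synthesizer(values, uniform_logits)
print(mixed)
```

With all-zero logits every output row is the average of the value rows, which shows that the mixing pattern is set entirely by the logits and not by token-to-token similarity.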

