MEAL V2: Boosting Vanilla ResNet-50 to 80%+ Top-1 Accuracy on ImageNet without Tricks

By Zhiqiang Shen and Marios Savvides (Carnegie Mellon University), 2020

The authors apply a variant of the recently proposed MEAL technique, which distills knowledge from an ensemble of large teacher networks into a smaller student network via adversarial learning, to raise the top-1 accuracy of a vanilla ResNet-50 on ImageNet (224×224 input) to 80.67%, without external training data or architecture modifications.
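The core recipe behind the result, stripped of its adversarial component, is standard knowledge distillation against the averaged soft labels of an ensemble of pretrained teachers. Below is a minimal sketch of that step, assuming PyTorch and a recent torchvision; the single ResNet-152 teacher, the optimizer settings, and the data pipeline are placeholders, and the discriminator that MEAL trains adversarially to separate teacher outputs from student outputs is omitted.

```python
# Minimal sketch of the distillation core only (assumptions: PyTorch and a
# recent torchvision; teacher list, optimizer, and data pipeline are
# placeholders; the adversarial discriminator used by MEAL is omitted).
import torch
import torch.nn.functional as F
from torchvision import models

student = models.resnet50()                               # vanilla ResNet-50 student
teachers = [models.resnet152(weights="DEFAULT").eval()]   # illustrative teacher ensemble
optimizer = torch.optim.SGD(student.parameters(), lr=0.01, momentum=0.9)

def distill_step(images):
    """One training step: match the student to the averaged teacher soft labels."""
    with torch.no_grad():
        soft_targets = torch.stack(
            [F.softmax(t(images), dim=1) for t in teachers]
        ).mean(dim=0)
    loss = F.kl_div(F.log_softmax(student(images), dim=1),
                    soft_targets, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```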

Big Bird: Transformers for Longer Sequences

By Manzil Zaheer, Guru Guruganesh, Avinava Dubey et al. (Google Research), 2020

In this paper, the authors present a Transformer attention mechanism with linear complexity that is mathematically proven to be Turing complete (and thus as powerful as the original quadratic attention) and achieves new state-of-the-art results on many NLP tasks involving long sequences (e.g., question answering and summarization), as well as on genomics data.
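To get linear complexity, Big Bird replaces dense attention with a sparse pattern that mixes sliding-window, global, and random attention. The sketch below, assuming PyTorch, only illustrates that pattern with a dense boolean mask (the window size, number of global tokens, and number of random links are arbitrary placeholders); the paper's actual implementation computes attention in blocks so that the cost grows linearly with sequence length.

```python
# Illustration of the Big Bird attention pattern with a dense boolean mask
# (for clarity only; the paper uses a blocked computation for linear cost).
import torch

def bigbird_mask(seq_len, window=3, n_global=2, n_random=2):
    idx = torch.arange(seq_len)
    mask = (idx[:, None] - idx[None, :]).abs() <= window       # sliding-window attention
    mask[:n_global, :] = True                                   # global tokens attend everywhere
    mask[:, :n_global] = True                                   # everyone attends to global tokens
    rand_keys = torch.randint(0, seq_len, (seq_len, n_random))  # random attention links
    mask[idx[:, None], rand_keys] = True
    return mask

def masked_attention(q, k, v, mask):
    scores = q @ k.transpose(-2, -1) / q.size(-1) ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    return torch.softmax(scores, dim=-1) @ v

# Example: 16 tokens with 32-dimensional queries/keys/values.
q = k = v = torch.randn(16, 32)
out = masked_attention(q, k, v, bigbird_mask(16))
```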

Linformer: Self-Attention with Linear Complexity

By Sinong Wang, Belinda Z. Li, Madian Khabsa et al. (Facebook AI Research), 2020

This paper proposes an approximate way of computing self-attention in Transformer architectures with linear space and time complexity in the sequence length; on benchmark datasets, the resulting model performs on par with RoBERTa, which relies on the original, much less efficient quadratic attention.
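The trick is to project the keys and values along the sequence dimension down to a fixed length k before computing attention, motivated by the observation that the attention matrix is approximately low-rank. Here is a minimal single-head sketch, assuming PyTorch; the class name, projection size, and initialization are illustrative, and the paper's multi-head and parameter-sharing variants are omitted.

```python
# Single-head sketch of Linformer-style attention: keys and values are
# projected from sequence length n down to a fixed k, so the attention
# computation costs O(n*k) instead of O(n^2).
import torch
import torch.nn as nn

class LinformerSelfAttention(nn.Module):
    def __init__(self, seq_len, d_model, proj_k=64):
        super().__init__()
        self.q = nn.Linear(d_model, d_model)
        self.k = nn.Linear(d_model, d_model)
        self.v = nn.Linear(d_model, d_model)
        # Learned projections over the sequence dimension: (proj_k, seq_len).
        self.E = nn.Parameter(torch.randn(proj_k, seq_len) / seq_len ** 0.5)
        self.F = nn.Parameter(torch.randn(proj_k, seq_len) / seq_len ** 0.5)

    def forward(self, x):                                 # x: (batch, seq_len, d_model)
        q, k, v = self.q(x), self.k(x), self.v(x)
        k = torch.einsum("kn,bnd->bkd", self.E, k)        # (batch, proj_k, d_model)
        v = torch.einsum("kn,bnd->bkd", self.F, v)
        scores = q @ k.transpose(1, 2) / q.size(-1) ** 0.5  # (batch, seq_len, proj_k)
        return torch.softmax(scores, dim=-1) @ v            # (batch, seq_len, d_model)

# Example: a 512-token sequence with 256-dimensional embeddings.
attn = LinformerSelfAttention(seq_len=512, d_model=256, proj_k=64)
out = attn(torch.randn(2, 512, 256))                      # -> (2, 512, 256)
```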
