A great review of many state-of-the-art tricks that can be used to improve the performance of a deep convolutional network (ResNet), combined with actual implementation details, source code, and performance results. A must-read for all Kaggle competitors or anyone who wants to achieve maximum performance on computer vision tasks.
What can we learn from this paper?
This is a very practical paper that implements many optimizations to the standard ResNet architecture and examines the results. If you want to improve your understanding of convolutional networks, I recommend cloning the paper's GitHub repository from https://github.com/clovaai/assembled-cnn and reading the code alongside the paper to see how each part can be implemented and what kind of improvements can be obtained.
It is shown that a well-tuned ResNet can exceed the performance of the more recent EfficientNet architecture. One may wonder whether the latter could also be improved with some of these or other tricks.
Only read this paper if you are interested in going deep into the details of optimizing deep convolutional network performance. Otherwise, it is probably sufficient to know that any of the discussed pre-trained networks can deliver excellent results on many real-life tasks.
Prerequisites (to understand the paper, what does one need to be familiar with?)
- Convolutional neural networks (good level of understanding)
- Familiarity with recent architectures will be helpful
The goal of the paper is to achieve maximum performance from a deep convolutional neural network on various image recognition tasks.
The optimizations that the authors used to improve the performance of standard ResNet architectures fall into two groups. The first group consists of small tweaks to the architecture of the network without changing its general structure. The second group includes various regularization techniques that facilitate training and prevent overfitting by using training data augmentation methods, or by limiting the network’s complexity.
Since the goal of this review is not to delve into each specific technique (that is best accomplished by reading the original paper, following its references, and examining the authors’ code), I will just list the methods that were used.
1) Network tweaks:
In addition, the hyperparameters and preprocessing techniques were chosen for optimal performance based on the He et al. (2019) paper.
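To make the idea of regularization through training data augmentation more concrete, here is a minimal sketch of mixup, a widely used technique of that kind. It is only an illustration: the function name `mixup_pair`, the `alpha` value, and the list-based data layout are my assumptions and are not taken from the authors' repository, so check the paper and its code for the exact techniques and settings.

```python
import random


def mixup_pair(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Blend two training examples into one (illustrative mixup sketch).

    x1, x2: flat lists of input features (e.g. pixel values).
    y1, y2: one-hot (or soft) label lists of equal length.
    alpha:  Beta distribution parameter; 0.2 is an assumed example value.
    """
    # Sample the mixing weight from Beta(alpha, alpha), so it lies in [0, 1].
    lam = rng.betavariate(alpha, alpha)
    # Inputs and labels are interpolated with the same weight.
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y, lam


# Usage: mix a "class 0" example with a "class 1" example.
mixed_x, mixed_y, weight = mixup_pair(
    [1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], rng=random.Random(0)
)
```

Because the labels are blended with the same weight as the inputs, the network is trained on soft targets, which discourages overconfident predictions.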
To evaluate the improvement in performance, the authors used the classification task on the standard 2012 ImageNet dataset with three metrics:
- Top 1 Accuracy
- Mean Corruption Error (mCE): the top 1 classification error on corrupted versions of the dataset images (noise, blur, and similar distortions), summed over 5 severity levels and divided by the same quantity for AlexNet, then averaged over all corruption types. Lower is better.
- Inference Throughput (number of classified images per second)
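The first two metrics can be sketched in a few lines of Python. This is a simplified illustration, not the authors' evaluation code: the function names and the dictionary layout (corruption type mapped to a list of per-severity error rates) are assumptions, and the numbers in the usage example are made up.

```python
def top1_accuracy(predicted, actual):
    """Fraction of examples whose predicted class equals the true class."""
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


def mean_corruption_error(model_err, baseline_err):
    """mCE in percent.

    model_err / baseline_err: dict mapping a corruption type to a list of
    top-1 error rates, one per severity level. The baseline is AlexNet in
    the standard definition of the metric.
    """
    ratios = []
    for corruption, errors in model_err.items():
        # Sum errors over severity levels, then normalize by the baseline.
        ratios.append(sum(errors) / sum(baseline_err[corruption]))
    # Average the normalized errors over all corruption types.
    return 100.0 * sum(ratios) / len(ratios)


# Usage with made-up error rates for two corruption types.
model = {
    "gaussian_noise": [0.2, 0.3, 0.4, 0.5, 0.6],
    "fog": [0.1, 0.2, 0.3, 0.4, 0.5],
}
alexnet = {
    "gaussian_noise": [0.4, 0.5, 0.6, 0.7, 0.8],
    "fog": [0.3, 0.4, 0.5, 0.6, 0.7],
}
accuracy = top1_accuracy([3, 1, 2, 2], [3, 1, 0, 2])
mce = mean_corruption_error(model, alexnet)
```

An mCE below 100% means the model is more robust to corruptions than the AlexNet baseline.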
As a result of the optimizations, the top 1 accuracy on the ImageNet dataset increased from 76.87% for the baseline ResNet-50 to 82.78% (84.19% when the larger ResNet-152 is used as the backbone). This is similar to the accuracy achievable with EfficientNet architectures, but with much lower corruption error and higher throughput. The mCE (lower is better) improved from the baseline 75.55% to 48.89% (43.27% with ResNet-152, compared with 59.4% for EfficientNet B7), while the throughput decreased only from 536 to 312 images/sec (143 with ResNet-152, compared with just 16 for EfficientNet B7). Using an ablation study, the authors show how much each of the proposed tweaks changes the performance of the network.
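To put these numbers in perspective, the short calculation below derives the trade-offs from the figures quoted above. The variable names are mine; only the values come from the review.

```python
# Figures quoted above (top 1 accuracy and mCE in percent, throughput in images/sec).
baseline_r50 = {"top1": 76.87, "mce": 75.55, "throughput": 536}
assembled_r50 = {"top1": 82.78, "mce": 48.89, "throughput": 312}
assembled_r152 = {"top1": 84.19, "mce": 43.27, "throughput": 143}
efficientnet_b7 = {"mce": 59.4, "throughput": 16}

# Gain in top 1 accuracy, in percentage points.
top1_gain = assembled_r50["top1"] - baseline_r50["top1"]
# Reduction in mCE, in percentage points (lower mCE is better).
mce_drop = baseline_r50["mce"] - assembled_r50["mce"]
# Share of the baseline throughput that the optimized ResNet-50 retains.
throughput_kept = assembled_r50["throughput"] / baseline_r50["throughput"]
# How many times faster each optimized network is than EfficientNet B7.
r50_speedup_vs_b7 = assembled_r50["throughput"] / efficientnet_b7["throughput"]
r152_speedup_vs_b7 = assembled_r152["throughput"] / efficientnet_b7["throughput"]

print(f"top 1 gain: {top1_gain:.2f} points")
print(f"mCE reduction: {mce_drop:.2f} points")
print(f"throughput retained: {throughput_kept:.0%}")
print(f"speedup over EfficientNet B7: {r50_speedup_vs_b7:.1f}x (ResNet-50), "
      f"{r152_speedup_vs_b7:.1f}x (ResNet-152)")
```

In other words, the optimized ResNet-50 gains almost 6 points of top 1 accuracy while keeping a bit more than half of the baseline throughput, and it remains many times faster than EfficientNet B7.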
The authors also evaluated the performance of the optimized networks on various other datasets for classification and image retrieval tasks. In all cases, the performance of the models was excellent and generally exceeded that of state-of-the-art configurations.