In this paper, the authors examine in detail the phenomenon of gradient starvation, first introduced by the same research group in 2018, in neural networks trained with the common cross-entropy loss. Gradient starvation occurs when easy-to-learn features in a dataset prevent the learning of other, equally informative features; models that come to rely on only these few features may then lack robustness. The authors propose a new regularization method, Spectral Decoupling, to combat this problem.
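As a rough sketch of what this looks like in practice: in the paper, Spectral Decoupling replaces the usual weight-decay term with an L2 penalty on the network's logits. A minimal PyTorch version might look as follows (the coefficient value is an illustrative assumption, not the paper's setting):

```python
import torch.nn.functional as F

def spectral_decoupling_loss(logits, targets, sd_coef=3e-3):
    """Cross-entropy plus the Spectral Decoupling penalty on the logits."""
    # Standard cross-entropy term.
    ce = F.cross_entropy(logits, targets)
    # Spectral Decoupling: an L2 penalty on the logits themselves, used in
    # place of weight decay; sd_coef is an illustrative value.
    sd_penalty = (logits ** 2).sum(dim=1).mean()
    return ce + 0.5 * sd_coef * sd_penalty
```

The design choice worth noting is that the penalty acts on the network's outputs rather than its parameters, which is what the paper argues decouples the learning dynamics of the different features.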
The authors adapt the contrastive loss, which has recently proven very effective for learning deep neural network representations in the self-supervised setting, to supervised learning, and achieve better results than the cross-entropy loss with ResNet-50 and ResNet-200.
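For readers unfamiliar with how a contrastive loss is adapted to labels: in the supervised variant, all samples in the batch that share the anchor's label are treated as positives. A minimal sketch in PyTorch (the temperature value is an illustrative assumption, and this omits the multi-view augmentation typically used):

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Contrastive loss where same-label samples in the batch are positives."""
    # L2-normalize embeddings so dot products are cosine similarities.
    features = F.normalize(features, dim=1)
    sim = features @ features.T / temperature
    n = features.size(0)
    self_mask = torch.eye(n, dtype=torch.bool, device=features.device)
    sim = sim.masked_fill(self_mask, -1e9)  # exclude self-similarity
    # Positives: other samples in the batch with the same label.
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    # Log-softmax over all other samples, then average over positives.
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)
    loss = -(log_prob * pos_mask).sum(1) / pos_mask.sum(1).clamp(min=1)
    return loss.mean()
```

For example, `supervised_contrastive_loss(torch.randn(8, 128), torch.randint(0, 3, (8,)))` computes the loss for a batch of eight 128-dimensional embeddings with three classes.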
This paper proposes a new algorithm for training deep neural networks efficiently without a GPU.
This paper examines the theoretical reasons for using batch normalization in deep residual networks and proposes a simpler alternative.
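The summary does not name the alternative, but a common normalizer-free construction in this line of work (given here purely as an illustrative assumption, not necessarily the paper's exact proposal) is to scale each residual branch by a learnable scalar initialized to zero, so that every block starts out as the identity function:

```python
import torch
import torch.nn as nn

class NormFreeResidualBlock(nn.Module):
    """Residual block with no batch norm: the branch output is scaled by a
    learnable scalar starting at 0, so the block is initially the identity.
    An illustrative sketch, not necessarily the paper's exact design."""

    def __init__(self, channels):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )
        self.scale = nn.Parameter(torch.zeros(1))  # starts at identity

    def forward(self, x):
        return x + self.scale * self.branch(x)
```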
This paper presents a new neural network activation function and shows, across a number of examples, that it often improves the accuracy of deep networks.