SLIDE : In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems
This paper proposes a new algorithm for training deep neural networks that can run efficiently without a GPU.
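SLIDE's core trick is to use locality-sensitive hashing to pick, per input, a small set of "active" neurons and compute only those, which is what makes CPU training competitive. Below is a minimal numpy sketch of that selection step; the layer sizes, the SimHash variant, and the number of tables are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes and hash parameters (assumptions, not the paper's setup).
d_in, d_out = 64, 1024          # input dim, neurons in the wide layer
n_bits, n_tables = 6, 4         # SimHash bits per table, number of tables

W = rng.standard_normal((d_out, d_in)) * 0.05   # one weight row per neuron
b = np.zeros(d_out)

# Random hyperplanes for each hash table (SimHash).
planes = rng.standard_normal((n_tables, n_bits, d_in))

def simhash(v, t):
    """Sign pattern of v against table t's hyperplanes, packed into a bucket id."""
    bits = (planes[t] @ v > 0).astype(np.uint64)
    return int(bits @ (np.uint64(1) << np.arange(n_bits, dtype=np.uint64)))

# Pre-hash every neuron's weight vector into each table: bucket -> neuron ids.
tables = [dict() for _ in range(n_tables)]
for t in range(n_tables):
    for j in range(d_out):
        tables[t].setdefault(simhash(W[j], t), []).append(j)

def sparse_forward(x):
    """Compute only the neurons that collide with x in at least one table."""
    active = sorted({j for t in range(n_tables)
                       for j in tables[t].get(simhash(x, t), [])})
    out = np.zeros(d_out)
    if active:
        out[active] = W[active] @ x + b[active]   # small slice of the full matmul
    return out, active

x = rng.standard_normal(d_in)
_, active = sparse_forward(x)
print(f"computed {len(active)} of {d_out} neurons")
```

In the full system, backpropagation likewise touches only the active neurons, and the hash tables are rebuilt periodically as the weights drift; this sketch shows only the neuron-selection idea.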
This paper examines the theoretical reasons why batch normalization helps in deep residual networks and proposes a simpler alternative.
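The summary does not name the specific replacement, so the sketch below only illustrates the general shape of such alternatives: dropping batch normalization from the residual branch in favor of a single learnable scalar initialized at zero, so every block starts as the identity (a SkipInit-style scheme, used here purely as an example). The layer sizes and the ReLU branch are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def batchnorm(x, eps=1e-5):
    """Per-feature standardization over the batch (BN without learned affine terms)."""
    mu = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def residual_block_bn(x, W):
    """Conventional residual branch with batch normalization before the nonlinearity."""
    return x + np.maximum(batchnorm(x) @ W, 0.0)

def residual_block_scaled(x, W, alpha=0.0):
    """Normalization-free alternative: a learnable scalar alpha on the residual
    branch, initialized at zero so the block starts as the identity.
    (Illustrative only; not necessarily this paper's proposal.)"""
    return x + alpha * np.maximum(x @ W, 0.0)

x = rng.standard_normal((32, 128))            # batch of 32 activations
W = rng.standard_normal((128, 128)) * 0.1
print(np.std(residual_block_bn(x, W)), np.std(residual_block_scaled(x, W)))
```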
This paper presents a new neural network activation function and shows, across a number of experiments, that it often improves the accuracy of deep networks.
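A new activation function is typically a drop-in, elementwise replacement for ReLU. The sketch below uses Swish (x * sigmoid(x)) purely as an example of such a replacement; the summary does not name the function this particular paper introduces.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def swish(x):
    """Swish, x * sigmoid(x): a smooth, non-monotonic activation shown here as an
    example of a drop-in ReLU replacement (not necessarily this paper's function)."""
    return x / (1.0 + np.exp(-x))

x = np.linspace(-4, 4, 9)
print(np.round(relu(x), 3))
print(np.round(swish(x), 3))
```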
This paper shows how the initialization of neural network weights affects whether training succeeds, and that larger networks are more likely to contain subnetworks whose initial weights happen to be "lucky".
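This description matches lottery-ticket-style experiments: train, prune the smallest-magnitude weights, rewind the survivors to their original initial values, and repeat, leaving a sparse subnetwork with the "lucky" initialization. Assuming that reading, here is a toy numpy sketch of the prune-and-rewind loop; the layer size, pruning fraction, and the stand-in for training are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy dense layer standing in for a whole network (sizes are illustrative).
W_init = rng.standard_normal((256, 256)) * 0.05   # the "lucky" initial weights
mask = np.ones_like(W_init)

def prune_round(W_trained, mask, frac=0.2):
    """One round of magnitude pruning: drop the smallest surviving weights,
    then rewind the remainder to their original initialization."""
    alive = W_trained[mask == 1]
    threshold = np.quantile(np.abs(alive), frac)
    new_mask = mask * (np.abs(W_trained) >= threshold)
    return new_mask, W_init * new_mask            # rewound sparse subnetwork

# Pretend "training" perturbed the weights; in practice this would be SGD on data.
W_trained = W_init + 0.01 * rng.standard_normal(W_init.shape)

for _ in range(5):                                # five prune-and-rewind rounds
    mask, W_sub = prune_round(W_trained, mask)
    print(f"surviving weights: {int(mask.sum())} / {mask.size}")
```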