Yoshua Bengio, Aaron Courville, Pascal Vincent http://arxiv.org/abs/1206.5538
This paper covers many of the nitty-gritty details: weight initialization, optimization schemes, tradeoffs, and implementation. Fantastic!

Learning Multiple Layers of Features from Tiny Images, Alex Krizhevsky http://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf
The greedy layerwise training in this thesis is now out of date, but the rest of its insights and intuition about image statistics are very, very good.

ImageNet Classification with Deep Convolutional Neural Networks, Alex Krizhevsky, Ilya Sutskever, Geoff Hinton http://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks
This paper basically lays out the “modern” approach to most neural networks for images (and also for audio and some NLP). OverFeat is basically an implementation of this architecture, with a number of custom tweaks to get better performance.

OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks, P. Sermanet, D. Eigen, X. Zhang, M. Mathieu, R. Fergus, Y. LeCun http://arxiv.org/abs/1312.6229
I like this paper and its associated code (https://github.com/sermanet/OverFeat) a whole lot. It inspired sklearn-theano!