Learning”, in preparation for MIT Press, 2015. http://www.iro.umontreal.ca/~bengioy/dlbook/ [2] D. Rumelhart, G. Hinton, R. Williams. "Learning representations by back-propagating errors", Nature 323 (6088): 533–536, 1986. http://www.iro. umontreal.ca/~vincentp/ift3395/lectures/backprop_old.pdf [3] C. Bishop. “Mixture Density Networks”, 1994. http://research.microsoft.com/en-us/um/people/cmbishop/downloads/Bishop-NCRG-94-004.ps [4] A. Graves. “Generating Sequences With Recurrent Neural Networks”, 2013. http://arxiv.org/abs/1308.0850 [5] D. Eck, J. Schmidhuber. “Finding Temporal Structure In Music: Blues Improvisation with LSTM Recurrent Networks”. Neural Networks for Signal Processing, 2002. ftp://ftp.idsia.ch/pub/juergen/2002_ieee.pdf [6] A. Brandmaier. “ALICE: An LSTM Inspired Composition Experiment”. 2008. [7] T. Mikolov, M. Karafiat, L. Burget, J. Cernocky, S. Khudanpur. “Recurrent Neural Network Based Language Model”. Interspeech 2010. http://www.fit. vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf [9] N. Boulanger-Lewandowski, Y. Bengio, P. Vincent. “Modeling Temporal Dependencies in High-Dimensional Sequences: Application To Polyphonic Music Generation and Transcription”. ICML 2012. http://www-etud.iro.umontreal.ca/~boulanni/icml2012 [10] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. "Gradient-based learning applied to document recognition." Proceedings of the IEEE, 86(11):2278- 2324, 1998. http://yann.lecun.com/exdb/mnist/ [11] D. Kingma, M. Welling. “Auto-encoding Variational Bayes”. ICLR 2014. http://arxiv.org/abs/1312.6114 [12] D. Rezende, S. Mohamed, D. Wierstra. “Stochastic Backpropagation and Approximate Inference in Deep Generative Models”. ICML 2014. http: //arxiv.org/abs/1401.4082