• Volodymyr Turchenko
• Eric Chalmers
• Artur Luczak




Deep convolutional auto-encoder, Machine learning, Neural networks, Dimensionality reduction, Unsupervised clustering.


This paper presents several models of a deep convolutional auto-encoder developed in the Caffe deep learning framework and their experimental evaluation on the MNIST dataset. We created five models of a convolutional auto-encoder that differ architecturally in the presence or absence of pooling and unpooling layers in the encoder and decoder parts. Our results show that the developed models perform very well in dimensionality reduction and unsupervised clustering tasks, and yield small classification errors when the learned internal code is used as input to a supervised linear classifier and a multi-layer perceptron. The best results were obtained by a model whose encoder part contains convolutional and pooling layers, followed by an analogous decoder part with deconvolution and unpooling layers that does not use switch variables. The paper also discusses practical details of creating a deep convolutional auto-encoder in the popular Caffe framework. We believe that the approach and results presented here could help other researchers build efficient deep neural network architectures in the future.
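To make the distinction between unpooling with and without switch variables concrete, the following is a minimal NumPy sketch, not the Caffe layers used in the paper. The function names `max_pool_2x2` and `unpool_2x2` are hypothetical, and placing each value in the top-left corner of its window when no switches are given is an assumption made only for illustration; the layer evaluated in the paper may place values differently.

```python
import numpy as np


def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a 2-D array with even height and width.

    Returns the pooled map and the 'switches': the index (0..3, row-major)
    of the maximum inside each 2x2 window.
    """
    h, w = x.shape
    # Group pixels into non-overlapping 2x2 windows, flattened to length 4.
    windows = (
        x.reshape(h // 2, 2, w // 2, 2)
        .transpose(0, 2, 1, 3)
        .reshape(h // 2, w // 2, 4)
    )
    return windows.max(axis=2), windows.argmax(axis=2)


def unpool_2x2(pooled, switches=None):
    """Upsample a pooled map by a factor of 2 in each dimension.

    With switches, each value returns to the position it was pooled from.
    Without switches, each value goes to a fixed position in its window
    (top-left here, which is an illustrative assumption).
    """
    ph, pw = pooled.shape
    windows = np.zeros((ph, pw, 4), dtype=pooled.dtype)
    if switches is None:
        windows[:, :, 0] = pooled
    else:
        rows, cols = np.indices((ph, pw))
        windows[rows, cols, switches] = pooled
    # Invert the window grouping performed in max_pool_2x2.
    return (
        windows.reshape(ph, pw, 2, 2)
        .transpose(0, 2, 1, 3)
        .reshape(ph * 2, pw * 2)
    )
```

Calling `unpool_2x2(pooled)` discards the location information, while `unpool_2x2(pooled, switches)` restores each maximum to where it was found in the encoder. In a full auto-encoder these operations would sit between the convolution and deconvolution layers.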






How to Cite

Turchenko, V., Chalmers, E., & Luczak, A. (2019). A DEEP CONVOLUTIONAL AUTO-ENCODER WITH POOLING – UNPOOLING LAYERS IN CAFFE. International Journal of Computing, 18(1), 8-31. https://doi.org/10.47839/ijc.18.1.1270