Learnable Extended Activation Function for Deep Neural Networks

Authors

  • Yevgeniy Bodyanskiy
  • Serhii Kostiuk

DOI:

https://doi.org/10.47839/ijc.22.3.3225

Keywords:

Adaptive Hybrid Activation Function, Trainable Activation Function Form, Double-Stage Parameter Tuning Process, Squashing Functions, Linear Units, Deep Neural Networks

Abstract

This paper introduces the Learnable Extended Activation Function (LEAF), an adaptive activation function that combines the properties of squashing functions and rectifier units. Depending on the target architecture and data processing task, LEAF adapts its form during training to achieve lower loss values and improve the training results. LEAF does not suffer from the "vanishing gradient" effect and can directly replace SiLU, ReLU, Sigmoid, Tanh, Swish, and AHAF in feed-forward, recurrent, and many other neural network architectures. The training process for LEAF features a two-stage approach in which the activation function parameters are updated before the synaptic weights. The experimental evaluation on the image classification task shows the superior performance of LEAF compared to the non-adaptive alternatives. In particular, LEAF-as-Tanh provides 7% better classification accuracy than the hyperbolic tangent on the CIFAR-10 dataset. As empirically observed, LEAF-as-SiLU and LEAF-as-Sigmoid in convolutional networks tend to "evolve" into SiLU-like forms. The proposed activation function and the corresponding training algorithm are computationally simple and are easily applied to existing deep neural networks.
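
For illustration, the sketch below shows how a trainable activation of this kind, together with the two-stage update, could be implemented in PyTorch. It assumes a parameterization of the form f(x) = (a*x + b)*sigmoid(c*x) + d, which reduces to SiLU for (a, b, c, d) = (1, 0, 1, 0), to the logistic sigmoid for (0, 1, 1, 0), and to the hyperbolic tangent for (0, 2, 2, -1). The exact LEAF formulation, parameter names, and update rules are those given in the paper; this is a sketch under the stated assumption, not the authors' implementation, and the names LearnableActivation, two_stage_step, and the learning rates are hypothetical.

import torch
import torch.nn as nn


class LearnableActivation(nn.Module):
    # Trainable activation f(x) = (a*x + b) * sigmoid(c*x) + d with four
    # scalar shape parameters (per-channel vectors are also possible).
    def __init__(self, a=1.0, b=0.0, c=1.0, d=0.0):
        super().__init__()
        self.a = nn.Parameter(torch.tensor(float(a)))
        self.b = nn.Parameter(torch.tensor(float(b)))
        self.c = nn.Parameter(torch.tensor(float(c)))
        self.d = nn.Parameter(torch.tensor(float(d)))

    def forward(self, x):
        return (self.a * x + self.b) * torch.sigmoid(self.c * x) + self.d


def two_stage_step(model, loss_fn, x, y, opt_act, opt_weights):
    # Stage 1: step only the activation-function parameters.
    model.zero_grad(set_to_none=True)
    loss_fn(model(x), y).backward()
    opt_act.step()

    # Stage 2: recompute the loss and step only the synaptic weights.
    model.zero_grad(set_to_none=True)
    loss = loss_fn(model(x), y)
    loss.backward()
    opt_weights.step()
    return loss.item()


# Splitting the parameters between the two optimizers (example values):
# act_params = [p for m in model.modules()
#               if isinstance(m, LearnableActivation) for p in m.parameters()]
# act_ids = {id(p) for p in act_params}
# weight_params = [p for p in model.parameters() if id(p) not in act_ids]
# opt_act = torch.optim.SGD(act_params, lr=1e-3)
# opt_weights = torch.optim.SGD(weight_params, lr=1e-2)

Initializing the four parameters at the values of a standard function (for example, (1, 0, 1, 0) for SiLU) lets the network start from a known activation shape and then adapt it during training, which mirrors the LEAF-as-SiLU, LEAF-as-Sigmoid, and LEAF-as-Tanh configurations mentioned in the abstract.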

References

Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, May 2015. [Online]. Available: https://doi.org/10.1038/nature14539

G. Cybenko, “Approximation by superpositions of a sigmoidal function,” Mathematics of Control, Signals and Systems, vol. 2, no. 4, pp. 303–314, Dec 1989. [Online]. Available: https://doi.org/10.1007/BF02551274

S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Computation, vol. 9, no. 8, pp. 1735–1780, Nov. 1997. [Online]. Available: https://doi.org/10.1162/neco.1997.9.8.1735

K. Cho, B. van Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio, “Learning phrase representations using RNN encoder–decoder for statistical machine translation,” in Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). Association for Computational Linguistics, 2014. [Online]. Available: https://doi.org/10.3115/v1/d14-1179

E. Parisotto, H. F. Song, J. W. Rae, R. Pascanu, C. Gulcehre, S. M. Jayakumar, M. Jaderberg, R. L. Kaufman, A. Clark, S. Noury, M. M. Botvinick, N. Heess, and R. Hadsell, “Stabilizing transformers for reinforcement learning,” in Proceedings of the 37th International Conference on Machine Learning, ser. ICML’20. JMLR.org, 2020. [Online]. Available: https://dl.acm.org/doi/abs/10.5555/3524938.3525632

N. Shazeer, “GLU variants improve transformer,” 2020. [Online]. Available: https://arxiv.org/abs/2002.05202

S. R. Dubey, S. K. Singh, and B. B. Chaudhuri, “Activation functions in deep learning: A comprehensive survey and benchmark,” Neurocomputing, vol. 503, pp. 92–108, Sep. 2022. [Online]. Available: https://doi.org/10.1016/j.neucom.2022.06.111

S. Elfwing, E. Uchibe, and K. Doya, “Sigmoid-weighted linear units for neural network function approximation in reinforcement learning,” Neural Networks, vol. 107, pp. 3–11, Nov. 2018. [Online]. Available: https://doi.org/10.1016/j.neunet.2017.12.012

P. Ramachandran, B. Zoph, and Q. V. Le, “Searching for activation functions,” in 6th International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Workshop Track Proceedings. OpenReview.net, 2018. [Online]. Available: https://openreview.net/forum?id=Hkuq2EkPf

X. Jin, C. Xu, J. Feng, Y. Wei, J. Xiong, and S. Yan, “Deep learning with S-shaped rectified linear activation units,” 2015. [Online]. Available: https://arxiv.org/abs/1512.07030

M. Tanaka, “Weighted sigmoid gate unit for an activation function of deep neural network,” Pattern Recognit. Lett., vol. 135, pp. 354–359, 2018.

Y. Bodyanskiy and S. Kostiuk, “Adaptive hybrid activation function for deep neural networks,” System Research and Information Technologies, no. 1, pp. 87–96, Apr. 2022. [Online]. Available: https://doi.org/10.20535/srit.2308-8893.2022.1.07

J. Kruschke and J. Movellan, “Benefits of gain: speeded learning and minimal hidden layers in back-propagation networks,” IEEE Transactions on Systems, Man, and Cybernetics, vol. 21, no. 1, pp. 273–280, Jan 1991.

Z. Hu and H. Shao, “The study of neural network adaptive control systems,” Control and Decision, no. 7, pp. 361–366, 1992.

C.-T. Chen and W.-D. Chang, “A feedforward neural network with function shape autotuning,” Neural Netw., vol. 9, no. 4, p. 627–641, Jun. 1996. [Online]. Available: https://doi.org/10.1016/0893-6080(96)00006-8

E. Trentin, “Networks with trainable amplitude of activation functions,” Neural Netw., vol. 14, no. 4–5, p. 471–493, May 2001. [Online]. Available: https://doi.org/10.1016/S0893-6080(01)00028-4

F. Agostinelli, M. Hoffman, P. Sadowski, and P. Baldi, “Learning activation functions to improve deep neural networks,” 2015.

L. R. Sütfeld, F. Brieger, H. Finger, S. Füllhase, and G. Pipa, “Adaptive blending units: Trainable activation functions for deep neural networks,” 2018.

Y. V. Bodyanskiy, A. O. Deineko, I. Pliss, and V. Slepanska, “Formal neuron based on adaptive parametric rectified linear activation function and its learning,” in International Workshop on Digital Content & Smart Multimedia, 2019.

Y. Bodyanskiy and S. Kostiuk, “The two-step parameter tuning procedure for artificial neurons with adaptive activation functions,” in Neural network technologies and their applications NNTA-2022: collection of scientific papers of the 21st International scientific conference "Neural network technologies and their applications NNTA-2022", S. Kovalevsky, Ed. Kramatorsk: DSEA, 2022, pp. 30–36, in Ukrainian.

H. Xiao, K. Rasul, and R. Vollgraf, “Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms,” 2017.

A. Krizhevsky, “Learning multiple layers of features from tiny images,” Tech. Rep., 2009. [Online]. Available: https://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.

F. Chollet, “Train a simple deep CNN on the CIFAR10 small images dataset (Keras 1.2.2 example),” 2017. [Online]. Available: https://github.com/keras-team/keras/blob/1.2.2/examples/cifar10_cnn.py

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” Journal of Machine Learning Research, vol. 15, no. 56, pp. 1929–1958, 2014. [Online]. Available: http://jmlr.org/papers/v15/srivastava14a.html

A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala, “PyTorch: An imperative style, high-performance deep learning library,” in Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, Eds. Curran Associates, Inc., 2019, pp. 8024–8035. [Online]. Available: https://proceedings.neurips.cc/paper/2019/file/bdbca288fee7f92f2bfa9f7012727740-Paper.pdf

Published

2023-10-01

How to Cite

Bodyanskiy, Y., & Kostiuk, S. (2023). Learnable Extended Activation Function for Deep Neural Networks. International Journal of Computing, 22(3), 311-318. https://doi.org/10.47839/ijc.22.3.3225

Issue

Vol. 22 No. 3 (2023)

Section

Articles