

Dorothea Schwung, Andreas Schwung, Steven X. Ding


This paper presents a centralized approach for energy optimization in large-scale industrial production systems based on an actor-critic reinforcement learning (ACRL) framework. The objective of the on-line capable, self-learning algorithm is to optimize the energy consumption of a production process while meeting manufacturing constraints such as a demanded throughput. Our centralized ACRL algorithm uses two artificial neural networks (ANNs) with Gaussian radial-basis functions (RBFs) for function approximation, one for the critic and one for the actor. This actor-critic design can handle both discrete and continuous state and action spaces, which is essential for hybrid systems that combine discrete and continuous actuator behavior. The ACRL algorithm is validated, by way of example, on a dynamic simulation model of a bulk good system whose task is to supply bulk good to a subsequent dosing section while consuming as little energy as possible. The simulation results demonstrate the applicability of our machine learning (ML) approach to energy optimization in hybrid production environments.
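
The two-network structure described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a one-dimensional continuous state, a Gaussian exploration policy, linear output weights on shared Gaussian RBF features, and a plain TD(0) update. All names (`rbf_features`, `RBFActorCritic`) and all parameter values are hypothetical, and the discrete part of the hybrid action space is left out.

```python
import numpy as np

def rbf_features(state, centers, width):
    """Gaussian RBF activations for a scalar state (centers/width are illustrative)."""
    return np.exp(-((state - centers) ** 2) / (2.0 * width ** 2))

class RBFActorCritic:
    """Toy actor-critic with one RBF layer each for critic and actor (assumed design)."""

    def __init__(self, centers, width, alpha_critic=0.1, alpha_actor=0.01,
                 gamma=0.95, sigma=0.1, seed=0):
        self.centers = np.asarray(centers, dtype=float)
        self.width = width
        self.w = np.zeros(len(self.centers))      # critic: V(s) = w . phi(s)
        self.theta = np.zeros(len(self.centers))  # actor: mean action = theta . phi(s)
        self.alpha_critic = alpha_critic
        self.alpha_actor = alpha_actor
        self.gamma = gamma
        self.sigma = sigma                        # fixed exploration noise (assumption)
        self.rng = np.random.default_rng(seed)

    def value(self, s):
        return float(self.w @ rbf_features(s, self.centers, self.width))

    def act(self, s):
        mean = self.theta @ rbf_features(s, self.centers, self.width)
        return float(mean + self.sigma * self.rng.standard_normal())

    def update(self, s, a, r, s_next):
        phi = rbf_features(s, self.centers, self.width)
        # TD error: one-step reward plus discounted next value minus current value
        delta = r + self.gamma * self.value(s_next) - self.value(s)
        mean = self.theta @ phi
        # Critic moves V(s) toward the TD target
        self.w += self.alpha_critic * delta * phi
        # Actor follows the Gaussian policy gradient, scaled by the TD error
        self.theta += self.alpha_actor * delta * (a - mean) / self.sigma ** 2 * phi
        return float(delta)

agent = RBFActorCritic(centers=np.linspace(0.0, 1.0, 5), width=0.25)
td_error = agent.update(s=0.5, a=0.2, r=1.0, s_next=0.5)
```

In an energy-optimization setting the reward would typically penalize energy use and constraint violations such as missed throughput; how the paper shapes its reward is not stated in the abstract, so that part is deliberately omitted here.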


Machine learning; self-learning; actor-critic reinforcement learning; radial-basis function neural networks; manufacturing systems; hybrid systems; energy optimization.





