LEARNING FROM THE ENVIRONMENT WITH A UNIVERSAL REINFORCEMENT FUNCTION

Authors

  • Diego Ariel Bendersky
  • Juan Miguel Santos

DOI:

https://doi.org/10.47839/ijc.5.3.410

Keywords:

Reinforcement learning, environment influence, skills, autonomous robots

Abstract

Traditionally, in Reinforcement Learning, the specification of the task is contained in the reinforcement function (RF), and each new task requires the definition of a new RF. In nature, however, explicit reward signals are scarce, and the characteristics of the environment affect not only “how” animals perform particular tasks, but also “what” skills an animal develops during its life. In this work, we propose a novel use of Reinforcement Learning in which different abilities or skills are learned, based on the characteristics of the environment, using a fixed, universal reinforcement function. We also present a method to build an RF for a skill using information from the optimal policy learned in a particular environment, and we prove that this method is correct, i.e., that the RF constructed in this way produces the same optimal policy.
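To make the second contribution concrete, the sketch below shows one standard way such a construction can work; it is an illustrative assumption, not necessarily the paper's own method. The idea: given an optimal policy already learned in some environment, define a new RF that pays 1 whenever the agent takes the action that policy prescribes and 0 otherwise. Under discounting, the original policy then attains the maximum possible return, so it remains optimal under the constructed RF, and tabular Q-learning recovers it. The helper rf_from_policy, the toy 5-state chain environment, and all hyperparameters here are hypothetical choices for illustration.

import numpy as np

def rf_from_policy(pi_star, n_states, n_actions):
    # Reward 1 for the action the learned optimal policy takes, 0 otherwise.
    R = np.zeros((n_states, n_actions))
    R[np.arange(n_states), pi_star] = 1.0
    return R

def q_learning(step, R, n_states, n_actions,
               episodes=2000, horizon=30, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    # Plain tabular Q-learning against a deterministic transition
    # function step(s, a) -> s', trained on the constructed reward table R.
    rng = np.random.default_rng(seed)
    Q = np.zeros((n_states, n_actions))
    for _ in range(episodes):
        s = int(rng.integers(n_states))
        for _ in range(horizon):
            # epsilon-greedy action selection
            a = int(rng.integers(n_actions)) if rng.random() < eps else int(Q[s].argmax())
            s2 = step(s, a)
            Q[s, a] += alpha * (R[s, a] + gamma * Q[s2].max() - Q[s, a])
            s = s2
    return Q.argmax(axis=1)  # greedy policy recovered from Q

# Assumed toy 5-state chain: action 0 moves left, action 1 moves right.
n_states, n_actions = 5, 2
def step(s, a):
    return max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)

pi_star = np.array([1, 1, 1, 1, 1])   # suppose "always move right" was the learned optimal policy
R = rf_from_policy(pi_star, n_states, n_actions)
recovered = q_learning(step, R, n_states, n_actions)
assert (recovered == pi_star).all()   # the constructed RF yields the same optimal policy

Why this preserves optimality: following pi_star earns reward 1 at every step, i.e., the maximum achievable discounted return 1/(1-gamma), so no other policy can do better under the constructed RF.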


Published

2014-08-01

How to Cite

Bendersky, D. A., & Santos, J. M. (2014). Learning from the environment with a universal reinforcement function. International Journal of Computing, 5(3), 68–74. https://doi.org/10.47839/ijc.5.3.410

Section

Articles