
LEARNING FROM THE ENVIRONMENT WITH A UNIVERSAL REINFORCEMENT FUNCTION

Diego Ariel Bendersky, Juan Miguel Santos

Abstract


Traditionally, in Reinforcement Learning, the specification of the task is contained in the reinforcement function (RF), and each new task requires the definition of a new RF. In nature, however, explicit reward signals are limited, and the characteristics of the environment affect not only “how” animals perform particular tasks, but also “what” skills an animal will develop during its life. In this work, we propose a novel use of Reinforcement Learning that consists of learning different abilities or skills, based on the characteristics of the environment, using a fixed and universal reinforcement function. We also present a method to build an RF for a skill using information from the optimal policy learned in a particular environment, and we prove that this method is correct, i.e., the RF constructed in this way produces the same optimal policy.
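The two ideas in the abstract can be illustrated with a toy sketch. The abstract does not specify the environments, the universal RF, or the RF construction, so everything below is an assumption made for illustration only: a five-cell line world, a hypothetical fixed RF (`fixed_rf`) that pays 1 when the agent lands on a food cell, value iteration in place of whatever learning algorithm the authors use, and a simple stand-in construction (`rf_from_policy`) that pays 1 for the action the learned policy chose. It is not the authors' method.

```python
GAMMA = 0.9          # discount factor (assumed)
ACTIONS = (-1, +1)   # step left, step right
N_STATES = 5         # cells 0..4 on a line (assumed toy world)


def step(state, action):
    """Deterministic move on the line, clamped at both ends."""
    return min(max(state + action, 0), N_STATES - 1)


def fixed_rf(next_state, food_cell):
    """Hypothetical fixed RF: 1 when the agent ends up on the food cell."""
    return 1.0 if next_state == food_cell else 0.0


def greedy_policy(reward, sweeps=500):
    """Run value iteration for reward(state, action); return the greedy policy."""
    values = [0.0] * N_STATES

    def q(s, a):
        return reward(s, a) + GAMMA * values[step(s, a)]

    for _ in range(sweeps):
        values = [max(q(s, a) for a in ACTIONS) for s in range(N_STATES)]
    return [max(ACTIONS, key=lambda a: q(s, a)) for s in range(N_STATES)]


def learn_skill(food_cell):
    """Same fixed RF in every environment; only the food location differs."""
    return greedy_policy(lambda s, a: fixed_rf(step(s, a), food_cell))


def rf_from_policy(policy):
    """Stand-in construction: reward 1 for the policy's action, 0 otherwise."""
    return lambda s, a: 1.0 if a == policy[s] else 0.0


seek_right = learn_skill(food_cell=N_STATES - 1)  # environment A
seek_left = learn_skill(food_cell=0)              # environment B
rebuilt = greedy_policy(rf_from_policy(seek_right))

print(seek_right, seek_left, rebuilt == seek_right)
```

In this sketch the same fixed RF yields two different skills (move right, move left) depending only on the environment. With a discount factor below 1, the stand-in RF gives the policy's action a strictly higher value than any deviation in every state, so it recovers the same greedy policy in this toy world.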

Keywords


Reinforcement learning; environment influence; skills; autonomous robots
