LEARNING FROM THE ENVIRONMENT WITH A UNIVERSAL REINFORCEMENT FUNCTION
DOI:
https://doi.org/10.47839/ijc.5.3.410

Keywords:
Reinforcement learning, environment influence, skills, autonomous robots

Abstract
Traditionally, in Reinforcement Learning, the specification of the task is contained in the reinforcement function (RF), and each new task requires the definition of a new RF. In nature, however, explicit reward signals are limited, and the characteristics of the environment affect not only "how" animals perform particular tasks, but also "what" skills an animal will develop during its life. In this work, we propose a novel use of Reinforcement Learning that consists in learning different abilities or skills, based on the characteristics of the environment, using a fixed and universal reinforcement function. We also show a method to build an RF for a skill using information from the optimal policy learned in a particular environment, and we prove that this method is correct, i.e., the RF constructed in this way produces the same optimal policy.
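The abstract's second claim can be illustrated with a small experiment. The sketch below is not the authors' construction: the one-dimensional corridor environment, the penalty-based reward (0 for taking the learned policy's action, -1 otherwise), and all hyperparameters are illustrative assumptions. It first learns an optimal policy pi* for a simple goal-reaching task with tabular Q-learning, then builds a new RF from that policy and relearns from scratch, checking that the same optimal policy is recovered.

import random

N = 8                  # states 0..7; state 7 is absorbing (the "goal")
ACTIONS = (-1, +1)     # step left / step right
GAMMA, ALPHA = 0.9, 0.1
EPISODES, MAX_STEPS = 2000, 200

def step(s, a):
    """Deterministic corridor dynamics: clip at the left wall, stop at the goal."""
    s2 = min(max(s + a, 0), N - 1)
    return s2, s2 == N - 1

def q_learning(reward_fn):
    """Off-policy tabular Q-learning with a uniformly random behaviour policy,
    driven by an arbitrary reinforcement function reward_fn(s, a, s2)."""
    q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
    for _ in range(EPISODES):
        s = random.randrange(N - 1)
        for _ in range(MAX_STEPS):
            a = random.choice(ACTIONS)    # random exploration; Q-learning still
            s2, done = step(s, a)         # estimates the greedy optimum Q*
            r = reward_fn(s, a, s2)
            target = r + (0.0 if done else GAMMA * max(q[(s2, b)] for b in ACTIONS))
            q[(s, a)] += ALPHA * (target - q[(s, a)])
            if done:
                break
            s = s2
    # return the greedy policy over the non-terminal states
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N - 1)}

# 1) Learn an optimal policy for the original task: +1 only on reaching the goal.
task_rf = lambda s, a, s2: 1.0 if s2 == N - 1 else 0.0
pi_star = q_learning(task_rf)

# 2) Build an RF from that policy and relearn from scratch: deviating from
#    pi* costs -1, following it costs nothing, so the unique optimal policy
#    under the new RF is pi* itself.
policy_rf = lambda s, a, s2: 0.0 if a == pi_star[s] else -1.0
pi_new = q_learning(policy_rf)

print("pi* :", pi_star)
print("same optimal policy recovered:", pi_new == pi_star)   # expected: True

The penalty form of the constructed RF matters in this sketch: a naive reward of +1 for matching the policy can, in an episodic task, make it profitable to loop and collect matching rewards instead of terminating, which would change the optimal policy.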