Faruk POLAT
(Middle East Technical University, Faculty of Engineering, Department of Computer Engineering, Ankara, Türkiye)
Project Group: TÜBİTAK EEEAG Project
Page Count: 133
Project No: 215E250
Project End Date: 01.05.2018
Language: Turkish

Subgoal Discovery in Partially Observable Sequential Decision Making
Sequential decision making under partial observability is known to be a hard problem due to the difficulties introduced by perceptual aliasing and high dimensionality. Learning algorithms approach the sequential decision making problem from an adaptive agent perspective and attempt to cope with it using various approximation techniques. Reinforcement learning (RL) is accepted as a powerful online learning method owing to well-known properties such as its compatibility with the autonomous agent model, its relative ease of implementation, and its straightforward adaptation to real-world settings. RL methods, which theoretically build on the Markov decision process (MDP) model, also have partially observable MDP (POMDP) variants under certain assumptions and restrictions. The literature contains methods that decompose MDP problems into smaller subproblems, solve each with less effort, and later combine these solutions into a global solution for the whole problem. A popular approach among these is the discovery of subgoals that naturally partition the problem. Although methods along these lines have been proposed for MDP-RL, subgoal discovery for partially observable problems has not yet reached maturity. The aim of this project is to develop new methods for memory-based RL algorithms on problems with hidden states, an area in which no prior work on subgoal discovery for POMDP-RL exists. This work focuses on adapting or redesigning the online subgoal discovery methods already available for MDP-RL to the POMDP-RL model, thereby improving learning performance without any offline intervention. First, existing subgoal discovery approaches for both MDP-RL and POMDP-RL methods were analyzed, with emphasis on methods that exploit learning outputs.
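As a concrete illustration of the RL setting discussed above, the following is a minimal sketch of tabular Q-learning on a hypothetical fully observable one-dimensional corridor (the corridor, reward scheme, and all parameter values are illustrative assumptions, not taken from the report). Under partial observability, the state index would be replaced by an aliased observation, which is precisely what makes the POMDP case hard.

```python
import random

# Hypothetical toy corridor MDP (illustrative only): states 0..4,
# goal at state 4, actions 0 = left / 1 = right, reward 1.0 on reaching the goal.
N_STATES, GOAL = 5, 4
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1  # assumed learning rate, discount, exploration

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(GOAL, s + 1)
    return s2, (1.0 if s2 == GOAL else 0.0), s2 == GOAL

def choose(q, s):
    # epsilon-greedy action selection with random tie-breaking
    if random.random() < EPS or q[s][0] == q[s][1]:
        return random.randrange(2)
    return 0 if q[s][0] > q[s][1] else 1

random.seed(0)
Q = [[0.0, 0.0] for _ in range(N_STATES)]
for _ in range(500):  # episodes
    s, done = 0, False
    while not done:
        a = choose(Q, s)
        s2, r, done = step(s, a)
        # one-step Q-learning update
        Q[s][a] += ALPHA * (r + GAMMA * (0.0 if done else max(Q[s2])) - Q[s][a])
        s = s2

# greedy policy extracted from the learned values: move right everywhere
policy = [0 if Q[s][0] > Q[s][1] else 1 for s in range(N_STATES - 1)]
print(policy)  # [1, 1, 1, 1]
```

In a memory-based POMDP-RL method such as those the project targets, the table index would instead be built from a history of recent observations and actions.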
Next, a new subgoal discovery method was developed, focusing on memory-based algorithms, a mature family of POMDP-RL methods. Finally, the effectiveness of the proposed methods was validated through comparative runs on different benchmark problems widely accepted in the literature.
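The subgoal intuition above can be illustrated with a toy frequency filter: states that occur in nearly every successful trajectory yet are neither start nor goal states (for example, a doorway between two rooms) emerge as subgoal candidates. The hand-made two-room trajectories and the 0.9 threshold below are illustrative assumptions, not the project's actual method.

```python
from collections import Counter

# Hand-made successful trajectories in a hypothetical two-room world;
# 'D' is the doorway, 'G' the goal, 'A*'/'B*' room cells.
trajectories = [
    ['A1', 'A2', 'D', 'B1', 'B2', 'G'],
    ['A3', 'A2', 'D', 'B3', 'G'],
    ['A1', 'A3', 'A2', 'D', 'B1', 'G'],
    ['A2', 'A1', 'A2', 'D', 'B2', 'B3', 'G'],
]

def subgoal_candidates(trajs, threshold=0.9):
    # count in how many trajectories each state occurs (once per trajectory)
    occ = Counter()
    for t in trajs:
        for s in set(t):
            occ[s] += 1
    n = len(trajs)
    # exclude trajectory endpoints; keep states present in almost every run
    ends = {t[0] for t in trajs} | {t[-1] for t in trajs}
    return [s for s, c in occ.items() if c / n >= threshold and s not in ends]

print(subgoal_candidates(trajectories))  # ['D']
```

Real discovery methods replace this crude counting with statistics gathered online during learning; under partial observability the trajectory elements would be observation histories rather than states, which is the gap the project addresses.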

TÜBİTAK ULAKBİM National Academic Network and Information Center, Cahit Arf Information Center © 2019 All Rights Reserved.