Title AlphaZero: strojno učenje podrškom bez domenskog znanja
Title (english) AlphaZero: reinforcement learning without domain knowledge
Author Jelena Lončar
Mentor Zvonimir Bujanović (mentor)
Committee member Zvonimir Bujanović (committee chair)
Committee member Dražen Adamović (committee member)
Committee member Vedran Krčadinac (committee member)
Committee member Franka Miriam Brückler (committee member)
Granter University of Zagreb Faculty of Science (Department of Mathematics) Zagreb
Defense date and country 2020-12-02, Croatia
Scientific / art field, discipline and subdiscipline NATURAL SCIENCES Mathematics
Abstract U ovom je diplomskom radu predstavljen AlphaZero, algoritam tvrtke DeepMind koji tabula rasa može postići nadljudski učinak u raznovrsnim izazovnim domenama, poput šaha, shogija (japanskog šaha) i igre Go. Naime, dotadašnje prvake u navedenim trima igrama uvjerljivo je pobijedio, a njegovu su izuzetnost šahovski velemajstori istaknuli usporedbom njegovog igranja šaha s, primjerice, igranjem superiorne vanzemaljske vrste. Stvaranje algoritma koji tabula rasa stječe nadljudsku vještinu u zahtjevnim domenama bio je dugogodišnji cilj umjetne inteligencije te upravo AlphaZero, sa svojom sposobnošću prilagođavanja raznolikim pravilima igre, predstavlja njegovo ispunjenje i značajan korak naprijed prema ostvarenju općeg sustava za igranje igara. U radu su objašnjeni osnovni koncepti teorije koja leži u pozadini AlphaZero algoritma, podrobno je opisana struktura AlphaZero metode, a njezine su mogućnosti demonstrirane njenom implementacijom za igru Četiri u nizu pomoću programskog jezika Python i njegovih dodatnih biblioteka. U poglavlju 1 navedeni su relevantni pojmovi teorije igara i umjetne inteligencije (s naglaskom na klasu algoritama strojnog učenja podrškom, u koju možemo svrstati i AlphaZero algoritme), diskutirane su zajedničke karakteristike problema koje AlphaZero rješava, predstavljeni su formalni modeli igara koje uspješno savladava te je kroz objašnjenje strojnog učenja podrškom općenito stvorena podloga za razumijevanje temelja AlphaZero metode. Igre koje AlphaZero uspijeva naučiti uspješno igrati (primjerice, šah, shogi i Go) u kontekstu teorije igara modelirane su kao kombinatorne igre, odnosno determinističke ekstenzivne igre s dva igrača, sa sumom nula i potpunim informacijama. Također, opisane su alternirajuće Markovljeve igre, čiji formalizam AlphaZero slijedi te na koji se njegova metoda najizravnije primjenjuje, i njihov poseban slučaj, Markovljevi procesi odlučivanja (koji su formulacija problema koje rješava klasa algoritama strojnog učenja podrškom). U poglavlju 2 detaljno je opisana struktura AlphaZero metode; razložena je na tri komponente: pretraživanje stabla Monte Carlo metodom, igranje igara algoritma samog protiv sebe i nadzirano učenje, od kojih je svaka opširno opisana zasebno te za koje je objašnjeno na koji način tvore funkcionalnu cjelinu AlphaZero algoritma.
U poglavlju 3 navedeni su određeni implementacijski detalji programskog ostvarenja AlphaZero metode za igru Četiri u nizu te su predstavljeni rezultati: napredak u sposobnostima igranja tijekom vremena te ostvaren uspjeh po završetku procesa učenja. Prikazan je uspjeh postignut u igrama protiv drugih algoritama za igranje igre Četiri u nizu, poput algoritma minimaks, ali i protiv čovjeka. Konačno, u sklopu ovog diplomskog rada implementirano je te opisano grafičko sučelje koje korisniku omogućava igranje igre Četiri u nizu protiv agenata dobivenih pokretanjem implementacije AlphaZero metode te koje predstavlja svojevrsnu ”materijalizaciju” dobivenih rezultata i ”opipljivi” konačan proizvod ovog diplomskog rada.
Abstract (english) This master's thesis presents AlphaZero, DeepMind’s algorithm capable of achieving, tabula rasa, superhuman performance in many challenging domains, such as chess, shogi (Japanese chess) and Go. Not only has AlphaZero convincingly defeated the previous world champions in all three games, but chess grandmasters have described its exceptional play as what one would expect from a superior extraterrestrial species. The creation of an algorithm able to achieve, tabula rasa, superhuman skill in challenging domains has been a longstanding objective of artificial intelligence. AlphaZero, with its ability to adapt to diverse game rules, can be considered the realization of that objective and a major step towards a general game-playing system. In this thesis we have explained the basic theoretical concepts underpinning AlphaZero, thoroughly described the structure of the AlphaZero method and demonstrated its capabilities by implementing it for the game Connect Four (also known as Four in a Row) using the Python programming language and its additional libraries. In chapter 1 we have introduced the pertinent concepts of game theory and artificial intelligence (with emphasis on reinforcement learning, the class of algorithms to which AlphaZero itself belongs), discussed the characteristics common to the problems AlphaZero solves, presented formal models of the games it successfully masters, and laid the groundwork for understanding the AlphaZero method by explaining reinforcement learning in general. The games AlphaZero has managed to master (such as chess, shogi and Go) have been formulated in the context of game theory as combinatorial games, that is, deterministic, zero-sum, perfect-information extensive games with two players. Moreover, we have described alternating Markov games, whose formalism AlphaZero follows and to which the AlphaZero method can be applied most directly, along with their special case, Markov decision processes (the formulation of the problems solved by reinforcement learning algorithms).
In chapter 2 we have described the structure of the AlphaZero method in detail. It consists of three main components: Monte Carlo tree search, self-play and supervised learning. We have described each of them separately and explained how they come together to form the functional whole that constitutes the AlphaZero algorithm. In chapter 3 we have discussed certain details of our AlphaZero implementation for the game of Connect Four and presented the results: the progress in game-playing ability over the course of training and the level of play achieved at the end of training. We have demonstrated our agents’ strength in games against other Connect Four algorithms, such as a minimax-based algorithm, as well as in matches against humans. Finally, as part of this master's thesis, we have implemented and described a graphical user interface that allows human players to compete against agents obtained during the training process. This interface “materializes” the obtained results and serves as a “tangible” final product of the thesis.
Keywords
algoritam tvrtke DeepMind
šah
shogi
Go
igra Četiri u nizu
Keywords (english)
DeepMind’s algorithm
chess
shogi
Go
the game of Connect Four
Language Croatian
URN:NBN urn:nbn:hr:217:006628
Study programme Title: Computer Science and Mathematics; Study programme type: university; Study level: graduate; Academic / professional title: magistar/magistra računarstva i matematike (Master of Computer Science and Mathematics)
Type of resource Text
File origin Born digital
Access conditions Open access
Created on 2021-02-23 10:09:31