JISE




Journal of Information Science and Engineering, Vol. 37 No. 3, pp. 553-573


Position Control and Production of Various Strategies for Game of Go Using Deep Learning Methods


YUAN SHI, TIANWEN FAN, WANXIANG LI, CHU-HSUAN HSUEH AND KOKOLO IKEDA
School of Information Science
Japan Advanced Institute of Science and Technology
Nomi, Ishikawa, 9231292 Japan
E-mail: {shiyuan; fantianwen; wanxiang.li; hsuehch; kokolo}@jaist.ac.jp


Computer Go programs have surpassed top-level human players by using deep learning and reinforcement learning techniques. Beyond playing strength, entertaining Go AIs and AI coaches are also interesting directions, but they have not been well investigated. Some researchers have worked on entertaining beginner and intermediate players. One topic is position control, which aims to make strong programs play close games against weak players. In this scenario, the naturalness of the moves is likely to influence the weaker players' enjoyment. Another topic is producing various strategies (or preferences), which human players usually have. Some methods for these two topics have been proposed and evaluated for a traditional Monte-Carlo tree search (MCTS) program. However, there are some critical differences between traditional MCTS programs and recent programs based on AlphaGo Zero, such as LeelaZero and KataGo. For example, recent programs do not run random simulations to the ends of games in MCTS, which makes the existing method for producing various strategies inapplicable. In this paper, we first summarize these differences and the problems that result from them. We then adapt existing methods and propose new methods to solve these problems, obtaining promising results. For position control, the modified LeelaZero can play gently against a weaker opponent, winning 48% of its games against a weaker program, Ray. A human subject experiment shows that the average number of unnatural moves per game is 1.22, compared with 2.29 for a simple method that does not consider naturalness. We also propose a new position control method specifically for endgames. Finally, two methods are introduced for producing various strategies. In our experiments, both methods produce center-oriented and edge/corner-oriented strategies, and human players can successfully identify these strategies.
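As a rough illustration of the general position-control idea (not the authors' method), the sketch below picks, among an engine's candidate moves, the one whose estimated win rate is closest to a target such as 50%. It can optionally discard moves with a low policy prior, using that prior as a crude proxy for naturalness. It assumes the win rates, priors, and visit counts have already been read from an engine such as LeelaZero or KataGo. The `Candidate` fields, the `choose_move` helper, and the thresholds `min_prior` and `min_visits` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    move: str        # hypothetical: move in GTP coordinates, e.g. "D4"
    win_rate: float  # engine's estimated win rate for the program, in [0, 1]
    prior: float     # policy-network probability, used here as a naturalness proxy
    visits: int      # MCTS visit count, used as a reliability filter


def choose_move(candidates, target=0.5, min_prior=0.05, min_visits=10):
    """Return the candidate whose win rate is closest to `target`.

    Hypothetical sketch: moves with too few visits are treated as unreliable,
    and moves with a prior below `min_prior` are treated as unnatural.
    Each filter falls back to the previous pool if it removes every move.
    Raises ValueError if `candidates` is empty.
    """
    reliable = [c for c in candidates if c.visits >= min_visits] or list(candidates)
    natural = [c for c in reliable if c.prior >= min_prior]
    pool = natural or reliable
    return min(pool, key=lambda c: abs(c.win_rate - target))


cands = [
    Candidate("D4", 0.92, 0.40, 800),   # best move, would win easily
    Candidate("Q16", 0.61, 0.20, 150),
    Candidate("A1", 0.50, 0.001, 20),   # "perfect" win rate, but an odd corner move
    Candidate("K10", 0.55, 0.08, 40),
]
print(choose_move(cands).move)                 # K10
print(choose_move(cands, min_prior=0.0).move)  # A1
```

With `min_prior=0.0`, the filter is effectively disabled. The selection then reduces to a naive "closest to 50%" rule, which here prefers the implausible first-line move A1. With the filter on, the sketch settles for K10, a natural move whose win rate is still near the target.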


Keywords: computer Go, position control, various strategies, entertainment, coaching, deep learning, AlphaGo Zero
