
Game Playing Artificial Intelligence (AI)

Explore the evolution of Artificial Intelligence (AI) in game playing, focusing on self-play and learning techniques. This guide discusses key developments from classical games to modern AI systems, highlighting significant breakthroughs like AlphaZero and TD-Gammon, and examines the historical context, challenges, and future implications of AI in gaming.

Game Playing Artificial Intelligence (AI): A Guide to Implementation

Abstract: This post examines the evolution of Artificial Intelligence (AI) systems within the context of game playing. Specifically, it focuses on the role of learning and self-play, exploring their application in diverse games, from deterministic ones like Chess, Go, and Checkers to those with hidden information or randomness, such as Poker, Bridge, and Backgammon.

This analysis is driven by key questions, including: “How has deep learning been successfully integrated into self-play?” and “To what extent have the issues surrounding learning and self-play evolved throughout the history of AI?” To address these, the post discusses relevant self-play research and experiments. It then assesses the significance of machine learning and its techniques for developing high-quality game-playing AI programs, concluding with an examination of how these advancements have reshaped the history of artificial intelligence (AI).

Introduction

The increasing realism of virtual environments demands equally convincing computational AI, something modern game players have come to expect. Nevertheless, the AI in almost all current games still relies on a fixed set of scripted actions, which knowledgeable players can often anticipate. A better approach is to model player behaviour using machine learning techniques. When developing game AI, there are two primary ways for the system to learn or process the game:

  1. Self-play: The system repeatedly plays against itself.
  2. Learning from opponent moves: The player and the AI have restricted information about the game state, making the deduction of the opponent's hidden knowledge a core part of the game.

The current paper aims to explore questions across three core areas:

  • Learning Issues: This includes the importance of machine learning in developing game-playing artificial intelligence (AI) programs.
  • Self-play Issues: This covers the applicability of self-play to various games, whether AI gains expertise through self-play, and the successful application of deep learning in self-play.
  • Historical Perspective: This section synthesizes the discussion, providing a brief history of AI and gaming to contextualize the topic.

History of Game-Playing AI

The relationship between games and AI has a long history. Much of the early research focused on creating gameplay agents, with or without a learning component; this was initially the primary, and for a long time the only, application of AI in games. Since the inception of artificial intelligence (AI) as a field, pioneers in computer science have written game-playing programs to test whether “computers could solve tasks where intelligence is needed”. The first successful game-mastering software was developed by A.S. Douglas.

In 1952, as part of his doctoral dissertation at Cambridge, Douglas programmed a digital version of Tic-Tac-Toe. Running on the one-of-a-kind EDSAC computer, it was the first graphical computer game played against a human, with the player entering moves (nought or cross) via a rotary telephone dial. Years later, Arthur Samuel pioneered a form of machine learning, now known as reinforcement learning, by creating a program that learned to play checkers through self-play. Reinforcement learning broadly covers algorithms, such as temporal difference learning, that learn behaviour from reward signals rather than labelled examples.

According to Togelius (2018), early AI research primarily focused on classic board games like chess, checkers, and Go, as they are easily modeled in code and developers can emulate them quickly. He also notes that modern computers, leveraging AI technologies, can perform millions of calculations per second.

Learning and Self-play in AI

This section focuses on machine learning and its techniques within AI, specifically addressing self-play. It explores examples, outcomes, and how new techniques have enhanced self-play in AI, answering the questions related to learning and self-play.

Machine Learning

“Machine learning usually refers to the changes in systems that perform tasks associated with artificial intelligence (AI). Such tasks involve recognition, diagnosis, planning, robot control, prediction, etc”.

Machine learning techniques are broadly categorized into two main types:

  1. Supervised Learning: Trains on existing or known input and output data (trained datasets).
  2. Unsupervised Learning: Deals with unknown datasets, identifying hidden patterns in the input data to determine the output.
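
To make the distinction concrete, here is a minimal sketch contrasting the two categories on a toy dataset. It assumes scikit-learn; the dataset, models, and parameters are choices made for this illustration, not anything prescribed by the techniques themselves.

```python
# A minimal sketch contrasting supervised and unsupervised learning,
# assuming scikit-learn is available.
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

# Toy data: 300 two-dimensional points drawn from three clusters.
X, y = make_blobs(n_samples=300, centers=3, random_state=0)

# Supervised learning: the model trains on inputs X *and* known labels y.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("supervised training accuracy:", clf.score(X, y))

# Unsupervised learning: the model sees only X and must uncover the
# hidden cluster structure on its own.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("unsupervised cluster assignments:", km.labels_[:10])
```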

Beyond these, newer techniques have emerged, including self-supervised learning, reinforcement learning, artificial neural networks, support vector machines, and decision tree learning. Self-supervised learning is a form of supervised learning in which the training data is labelled automatically, without human intervention.

Self-play

Self-play is a method in which an artificial game system acquires playing skill by competing against clones of itself, rather than being guided by a human expert. Studies suggest that self-play is still not fully understood, as system performance is not always guaranteed. To illustrate this, this section includes several well-known examples of self-play and their results.
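
The core loop is simple to state even though its dynamics are hard to analyze. Below is a minimal, game-agnostic sketch of that loop; `play_game` and `improve` are hypothetical callbacks standing in for whatever game simulator and learning rule a real system would use.

```python
def self_play_training(initial_policy, play_game, improve, generations=100):
    """Generic self-play loop. `play_game(p1, p2)` is a hypothetical
    callback that simulates one game and returns its record;
    `improve(policy, games)` is a hypothetical learning step that
    returns an updated policy."""
    policy = initial_policy
    for _ in range(generations):
        # Both sides are controlled by the same current policy,
        # i.e. the system competes against a clone of itself.
        games = [play_game(policy, policy) for _ in range(32)]
        # Update the policy from its own game records, e.g. by
        # reinforcing the moves that led to wins.
        policy = improve(policy, games)
    return policy
```

The instability reported in the literature comes from the moving target this loop creates: each generation is evaluated only against itself, so training can cycle or stall rather than steadily improve.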

The AlphaZero algorithm achieved superhuman performance in Go, Chess, and Shogi by playing against itself through reinforcement learning. Its performance, measured on the Elo scale as a function of training steps, was remarkable: it defeated Stockfish in Chess after only 4 hours (300,000 steps), Elmo in Shogi after 2 hours (110,000 steps), and AlphaGo Lee in Go after 30 hours (74,000 steps). The consistent performance across independent runs suggests the high repeatability of AlphaZero’s training algorithm.

In 1959, Arthur L. Samuel wrote a checkers program that learned from samples gathered from both self-play and human play, using data like “board configuration, game results”. The system employed several methods to evaluate the board for its next move: a lookup table, a pruned (alpha-beta) search tree of limited depth, and an evaluation function built from manually engineered features (ibid.). Samuel concluded that, despite the earlier limitations of machine learning techniques (such as slow progress and difficulty optimizing playing strategies), advances now allow their efficient use and application to many problems.
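
The search component of such a program can be shown compactly. The sketch below is a generic depth-limited alpha-beta search in the spirit Samuel describes, not his original code; `moves`, `apply_move`, and `evaluate` are assumed game-specific callbacks, with `evaluate` playing the role of his manually engineered scoring features.

```python
def alpha_beta(state, depth, alpha, beta, maximizing,
               moves, apply_move, evaluate):
    """Depth-limited alpha-beta search: a minimax game tree pruned of
    branches that cannot affect the result, with a heuristic evaluation
    function applied at the depth limit."""
    legal = moves(state)
    if depth == 0 or not legal:
        return evaluate(state)  # heuristic score of the position
    if maximizing:
        best = float("-inf")
        for m in legal:
            best = max(best, alpha_beta(apply_move(state, m), depth - 1,
                                        alpha, beta, False,
                                        moves, apply_move, evaluate))
            alpha = max(alpha, best)
            if beta <= alpha:   # opponent will avoid this branch: prune
                break
        return best
    else:
        best = float("inf")
        for m in legal:
            best = min(best, alpha_beta(apply_move(state, m), depth - 1,
                                        alpha, beta, True,
                                        moves, apply_move, evaluate))
            beta = min(beta, best)
            if beta <= alpha:
                break
        return best
```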

Gerald Tesauro created TD-Gammon, a neural network (NN) that plays backgammon using temporal difference learning. The network was trained on complete games, from the opening position to the final move, and tested in actual gameplay against Sun Microsystems' Gammontool program. A key distinction of backgammon from Chess, Checkers, and Go is the element of randomness introduced by the dice.
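
The learning rule behind TD-Gammon can be illustrated with the simplest member of the family, a tabular TD(0) update. TD-Gammon itself used TD(λ) with a neural network approximating the value function, but the core idea, nudging each position's value toward the value of the position that followed it, is the same. The toy states below are invented purely for illustration.

```python
from collections import defaultdict

def td0_update(V, s, s_next, reward, alpha=0.1, gamma=1.0):
    """One temporal-difference (TD(0)) step: move the value estimate of
    position s toward reward + gamma * V(s_next)."""
    td_error = reward + gamma * V[s_next] - V[s]
    V[s] += alpha * td_error
    return td_error

# Usage on a hypothetical two-position game: the win signal at the end
# of the game propagates backward into earlier positions.
V = defaultdict(float)
td0_update(V, s="near_win", s_next="win", reward=1.0)    # terminal reward
td0_update(V, s="start", s_next="near_win", reward=0.0)  # bootstraps from V["near_win"]
print(dict(V))
```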

Key Factor 1

According to Yossi (2019), a key factor in the success of deep learning is that it requires no “handholding.” He states, “Unlike machine learning, we’re not trying to understand what’s inside (...). We’re feeding it only raw images; we’re not writing any code.” Deep learning algorithms are concise, learning through trial and error at machine speed to achieve higher accuracy. As an example of an AI reaching superhuman accuracy, he points to AlphaGo.

The vast configuration space of the ancient Chinese game of Go (roughly 10^170 possible positions on its 19×19 board) made it an ideal challenge for AI. AlphaGo was initially exposed to 160,000 amateur games for foundational knowledge. Its developers then had the system play against itself repeatedly.

In 2015, the machine defeated the European Go Champion 5–0, and in 2016 it went on to beat the World Champion 4–1. Yossi later remarked, “After training against itself for three days, it became the best in the world.” He now contends, “There is no game that a machine cannot play better than a human being”.

Key Factor 2

Pollack and Blair (1996) suggest that the success of TD-Gammon did not depend entirely on the reinforcement and temporal difference methods used by Tesauro, given the evidence that backgammon can also be learned with simple hill-climbing. Instead, the success stemmed from the co-evolutionary self-play setup, biased by backgammon’s dynamics. They view TD-Gammon as a major milestone for machine learning modeled on biological processes, where the initial model specification can be simpler because the training environment emerges from the co-development of the learning system and its environment.

The concept of machine learning based on evolution is often linked to Holland’s pioneering work on genetic algorithms. However, Holland’s approach optimizes against an absolute fitness function: a fixed objective value (of the kind also used, for example, when optimizing neural network parameters with CMA-ES). The idea of co-development highlights the difference between optimization based on absolute fitness and optimization based on relative fitness.
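
The distinction can be stated in a few lines of code. In the sketch below, `objective` and `play_match` are hypothetical callbacks: absolute fitness scores a candidate against a fixed yardstick, while relative fitness scores it only against the current population, so the measure itself shifts as the population co-evolves.

```python
import random

def absolute_fitness(candidate, objective):
    """Absolute fitness: a fixed objective scores each candidate in
    isolation, as in Holland-style genetic algorithms."""
    return objective(candidate)

def relative_fitness(candidate, population, play_match, n_games=10):
    """Relative fitness: the score is a win rate against the current
    population, so it moves as the population improves.
    `play_match(a, b)` is a hypothetical callback returning 1 if a wins."""
    wins = sum(play_match(candidate, random.choice(population))
               for _ in range(n_games))
    return wins / n_games
```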

These examples clearly demonstrate that advancements in machine learning, such as reinforcement learning, temporal difference learning, and deep learning, have profoundly influenced self-play in AI. Existing techniques have also seen improvements, such as self-supervised learning, which Abshire (2018) notes is used in self-driving vehicles to analyze driver interaction via video footage. Thus, self-play has become increasingly reliant on these newer machine learning techniques.

Key Factor 3

The mechanisms and analysis vary across games and AI systems, as they depend on the game flow, player behaviour, the specific technique used, and the adaptability of the game's features. Since key features differ for each game, a unified mechanism and analysis is not always feasible. For instance, backgammon cannot use alpha-beta pruning and search trees the way checkers can, and the techniques used for backgammon cannot be applied to develop AlphaGo.
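
The backgammon case illustrates why: dice rolls insert chance nodes between decisions, so the deterministic min/max backup of alpha-beta no longer applies, and the search must instead average over rolls. Below is a minimal expectimax-style sketch of that averaging step, with `moves`, `apply_move`, and `evaluate` again assumed as game callbacks; for simplicity it keeps a single maximizing player rather than modelling the opponent's replies.

```python
# All 21 distinct backgammon rolls with their probabilities:
# doubles occur with probability 1/36, mixed rolls with 2/36.
DICE = [((a, b), (1 if a == b else 2) / 36)
        for a in range(1, 7) for b in range(a, 7)]

def expectimax_value(state, depth, moves, apply_move, evaluate):
    """Chance-node search for a dice game: each ply averages the value
    over every possible roll, weighted by its probability, then takes
    the best move available for that roll (assumed to exist)."""
    if depth == 0:
        return evaluate(state)
    total = 0.0
    for roll, prob in DICE:
        best = max(expectimax_value(apply_move(state, roll, m), depth - 1,
                                    moves, apply_move, evaluate)
                   for m in moves(state, roll))
        total += prob * best
    return total
```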

Despite the numerous breakthroughs and advantages of machine learning in gaming, Stephenson (2018) points out that significant challenges remain. A major obstacle is the lack of sufficient data for learning, as these algorithms model complex systems and actions for which good historical data is scarce.

Furthermore, game developers must ensure that the machine learning algorithms do not “break the experience” for the player. They must be accurate, fast, and non-disruptive. Any interruption or slowdown can pull the player out of the game’s immersive experience. Nonetheless, most major game development studios have dedicated teams researching, refining, and applying AI to their games.

Historical Changes in AI

The examples and breakthroughs discussed highlight a massive transformation in game AI from past to present. Earlier AI employed strategies and techniques such as branching-factor analysis and tree search, while modern techniques in gaming and self-play are more advanced: current systems can learn from a basic level without any initial game information.

Performance has also increased with the adoption of function approximators, such as neural networks. The development of advanced machine learning techniques has elevated AI in gaming to the point where self-learning is more effective and creative than learning from humans, even leading to machines defeating human players in competitions.

Conclusion

Advancements in machine learning techniques have had a significant impact on AI gameplay. Gaming AI began with foundational machine learning techniques, and new ones continue to emerge. Because of the limitations of earlier machine learning (such as slow progress and difficulty optimizing playing strategies), self-play has come to depend increasingly on advanced techniques like neural networks, reinforcement learning, and deep learning. Milestones along the way include Douglas's first digital Tic-Tac-Toe game in 1952 and the concept of the absolute fitness function.

AI is often portrayed as a tool to augment or complement human abilities, but rarely as a peer. Game experiments involving machine-human collaboration offer a glimpse into the future. In a “capture the flag” case study, human players found bots to be more collaborative than other humans, though overall reactions to AI teammates in DOTA 2 were mixed: some players were enthusiastic, feeling supported by and learning from the AI, as noted by a professional DOTA 2 player.

Lavanchy (2019) raises a crucial question: should AI learn from humans or continue to teach itself? Self-learning can lead to greater efficiency and creativity, free of human mimicry, but it may also make algorithms better suited to tasks without human collaboration, such as warehousing robots. Conversely, training a machine with human data could be argued to make it more intuitive, allowing human users to understand the machine's actions. Ultimately, as AI intelligence grows, so does human astonishment.
