The latest win by the AI bot Pluribus over top poker professionals is a significant milestone in the evolution of AI. A few years ago, Google DeepMind’s AlphaGo bot beat Lee Sedol, one of the world’s strongest Go players, four games to one in a best-of-five match. Go, like chess, is a game of perfect information. Chess-playing bots use brute-force search over possible variations of play to choose a move, and there is little AI in this approach. AlphaGo, however, stretched the boundary and used machine learning to achieve its success. It had no choice, because Go has far too many variations for a brute-force search approach. Games of imperfect information are a level of difficulty higher than games of perfect information, and the ability to deal with uncertainty is an important measure of AI. AI researcher Peter Norvig has described intelligence as the ability to make decisions in unknown environments. The ideas in Pluribus are novel and go beyond the designs currently in common use. If they are transferred to other real-world applications where information is limited, AI will move further into our lives.
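Brute-force search means examining every line of play through to the end of the game and choosing the move that leads to the best guaranteed outcome. The following is a minimal sketch of that idea, not how Stockfish or any other engine named here is implemented. It solves a hypothetical toy game of perfect information: players alternately remove one to three stones from a pile, and whoever takes the last stone wins. The game and the names `negamax` and `best_move` are illustrative assumptions.

```python
from functools import lru_cache

# Hypothetical toy game: players alternately remove 1, 2, or 3 stones
# from a pile; whoever takes the last stone wins.
MOVES = (1, 2, 3)


@lru_cache(maxsize=None)  # cache results so each pile size is searched only once
def negamax(stones: int) -> int:
    """Return +1 if the player to move can force a win, -1 otherwise."""
    if stones == 0:
        # No stones left: the opponent just took the last one and won.
        return -1
    # Search every legal move; a position is good for us if the
    # resulting position is bad for the opponent.
    return max(-negamax(stones - take) for take in MOVES if take <= stones)


def best_move(stones: int) -> int:
    """Return the legal move whose resulting position scores best for the mover."""
    legal = [take for take in MOVES if take <= stones]
    return max(legal, key=lambda take: -negamax(stones - take))
```

For example, `negamax(4)` returns -1 (the player to move loses against perfect play), and `best_move(7)` returns 3, leaving the opponent a pile of four. Real chess engines add pruning, evaluation functions, and position caching because the full chess game tree is far too large to search exhaustively. Go’s tree is larger still.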
The key reason Pluribus is significant is that uncertainty is common in real-world applications
As reported in Science, July 17, 2019, Pluribus beat six top poker professionals in 150,000 hands of no-limit Texas hold’em. The two researchers behind Pluribus are PhD student Noam Brown and Professor Tuomas Sandholm. They are affiliated with Carnegie Mellon University and a host of companies; Brown, for example, is linked with Facebook. Their funding comes principally from the US National Science Foundation. Sandholm’s work is also funded by the US Army Research Office. The researchers built on the success of their previous bot, Libratus, which in 2017 beat four top players over 120,000 hands of two-player (heads-up) no-limit hold’em. The more players at the table, the greater the challenge for the bot. This work is likewise an advance over earlier work such as DeepStack, which also played two-player games.
Several aspects of Pluribus are significant. Its core strategy was developed without human game data. Instead, its core engine computed a blueprint strategy by playing against itself. The Science paper describes this self-play algorithm as a form of Monte Carlo counterfactual regret minimization. The approach is similar in spirit to Google DeepMind’s AlphaZero bot for playing Go. To keep the computation tractable, the researchers simplified the game using what they call action abstraction and information abstraction, as described in their Science paper. Action abstraction limits the set of bet sizes considered, and information abstraction groups strategically similar hands together. These abstractions reduce the size of the decision space. During live play against humans, Pluribus then refined the blueprint with real-time search. Pluribus was also trained on CPUs and did not require acceleration from GPUs or the TPUs that AlphaGo and AlphaZero used on Google Cloud.
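Self-play without human data can be illustrated with regret matching, the update rule at the heart of counterfactual regret minimization. The following is not Pluribus’s code. It is a minimal sketch under simplifying assumptions: two copies of the same learner play rock-paper-scissors against each other, and expected payoffs are computed exactly rather than sampled. The game, player 0’s starting bias toward rock, and the function names are hypothetical. Each player tracks how much better each action would have done than the strategy it actually played (its regret) and then plays actions in proportion to their positive regret. Averaged over many iterations, the strategies approach the game’s equilibrium, which in rock-paper-scissors is one-third for each action.

```python
# Rock-paper-scissors payoffs for the player choosing the row action
# against the column action: +1 win, 0 tie, -1 loss.
# Action order: rock, paper, scissors. The game is symmetric, so both
# players can evaluate their actions with the same table.
PAYOFF = [
    [0, -1, 1],   # rock: loses to paper, beats scissors
    [1, 0, -1],   # paper: beats rock, loses to scissors
    [-1, 1, 0],   # scissors: loses to rock, beats paper
]
N = len(PAYOFF)


def regret_matching(regrets: list[float]) -> list[float]:
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / N] * N  # no positive regret yet: play uniformly


def action_values(opponent: list[float]) -> list[float]:
    """Expected payoff of each pure action against a mixed opponent strategy."""
    return [sum(PAYOFF[a][b] * opponent[b] for b in range(N)) for a in range(N)]


def self_play(iterations: int = 50_000) -> list[list[float]]:
    # Hypothetical start: player 0 is biased toward rock, player 1 is neutral.
    regrets = [[1.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
    strategy_sums = [[0.0] * N for _ in range(2)]
    for _ in range(iterations):
        # Both players fix their current strategies before either updates.
        strategies = [regret_matching(regrets[p]) for p in range(2)]
        for p in range(2):
            values = action_values(strategies[1 - p])
            # Expected payoff of the strategy actually played this iteration.
            played = sum(s * v for s, v in zip(strategies[p], values))
            for a in range(N):
                regrets[p][a] += values[a] - played
                strategy_sums[p][a] += strategies[p][a]
    # The average strategy, not the latest one, is what converges.
    return [[s / iterations for s in sums] for sums in strategy_sums]
```

`self_play()` returns two lists of three probabilities, each close to 1/3, even though player 0 starts out favoring rock. Pluribus applies the same principle to a vastly larger, abstracted poker game. It samples parts of the game tree on each iteration instead of evaluating every action exactly.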
The professional poker players involved in the trials found that Pluribus used novel styles of play that challenged them. Similar findings have emerged from examining the strategies of top-playing bots in backgammon, chess, and Go.
Finally, chess bots such as Stockfish, the 2017 computer chess champion, use brute-force search rather than AI. More recently, however, AlphaZero was applied to chess. It learned the game in just four hours of self-play training and then went head to head with Stockfish. In a 100-game match, AlphaZero won 28 games, drew 72, and lost none. Reaching superior play from a “tabula rasa” start, with no learning from human games, shows the power of AI. It also tells us something about how the best humans play a game differently from machines, and it could even teach humans how to learn better.
The implications for real-world AI applications
While Pluribus can play poker and nothing else, its approach is likely to lead to further work on applying AI to real-world applications that are rich in uncertainty. For example, autonomous driving is an area receiving much investment. The biggest challenge for researchers in this field is that real-world driving conditions are full of uncertainty, from the environment, such as the weather, to the behavior of other drivers on the road. AI systems need to be robust to the variability of real-world conditions and to unexpected circumstances. They must make the right decisions about when to continue driving, take evasive action, or stop safely. The poker AI demonstrates that it is possible to build a successful AI system that operates under a degree of uncertainty.
Michael Azoff, Distinguished Analyst, IT Infrastructure