AlphaGo, Reinforcement Learning, and the Future of Artificial Intelligence

Last year, Google DeepMind took a giant step forward in demonstrating the value of deep learning. After only three days of self-training, the latest version of its Go-playing computer program, AlphaGo Zero, beat the earlier version of AlphaGo that had defeated world champion Lee Sedol.

This is an impressive feat by itself. The implications for business and enterprise analytics, however, are more exciting.

Understanding Reinforcement Learning

Reinforcement learning is a type of machine learning in which the machine determines the “ideal behavior” for a specific situation on its own, with the goal of obtaining the optimal outcome.

There are two parts to this. First, the algorithm weighs the value of as many possible future states as it has time and power to consider.

Next, it selects the best next action based on its current state.
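The two parts above can be sketched in a few lines of Python. This is a minimal illustration, not how AlphaGo itself searches: it assumes a toy game (players alternately take one to three stones from a pile, and whoever takes the last stone wins), and the names `legal_moves`, `evaluate`, and `best_move` are made up for this example. The `depth` parameter stands in for "as much time and power as the algorithm has."

```python
def legal_moves(stones):
    """Moves available in the toy game: take 1, 2, or 3 stones."""
    return [m for m in (1, 2, 3) if m <= stones]


def evaluate(stones, depth):
    """Value of a position for the player about to move.

    +1.0 means a forced win, -1.0 a forced loss, and 0.0 means the
    search ran out of depth before it could tell.
    """
    if stones == 0:
        # The opponent just took the last stone, so the player to move lost.
        return -1.0
    if depth == 0:
        return 0.0
    # A position is as good as the best move available from it; the
    # opponent's value is negated to get our own.
    return max(-evaluate(stones - m, depth - 1) for m in legal_moves(stones))


def best_move(stones, depth):
    """Part two: pick the action whose resulting state looks best."""
    return max(legal_moves(stones),
               key=lambda m: -evaluate(stones - m, depth - 1))


print(best_move(5, depth=4))  # prints 1 (leaves the opponent with 4 stones)
```

With a shallow `depth` the search cannot always see the end of the game and falls back to a neutral estimate, which is why more time and computing power generally lead to better decisions.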

Systems undergoing reinforcement learning aren’t given training sets.

They have no human advice about whether each move is good or bad, just whether the end state is ideal.
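To make the "end state only" idea concrete, here is a small self-play sketch. It is an assumption-laden toy, not DeepMind's method: it uses a simple pile game (take one to three stones, last stone wins), a lookup table instead of a neural network, and hypothetical names such as `train` and `choose`. The only feedback the program ever receives is who won each finished game.

```python
import random


def legal_moves(stones):
    """Moves available in the toy game: take 1, 2, or 3 stones."""
    return [m for m in (1, 2, 3) if m <= stones]


def choose(q, stones, epsilon, rng):
    """Mostly pick the best-looking move, sometimes explore at random."""
    moves = legal_moves(stones)
    if rng.random() < epsilon:
        return rng.choice(moves)
    return max(moves, key=lambda m: q.get((stones, m), 0.0))


def train(episodes=5000, start=10, epsilon=0.2, seed=0):
    """Learn move values purely from the outcomes of self-play games."""
    rng = random.Random(seed)
    q, visits = {}, {}
    for _ in range(episodes):
        stones, player, history = start, 0, []
        while stones > 0:
            move = choose(q, stones, epsilon, rng)
            history.append((stones, move, player))
            stones -= move
            player = 1 - player
        winner = history[-1][2]  # whoever took the last stone
        # No move is ever labelled good or bad; each one is simply
        # credited with the final result for the player who made it.
        for state, move, mover in history:
            key = (state, move)
            reward = 1.0 if mover == winner else -1.0
            visits[key] = visits.get(key, 0) + 1
            old = q.get(key, 0.0)
            q[key] = old + (reward - old) / visits[key]  # running average
    return q


q = train()
move = choose(q, 5, epsilon=0.0, rng=random.Random(0))
print("With 5 stones left, the learned policy takes", move)
```

The table `q` ends up holding the average final result observed after each (stones, move) pair, which is all the program needs to rank its options.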

Reinforcement learning is useful for creating generic decision processes which can theoretically be applied to different domains.

This is actually the goal of Google DeepMind: they’re trying to create generic deep learning algorithms that can be set to analyze any type of situation.

AlphaGo Zero Takes AI To The Next Level

Computers have been beating humans at chess since IBM's Deep Blue defeated chess great Garry Kasparov in 1997.

Go is considered harder for computers since there are many more moves to be considered even from the start: the 19-by-19 board offers 361 possible opening moves, compared with 20 in chess.

Until the original AlphaGo defeated European Go champion Fan Hui in 2015, no AI had beaten a professional human player in an even game on a standard-sized board.

AlphaGo Zero pushed Google’s success further. It defeated the version of AlphaGo that beat Lee Sedol, as well as AlphaGo Master, the version that beat the number one human player in the world, Ke Jie. Zero made several large technological leaps forward to achieve this.

Scientists were most intrigued by Zero’s self-training. Unlike earlier systems, which were given recorded games to study, Zero started with only the rules of Go. It reached a world-champion level of play entirely through reinforcement learning.

Though perhaps less exciting, it’s important to note that Zero was also less resource-intensive than other AlphaGo systems.

The original AlphaGo used a “policy network” to select the next move and a “value network” to predict the winner of the game from each position.

Zero combined these two into a single network. It was able to do this with only four tensor processing units (TPUs).

For comparison, the version of AlphaGo that beat Fan Hui ran on 176 GPUs, and the version that beat Lee Sedol used 48 TPUs.
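The idea of one network doing both jobs can be pictured as a shared body with two output "heads." The sketch below is only a schematic under stated assumptions: the class name `TwoHeadedNet`, the layer sizes, and the random, untrained weights are all invented for illustration, and Zero's real network is far larger and built differently. It does show the basic shape: one set of shared features feeds both a move-probability output and a win-prediction output.

```python
import math
import random


def make_matrix(rows, cols, rng):
    """Small random weight matrix (untrained, for illustration only)."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class TwoHeadedNet:
    """One shared layer feeding a policy head and a value head."""

    def __init__(self, n_inputs, n_hidden, n_moves, seed=0):
        rng = random.Random(seed)
        self.shared = make_matrix(n_hidden, n_inputs, rng)
        self.policy_head = make_matrix(n_moves, n_hidden, rng)
        self.value_head = make_matrix(1, n_hidden, rng)

    def forward(self, board):
        # Shared features, computed once and used by both heads.
        hidden = [max(0.0, h) for h in matvec(self.shared, board)]
        # Policy head: a probability for each possible move (softmax).
        logits = matvec(self.policy_head, hidden)
        peak = max(logits)
        exps = [math.exp(z - peak) for z in logits]
        total = sum(exps)
        policy = [e / total for e in exps]
        # Value head: predicted outcome between -1 (loss) and +1 (win).
        value = math.tanh(matvec(self.value_head, hidden)[0])
        return policy, value


# A hypothetical 3-by-3 board flattened to 9 numbers.
net = TwoHeadedNet(n_inputs=9, n_hidden=16, n_moves=9)
policy, value = net.forward([1.0, 0.0, -1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0])
print(len(policy), round(sum(policy), 6), -1.0 <= value <= 1.0)
```

Because the shared layer is evaluated only once per position, a combined design of this kind does less work than running two separate networks, which is consistent with Zero's smaller hardware footprint.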

Despite using fewer resources and having to teach itself Go strategy from scratch, AlphaGo Zero matured incredibly fast.

It took only three days of self-play to reach world champion level.

The Implications For Enterprise

What does this mean for business? The minds behind AlphaGo Zero put it most eloquently: “Our results comprehensively demonstrate that a pure reinforcement learning approach is fully feasible, even in the most challenging of domains.”

Machine learning algorithms need to be trained to work. Traditionally, human scientists supply labelled datasets to guide algorithms in their development.

It’s often tedious, expensive, or even impossible to supply training data for a specific situation, though.

Pure reinforcement learning could open a whole new universe of AI applications.

  • Marketing: Right now humans rate, code, and train marketing algorithms, but with reinforcement learning those algorithms could game out marketing strategies alone and supply targeted direction.
  • Healthcare: Machine learning algorithms show promise in detecting disease and risk factors using eye and skin scans. So many factors are involved that humans would have a hard time building a training set. Reinforcement learning could improve algorithms and increase human understanding of how diseases present.
  • Manufacturing: No single corporate operating policy can be ideal for every situation. Reinforcement learning algorithms could be set loose on each individual factory or plant to optimize routines, achieving the maximum output for each set of circumstances.
  • Hospitality: Predicting human behavior is hard even for humans. Systems that learn based solely on whether guests are happy at the end of their visit could help those in the hospitality industry provide the best service in the most efficient manner.

What Comes Next

AlphaGo Zero is evidence that working towards general-use deep learning algorithms using reinforcement learning is a realistic approach.

Both the Chinese Go program Fine Art and its Japanese counterpart Zen have been able to duplicate Zero’s results (though neither has surpassed them), proving that Zero isn’t a fluke.

Google is shifting focus from AlphaGo Zero to putting lessons learned from its development into practical use.

It will be exciting to see what they make of this incredible breakthrough for artificial intelligence.


While pure reinforcement learning remains in development, there are hundreds of sophisticated enterprise analytics programs on the market. Schedule a free consultation to discover the right business intelligence solutions to expand your business and explore how to unify them in one easily accessible place.
