Reinforcement learning the winner in latest AI exploits

The world's top-ranked Go player Ke Jie took on AlphaGo, an artificial intelligence program, in May 2017. After his loss, he acknowledged the machine's deeper understanding of the board game. - PHOTO: AP

Twenty years after a computer, IBM’s Deep Blue, beat a reigning chess world champion, Garry Kasparov, for the first time, machines have made another extraordinary breakthrough.

DeepMind, a London-based artificial intelligence company and Google subsidiary, published a paper in December 2017 outlining how its latest artificial intelligence (AI) program, AlphaZero, beat Stockfish, the strongest chess software, which can calculate 70 million moves per second and outplays the best human players.

This in itself would not be remarkable, given that in 2016 one of DeepMind’s AI engines had already beaten 18-time World Champion Lee Se-dol at Go, a game that combines intuition and logic and is considered to be more complex than chess.

In October, DeepMind had presented AlphaGo Zero, AlphaZero's self-taught Go-playing predecessor, which beat the program that had defeated the best human Go player by 100 games to nil.

What makes the latest achievement so significant is that AlphaZero was never programmed to play chess. The neural net was taught the rules of the game but, unlike DeepMind’s first Go program, had not been given any opening, endgame or match databases. AlphaZero learned its chess strategy simply from playing the game against itself.

After only eight hours of so-called reinforcement learning, the AI program outclassed Stockfish over 100 matches: it won 28, drew the remainder and did not lose a single one.

In other words, after only a few hours of studying the game, AlphaZero was able to exceed or completely demolish, depending on the viewpoint, 1,500 years of accumulated chess knowledge.

“By not using human data – by not using human expertise in any fashion – we’ve actually removed the constraints of human knowledge,” said AlphaGo Zero’s lead programmer, David Silver, at a press conference. “It’s therefore able to create knowledge itself from first principles; from a blank slate …. This enables it to be much more powerful than previous versions.”

For chess players, the AI's learning progress showed how AlphaZero initially favored certain openings and defenses, like the Ruy Lopez or the Caro-Kann defense, only to discard them later as unsuccessful. Instead, it came to prefer some of the best-known human strategies, the English and Queen's Gambit openings and the Berlin defense, as more promising.

The results are likely to change the strategies of the world’s best chess players, just like they did for the game of Go, but there are also real implications for the wider uses of AI.

Although critics have pointed to the differences in hardware during the matches – AlphaZero was powered by a supercomputer, while Stockfish was not – the results may have wider implications, nonetheless.

It was never the goal of DeepMind to create AI programs that would beat humans or computer programs at board games. The company’s objective is to create an intelligent machine that can tackle a broad range of challenges.

According to DeepMind, the results could bring the company closer to creating general-purpose algorithms that can help solve some of the most complex problems in science.

“If similar techniques can be applied to other structured problems, such as protein folding, reducing energy consumption or searching for revolutionary new materials, the resulting breakthroughs have the potential to positively impact society,” co-founder Demis Hassabis says on the company website.

He concedes that it is still early days. “We’re trying to build general purpose algorithms and this is just one step towards that but it’s an exciting step.”

Scientific problems are, of course, very different from the controlled environment of board games such as chess, Go or backgammon, which offer complete information, have few basic rules and are easy to simulate.

But AlphaZero refutes one of the criticisms of AI: that its recent highlights were simply the result of increasing computer power and the ability to analyze ever larger datasets more quickly. Because AlphaZero is self-taught, it does not depend on large stores of data.

It shows that the fine-tuning of algorithms can produce significant gains in machine learning, and it indicates that reinforcement learning, a subset of artificial intelligence, can potentially reap greater rewards than supervised learning, where software programs learn from datasets and human intervention.

Satinder Singh, a computer science professor who wrote an accompanying article on DeepMind’s research in Nature, told The Verge that in the past years, reinforcement learning has started to have a broader impact in the wider world.

“The fact that [DeepMind] were able to build a better Go player here with an order of magnitude less data, computation, and time, using just straight reinforcement learning – it’s a pretty big achievement. And because reinforcement learning is such a big slice of AI, it’s a big step forward in general,” Singh said.

Deep learning and reinforcement learning

Researchers hope that deep learning and reinforcement learning using neural nets are a way for machines to master skills that would be too complex to put into the code of a software program. The method has already improved machine-based language translation and is now applied to making financial investment, medical diagnosis and other decisions.

The general concept of deep learning is three decades old. It had been known for even longer that nets of several layers of interconnected artificial neurons could, in theory, solve certain problems, but neither the hardware nor the techniques for training neural nets existed.

In 1986, David Rumelhart, Geoffrey Hinton and Ronald Williams published a paper on backpropagation, a process that enables the individual neurons in a neural net to produce a desired output. The paper thus provided a method for training a deep neural net, but it took much longer still before computers were powerful enough to produce the first results.

Neural nets mimic the neurons of the brain. In a simplified way, they can be imagined as the layers of a sandwich, in which each of the layers represents thousands of artificial neurons. In an array that simulates the synapses of the brain, these artificial neurons are simple computational nodes, which each include a number value for their level of “agitation” and a number value for the strength at which this “excitement” should be passed on to a connected neuron.

To train the neural net to recognize an image, for example, the input layer would be made up of one neuron for each of the image’s pixels and the value of each neuron would represent the brightness of the corresponding pixel.

The input layer is then connected to another layer of several thousand or more neurons, which itself is connected to the next layer and so on. Depending on what type of images the machine is looking for, for example images of horses, the final output layer in this example would need only two neurons to express that the image either is or is not a horse.
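The layered structure described above can be sketched in a few lines of code. This is an illustrative toy, not DeepMind's implementation: the layer sizes, the 4-pixel "image" and the sigmoid activation are all assumptions chosen for brevity.

```python
import math
import random

random.seed(0)

def sigmoid(x):
    # squash a neuron's total input into an "agitation" level between 0 and 1
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights):
    # each neuron in the layer sums its weighted inputs, then activates
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]

# toy 4-pixel "image": each input neuron holds one pixel's brightness
pixels = [0.9, 0.1, 0.8, 0.2]

# random initial connection strengths: 3 hidden neurons, then 2 output neurons
hidden_w = [[random.uniform(-1, 1) for _ in range(4)] for _ in range(3)]
output_w = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]

hidden = layer(pixels, hidden_w)
output = layer(hidden, output_w)  # two values: "horse" vs. "not a horse"
print(output)
```

Each list of weights plays the role of the synapses between two layers; stacking more calls to `layer` gives a deeper "sandwich."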

Supervised learning trains a neural net by feeding it huge amounts of data, in this case images, labeled with the correct output value (a horse, not a horse), teaching the machine to work backward from the correct answers to recognize other, similar images. It will do so without any information about the characteristics of what a horse looks like (such as four legs, long neck, mane, etc.).

Initially, the neural net will have random connection values, also called weights, between the neurons. The objective of the learning process is to adjust these weights, layer by layer, so that the output layer arrives at the correct level of excitement.

Backpropagation determines how much each weight contributed to a correct or incorrect output and adjusts the weights accordingly, working from the output layer back through the network.

When this is repeated millions of times, the neural net effectively learns: its error rate diminishes over time as it reorganizes itself to express the desired output most effectively.
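The training loop above can be demonstrated on the smallest possible case, a single neuron. The dataset, learning rate and number of passes below are invented for illustration; the point is only that repeatedly nudging the weights against the error gradient makes the total error shrink.

```python
import math
import random

random.seed(1)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# tiny labelled dataset: feature vectors and the desired output (1.0 = "horse")
data = [([0.9, 0.8], 1.0), ([0.1, 0.2], 0.0),
        ([0.8, 0.9], 1.0), ([0.2, 0.1], 0.0)]

weights = [random.uniform(-1, 1) for _ in range(2)]  # random initial weights
rate = 0.5                                           # learning rate

def predict(x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)))

def total_error():
    return sum((predict(x) - y) ** 2 for x, y in data)

before = total_error()
for _ in range(1000):            # many passes over the labelled examples
    for x, y in data:
        out = predict(x)
        # backpropagation for one neuron: the chain rule gives the error
        # gradient with respect to each weight
        delta = (out - y) * out * (1 - out)
        for i in range(len(weights)):
            weights[i] -= rate * delta * x[i]
after = total_error()
print(before, "->", after)       # the error shrinks as the weights adapt
```

In a multi-layer network the same gradient signal is propagated backward through every layer, which is where backpropagation gets its name.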

Crucially, such a machine becomes a representation not only of images but of ideas and complex concepts that, as in the case of AlphaZero's board game strategy, may have been imperceptible to humans.

While the machine is therefore capable of excellent fuzzy pattern recognition, it typically depends on large amounts of data and is not by itself a complex intelligence. Introducing noise, irrelevant data that a human would simply disregard, can easily fool the AI, for instance.

Now, AlphaZero has produced remarkable machine learning results without relying on human input.

Rather than relying on human-generated data with the correct answer key, AlphaZero's reinforcement learning algorithm builds on experience: the data it collects through the trial and error of self-training. This might widen its potential applications, but what those applications are is not yet clear.
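The trial-and-error idea can be illustrated with the simplest form of reinforcement learning, tabular Q-learning, on a made-up one-dimensional "board." This is only a sketch of the principle: AlphaZero's actual method combines deep neural nets with tree search, and every number below (states, rewards, learning rate) is an assumption.

```python
import random

random.seed(0)

# a tiny 1-D "board": states 0..4, start at 0, reward only at the goal state 4
GOAL = 4
ACTIONS = (-1, +1)                      # move left or move right
q = {(s, a): 0.0 for s in range(5) for a in ACTIONS}

alpha, gamma, epsilon = 0.5, 0.9, 0.2   # learning rate, discount, exploration

for _ in range(500):                    # episodes of pure trial and error
    s = 0
    while s != GOAL:
        # occasionally explore a random move, otherwise exploit what is learned
        if random.random() < epsilon:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda act: q[(s, act)])
        s2 = min(max(s + a, 0), GOAL)
        r = 1.0 if s2 == GOAL else 0.0  # no human answer key, only the outcome
        # update the value estimate toward reward plus discounted future value
        q[(s, a)] += alpha * (r + gamma * max(q[(s2, b)] for b in ACTIONS) - q[(s, a)])
        s = s2

# the learned policy: from every state, the best move is to the right
policy = {s: max(ACTIONS, key=lambda act: q[(s, act)]) for s in range(GOAL)}
print(policy)
```

No example moves are ever labelled "correct"; the program discovers the winning behavior purely from the reward at the end, which is the essential contrast with the supervised setup described earlier.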


Despite massive expectations for the use of artificial intelligence, few businesses have so far implemented any form of machine learning. A survey of 3,000 business executives by MIT Sloan Management Review last year showed that the gap between ambition and execution at most companies is very large.

While three-quarters of executives believe AI will break new ground for their business and 85 percent think it will enable their companies to obtain or sustain a competitive advantage, only one in five companies have attempted to incorporate this type of technology in their services or processes.

And only 5 percent of the surveyed companies have used AI extensively.

Many companies also underestimate how much data most machine learning projects need to produce viable results that a business can leverage.

Jacob Spoelstra, director of data science at Microsoft, observes in the report: “I think there’s still a pretty low maturity level in terms of people’s understanding of what can be done through machine learning. A mistake we often see is that organizations don’t have the historical data required for the algorithms to extract patterns for robust predictions. For example, they’ll bring us in to build a predictive maintenance solution for them, and then we’ll find out that there are very few, if any, recorded failures. They expect AI to predict when there will be a failure, even though there are no examples to learn from.”

Another problem for adoption of AI will be whether business managers, patients, investors or consumers have sufficient confidence in the automated decisions that machines are going to make, especially when the reasons for taking a specific course of action over another are often unfathomable.

In contrast to developers of mathematical models, the creators of deep learning machines cannot fully explain how they work. The artificial intelligence is largely a black box that resembles intuition more than reasoning.

The delegation of important decision-making tasks concerning health, defense, education, finance and other fields to machines may be an uncomfortable idea for most people. Not knowing how a machine arrived at its conclusion or decision will do little to alleviate the unease.

As a result, AI engineers have recognized that “explainability” or “interpretability” is a key feature needed to create trust in human-machine interactions.

Ultimately, the results produced by machine learning promise to improve human thinking and insight into complex systems. After the world number one Go player Ke Jie played, and lost to, AlphaGo at the Future of Go Summit in Wuzhen, China, he recorded a winning streak of 20 matches.

“After my match against AlphaGo, I fundamentally reconsidered the game, and now I can see that this reflection has helped me greatly,” he said last July. “I hope all Go players can contemplate AlphaGo’s understanding of the game and style of thinking, all of which is deeply meaningful. Although I lost, I discovered that the possibilities of Go are immense and that the game has continued to progress.”