Why does self-learning artificial intelligence struggle with the real world?

Original author: Joshua Sokol
  • Translation

The latest AI systems start training knowing nothing about the game in question and reach world-class level within hours. But researchers are struggling to apply such systems beyond the gaming world.

Until recently, the machines capable of shaming human champions at least had the decency to start by learning from human experience.

To defeat Garry Kasparov at chess in 1997, IBM engineers drew on centuries of chess wisdom in building their Deep Blue computer. In 2016, AlphaGo, from Google's DeepMind project, defeated champion Lee Sedol at the ancient board game of go after processing millions of positions collected from tens of thousands of human games.

But now AI researchers are rethinking how their bots should absorb human knowledge. The current attitude can be summed up as: don't bother.

Last October, the DeepMind team published details of a new go-playing system, AlphaGo Zero, which did not study human games at all. It started with nothing but the rules of the game and played against itself. Its first moves were completely random. After each game, it absorbed new knowledge about what had led to victory and what had not. After these matches, AlphaGo Zero took on the already superhuman version of AlphaGo that had defeated Lee Sedol, and won 100 games to 0.

Lee Sedol, an 18-time world go champion, during his match against AlphaGo in 2016.

The team continued its research and produced the next brilliant player in the AlphaGo family, this time called simply AlphaZero. In a paper published on arxiv.org in December, DeepMind researchers revealed how, again starting from scratch, AlphaZero trained itself and defeated AlphaGo Zero — that is, it beat the bot that had beaten the bot that had beaten the world's best go players. And when given the rules of shogi, Japanese chess, AlphaZero quickly learned the game and beat the best of the algorithms built specially for it. Experts marveled at its aggressive, unfamiliar style of play. "I always wondered what it would be like if a superior species landed on Earth and showed us how they play chess," Danish grandmaster Peter Heine Nielsen told the BBC. "Now I know."

Last year also brought otherworldly bots that proved themselves in areas as different as no-limit poker and Dota 2, a popular online game in which fantasy heroes battle for control of an alien world.

Naturally, the ambitions of the companies investing in such systems extend beyond dominating gaming championships. Research teams like DeepMind hope to apply similar methods to real-world tasks — building superconductors that work at room temperature, or understanding which origami-like folds will turn proteins into molecules useful for drugs. And, of course, many practitioners hope to build artificial general intelligence — a poorly defined but captivating goal of giving machines the ability to think like a person and flexibly solve many kinds of problems.

However, despite all the investment, it is not yet clear how far current techniques can go beyond the game board. "I'm not sure the ideas behind AlphaZero will generalize so easily," says Pedro Domingos, a computer scientist at the University of Washington. "Games are a very, very unusual thing."

Perfect goals for an imperfect world

One characteristic shared by many games, including chess and go, is that both players see every piece on the board at all times. Each player has what is called "perfect information" about the state of the game. However complicated the game gets, you only need to reason about your current position.

Many real-world situations are nothing like this. Imagine asking a computer to make a medical diagnosis or conduct business negotiations. "Most real-world strategic interactions involve hidden information," says Noam Brown, a doctoral student in computer science at Carnegie Mellon University. "I feel like most of the AI community is ignoring that."

Poker, Brown's specialty, offers a different challenge: you cannot see your opponent's cards. But here too, machines that learn by playing against themselves have already reached superhuman heights. In January 2017, Libratus, a program created by Brown and his adviser Tuomas Sandholm, beat four professional no-limit Texas Hold'em players, finishing $1.7 million ahead at the end of a 20-day tournament.

An even more daunting game of imperfect information is StarCraft II, another multiplayer online game with a huge following. Players pick a race, build an army, and wage war across a science-fiction landscape. But that landscape is shrouded in the fog of war: players see only the parts of the map where they have troops or buildings. Even the decision to scout the opponent's territory is full of uncertainty.

It is one game that AI still cannot win. The obstacles include the sheer number of possible moves in a game, which often stretches into the thousands, and the speed at which decisions must be made. Every player — human or machine — has to worry about a vast set of possible futures with every click of the mouse.

So far, AI cannot compete with top humans here on equal terms. But it is a target for AI development. In August 2017, DeepMind partnered with Blizzard Entertainment, the maker of StarCraft II, to release tools that they say open the game up to AI researchers.

For all its complexity, the goal of StarCraft II is simple to state: destroy the enemy. That makes it akin to chess, go, poker, Dota 2 and almost every other game. Games can be won.

From an algorithm's point of view, a task needs an "objective function," a goal to strive for. For AlphaZero playing chess, this was easy: a loss counted as -1, a draw as 0, a win as +1, and AlphaZero's objective function was to maximize its score. A poker bot's objective function is just as simple: win lots of money.
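The scoring scheme above can be sketched in a few lines. This is an invented illustration, not AlphaZero's actual code: outcomes are scored -1 / 0 / +1, and the objective function is just the average score, which an agent tries to maximize by picking the best-performing policy.

```python
# A minimal sketch of a game objective function: chess-style outcomes are
# scored -1 (loss), 0 (draw), +1 (win), and the goal is to maximize the
# average score. The policies and results here are hypothetical stand-ins.

def objective(outcomes):
    """Average score over a batch of game outcomes."""
    scores = {"loss": -1, "draw": 0, "win": +1}
    return sum(scores[o] for o in outcomes) / len(outcomes)

def best_policy(results_by_policy):
    """Pick whichever candidate policy maximizes the objective."""
    return max(results_by_policy, key=lambda p: objective(results_by_policy[p]))

results = {
    "random": ["loss", "loss", "draw", "loss"],
    "greedy": ["win", "draw", "loss", "win"],
}
print(best_policy(results))  # → greedy
```

Real systems differ only in scale: the "policy" is a trained network, and the outcomes come from millions of games.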

Simulated walkers can learn complex behaviors, such as crossing unfamiliar terrain.

Real-life situations are not so simple. A self-driving car, for example, needs a more careful crafting of its objective function — something like the careful choice of words you would use when describing a wish to a genie. For example: promptly deliver the passenger to the correct address, obeying all laws and appropriately weighing the value of human life in dangerous, uncertain situations. Domingos says that how researchers craft the objective function is "one of the things that distinguishes a great machine-learning researcher from an average one."

Consider Tay, the Twitter chatbot Microsoft released on March 23, 2016. Its goal was to engage people in conversation, and it did. "What Tay unfortunately discovered," says Domingos, "is that the best way to maximize engagement is to spew racist insults." It was shut down barely a day after launch.

Your own worst enemy

Some things never change. The strategies used by today's dominant game bots were invented decades ago. "It's such a blast from the past — they've just thrown more computing power at it," says David Duvenaud, a computer scientist at the University of Toronto.

Those strategies often rest on reinforcement learning, a hands-off technique. Instead of micromanaging every detail of the algorithm, engineers let the machine explore its environment and learn to achieve goals on its own, by trial and error. Before AlphaGo and its heirs, the DeepMind team scored its first big headline success in 2013, when it used reinforcement learning to build a bot that learned to play seven Atari 2600 games, three of them at expert level.
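The trial-and-error loop described above can be made concrete with a toy example. The corridor environment below is invented for illustration; real systems like DeepMind's Atari bots replace the table with a deep network, but the update rule (tabular Q-learning) is the same family of idea: act, observe reward, nudge the value estimate.

```python
# A toy trial-and-error learner: tabular Q-learning on a 5-cell corridor,
# where only reaching the rightmost cell yields reward. Invented example.
import random

N = 5                          # corridor cells 0..4; reward only at cell 4
ACTIONS = (1, -1)              # step right, step left
Q = {(s, a): 0.0 for s in range(N) for a in ACTIONS}
alpha, gamma, eps = 0.5, 0.9, 0.1

random.seed(0)
for episode in range(200):
    s = 0
    while s != N - 1:
        # Explore occasionally; otherwise take the current best-looking action.
        if random.random() < eps:
            a = random.choice(ACTIONS)
        else:
            a = max(ACTIONS, key=lambda x: Q[(s, x)])
        s2 = min(max(s + a, 0), N - 1)
        r = 1.0 if s2 == N - 1 else 0.0
        # Move the estimate toward reward plus discounted best future value.
        best_next = max(Q[(s2, b)] for b in ACTIONS)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        s = s2

# The learned greedy policy: which way to step in each non-terminal cell.
policy = {s: max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N - 1)}
print(policy)
```

After training, the greedy policy steps right in every cell — the machine has learned the goal without ever being told where it is.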

That progress has continued. On February 5th, DeepMind released IMPALA, an AI system that can learn 57 Atari 2600 games plus 30 more three-dimensional levels built by DeepMind, in which the player moves through different environments pursuing goals such as opening doors or picking mushrooms. IMPALA appeared to transfer knowledge between tasks — time spent on one game also improved performance on the others.

But within the broader category of reinforcement learning, board games and multiplayer games allow an even more specific approach: learning can take the form of self-play, in which an algorithm achieves strategic superiority by competing over and over against a close copy of itself.

The idea is decades old. In the 1950s, IBM engineer Arthur Samuel built a checkers program that learned partly by playing against itself. In the 1990s, Gerald Tesauro of IBM built a backgammon program that pitted the algorithm against itself. The program reached the level of human experts while inventing unusual but effective strategies along the way.

In game after game, self-play gives an algorithm an equally matched opponent. That means every change of strategy leads to a different outcome, so the algorithm gets instant feedback. "Every time you learn something, every time you discover some little trick, your opponent immediately starts using it against you," says Ilya Sutskever, research director at OpenAI, a nonprofit he co-founded with Elon Musk to develop and share AI technology and steer it in a safe direction. In August 2017, the organization released a Dota 2 bot that controlled one of the game's characters, the demon necromancer Shadow Fiend, and defeated the world's best players in one-on-one battles. Another OpenAI project set simulated humans wrestling in sumo matches, through which they taught themselves to grapple and feint. In self-play, "you can never rest; you have to constantly improve," says Sutskever.
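A minimal self-play loop can be sketched on a toy game. This is an invented illustration, not DeepMind's or OpenAI's algorithm: a single value table plays both sides of Nim (take 1–3 matches from a heap of 10; whoever takes the last match wins), so every trick one side learns is immediately used against it by the other.

```python
# A toy self-play learner for Nim: one Q-table plays both sides, and
# wins/losses are credited back through the game with alternating sign.
import random

Q = {}                  # Q[(heap, move)]: value of `move` for the player to act
alpha, eps = 0.2, 0.2   # learning rate and exploration rate
random.seed(1)

def moves(heap):
    return [m for m in (1, 2, 3) if m <= heap]

def choose(heap):
    if random.random() < eps:
        return random.choice(moves(heap))
    return max(moves(heap), key=lambda m: Q.get((heap, m), 0.0))

for game in range(20000):
    heap, history = 10, []
    while heap > 0:
        m = choose(heap)            # the same table acts for both players
        history.append((heap, m))
        heap -= m
    # Whoever took the last match won. Credit moves backward through the
    # game, flipping the sign at each step because the players alternate.
    reward = 1.0
    for pos, m in reversed(history):
        q = Q.get((pos, m), 0.0)
        Q[(pos, m)] = q + alpha * (reward - q)
        reward = -reward

# Optimal Nim play from a heap of 10 is to take 2, leaving a multiple of 4.
print(max(moves(10), key=lambda m: Q.get((10, m), 0.0)))
```

Because the opponent improves in lockstep with the learner, weak strategies stop working as soon as they are discovered — the feedback loop the quote above describes.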


But the old idea of self-play is only one ingredient of today's dominant bots; they also need a way to turn gaming experience into a deeper understanding. Chess, go, and video games like Dota 2 have more permutations than there are atoms in the universe. Even over several human lifetimes of fighting its own shadow in virtual arenas, a machine could never encounter every scenario, record it in a lookup table, and consult the table when the situation recurs.

To stay afloat in this sea of possibilities, "you need to generalize and capture the essence," says Pieter Abbeel, a computer scientist at the University of California, Berkeley. IBM's Deep Blue did it with a built-in chess formula: armed with the ability to evaluate board positions it had never seen, the program could adopt moves and strategies that improved its chances of winning. In recent years, though, a new technique has made it possible to skip the formula altogether. "Now, all of a sudden, deep networks are capturing all of this," Abbeel says.

Deep neural networks, whose popularity has skyrocketed in recent years, are built from layers of artificial "neurons" stacked like a pile of pancakes. When a neuron in one layer fires, it sends signals up to the next layer, which sends them higher still, and so on.

By tuning the connections between layers, these networks become surprisingly good at turning inputs into related outputs, even when the relationship between them seems abstract. Give them a phrase in English and they can be trained to translate it into Turkish. Give them pictures of animal shelters and they can identify which ones contain cats. Show them a go board and they can estimate the probability of winning. Usually, though, such networks first have to be fed lists of labeled examples to practice on.
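The stacked layers described above can be sketched in a few lines of plain Python. The weights here are fixed by hand purely for illustration (real networks learn them from labeled examples), but the structure is the real thing: each "neuron" takes a weighted sum of the signals from the layer below, applies a nonlinearity, and passes the result upward.

```python
# A minimal two-layer neural network forward pass. Weights are arbitrary
# hand-picked numbers for illustration; training would adjust them.
import math

def layer(inputs, weights, biases):
    """One layer: every neuron combines all inputs from the layer below."""
    return [
        math.tanh(sum(w * x for w, x in zip(ws, inputs)) + b)
        for ws, b in zip(weights, biases)
    ]

def forward(x):
    h = layer(x, [[0.5, -1.0], [1.5, 0.2]], [0.0, -0.5])   # hidden layer
    out = layer(h, [[1.0, -1.0]], [0.1])                    # output layer
    return out[0]

print(round(forward([1.0, 2.0]), 3))
```

Stacking more such layers, and tuning the weights by gradient descent on labeled examples, is all that separates this sketch from the networks behind AlphaGo's position evaluations.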

That is why self-play and deep neural networks combine so well. Self-play produces a huge number of scenarios, giving the deep network an effectively unlimited supply of training data. In turn, the neural network offers a way to internalize the experience and patterns encountered during play.

But there is a catch. For such systems to produce useful data, they need a realistic arena to play in.

"All these games, all these results, came in settings where you can perfectly simulate the world," says Chelsea Finn, a Berkeley graduate student who uses AI to control robotic arms and interpret sensor data. Other domains are not so easy to simulate.

Self-driving cars, for example, have trouble coping with bad weather or with cyclists. Or they may fail to register the odd situations the real world throws up, such as a bird flying straight into the camera. For robotic arms, Finn says, early simulations provided the basic physics that let an arm begin to learn. But they fail to capture the fine details of contact with different surfaces, so tasks like screwing on a bottle cap — or performing a delicate surgical operation — demand experience gained in the real world.

For problems that are hard to simulate, self-play is no longer so useful. "There is a huge difference between a truly perfect model of the environment and a learned, approximate model, especially when reality is truly complex," wrote Yoshua Bengio, a deep-learning pioneer at the University of Montreal. But AI researchers still have ways to push forward.

Life after games

It is hard to pinpoint when AI's superiority at games began. You could pick Kasparov's loss at chess, or Lee Sedol's defeat at the virtual hands of AlphaGo. Another popular choice is the day in 2011 when Ken Jennings, the legendary Jeopardy! champion, lost to IBM's Watson. Watson could handle the show's clues and wordplay. "I, for one, welcome our new computer overlords," Jennings wrote beneath his final answer.

Watson seemed to have the kind of office skills humans use to solve many real problems: it could take input in English, process related documents in the blink of an eye, retrieve connected pieces of information, and settle on a single best answer. But seven years later, reality keeps throwing hard obstacles in AI's way. A September report in the health-news outlet Stat indicated that Watson's heir, Watson for Oncology, which works on cancer research and personalized treatment recommendations, was having problems.

"Jeopardy! questions are easier to handle because they don't require much common sense," wrote Bengio, who has worked with the Watson team, when asked to compare the two cases from an AI perspective. "Understanding a medical article is much harder. A lot of background knowledge is required."

Still, narrow as games are, they do resemble some real tasks. DeepMind researchers declined to be interviewed, noting that their AlphaZero work is currently under peer review. But the team has suggested that such techniques could soon help biomedical researchers who want to understand protein folding.

To do that, they must work out how the various amino acids that make up a protein bend and fold into a small three-dimensional machine whose function depends on its shape. The complexity rivals that of chess: chemists know the laws well enough to roughly evaluate specific scenarios, but there are so many possible configurations that searching through them all is hopeless. But what if protein folding could be framed as a game? It already has been. Since 2008, hundreds of thousands of players have tried Foldit, an online game in which users score points for the stability and feasibility of the protein structures they fold. A machine could train itself in a similar way, perhaps using reinforcement learning to try to beat its own previous best score.
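The idea of turning folding into a scorable game can be illustrated with the hydrophobic-polar (HP) lattice model, a standard teaching simplification — not DeepMind's method or Foldit's actual scoring. A fold is a self-avoiding path on a grid, and the score counts contacts between hydrophobic residues that touch without being chain neighbors; an agent could then "play" by searching for higher-scoring folds.

```python
# A toy folding score in the spirit of Foldit's points, using the
# simplified hydrophobic-polar (HP) lattice model. Illustration only.

def score(sequence, path):
    """Score an HP-model fold: +1 per contact between hydrophobic (H)
    residues that are grid neighbors but not neighbors along the chain."""
    assert len(sequence) == len(path) and len(set(path)) == len(path)
    pos = {p: i for i, p in enumerate(path)}
    total = 0
    for (x, y), i in pos.items():
        for nb in ((x + 1, y), (x, y + 1)):       # count each pair once
            j = pos.get(nb)
            if j is not None and abs(i - j) > 1 \
               and sequence[i] == "H" and sequence[j] == "H":
                total += 1
    return total

seq = "HPPH"
straight = [(0, 0), (1, 0), (2, 0), (3, 0)]   # extended chain: no contacts
folded = [(0, 0), (1, 0), (1, 1), (0, 1)]     # U-turn: the two H's touch
print(score(seq, straight), score(seq, folded))  # → 0 1
```

A reinforcement learner dropped into this toy world would be rewarded for discovering the U-turn — the same "beat your previous best score" loop the paragraph above describes, at microscopic scale.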

Reinforcement learning and self-play could also help train interactive systems, Sutskever suggests: robots that need to talk to humans could practice by talking to themselves. And with specialized AI hardware becoming faster and cheaper, engineers have a growing incentive to cast tasks as games. "I think that going forward, self-play and other ways of consuming vast amounts of computing power will become more and more important," says Sutskever.

But if the end goal is for machines to do everything a human can, then even a generalized board-game champion like AlphaZero still has room to grow. "There is a huge gap, at least to my mind, between real thinking — the creative exploration of ideas — and what today's AI can do," says Josh Tenenbaum, a cognitive scientist at MIT. "That kind of intelligence exists, but so far only in the minds of the great AI researchers."

Many other researchers, sensing the hype around their field, offer their own caveats. "I would caution against overestimating the importance of these games, for AI or for tasks in general. Humans are not very good at games," says François Chollet, a deep-learning researcher at Google. "But bear in mind that very simple, specialized tools can in fact achieve a lot."