Google DeepMind Researches Why Robots Kill or Cooperate

When our robot overlords arrive, will they decide to kill us or cooperate with us?

New research from DeepMind, Alphabet Inc.’s London-based artificial intelligence unit, could ultimately shed light on this fundamental question.

DeepMind’s researchers have been investigating the conditions under which reward-optimizing beings, whether human or robot, would choose to cooperate rather than compete. The answer could have implications for how computer intelligence may eventually be deployed to manage complex systems such as an economy, city traffic flows, or environmental policy.

Joel Leibo, the lead author of a paper DeepMind published online Thursday, said in an e-mail that his team’s research indicates that whether agents learn to cooperate or compete depends strongly on the environment in which they operate.

While the research has no immediate real-world application, it could help DeepMind design artificial intelligence agents that can work together in environments with imperfect information. In the future, such work could help these agents navigate a world full of intelligent entities, both human and machine, whether in transport networks or stock markets.

Apples, Wolves

DeepMind’s paper describes how researchers used two different games to investigate how software agents learn to compete or cooperate.

In the first, two of these agents had to maximize the number of apples they could gather in a two-dimensional digital environment. Researchers could vary how frequently apples appeared.

The researchers found that when apples were scarce, the agents quickly learned to attack one another—zapping or “tagging” their opponent with a ray that temporarily immobilized them. When apples were abundant, the agents preferred to co-exist more peacefully.

Rather chillingly, however, when the researchers tried the same game with more intelligent agents that drew on larger neural networks (a kind of machine intelligence designed to mimic how certain parts of the human brain work), they found that these agents would “try to tag the other agent more frequently, i.e. behave less cooperatively, no matter how we vary the scarcity of apples,” as they wrote in a blog post on DeepMind’s website.

In a second game, called Wolfpack, the AI agents played wolves that had to learn to capture “prey.” Success resulted in a reward not just for the wolf making the capture, but for all wolves present within a certain radius of the capture. The more wolves present in this capture radius, the more points all the wolves would receive.
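The shared-reward rule described above can be illustrated with a short Python sketch. This is not DeepMind's code: the function name, the capture radius, the per-wolf reward value, and the assumption that the payout grows linearly with the number of wolves nearby are all hypothetical, chosen only to show how a reward that depends on pack size favors hunting together.

```python
import math


def wolfpack_rewards(wolves, prey, capture_radius=2.0, reward_per_wolf=1.0):
    """Return a reward for every wolf when the prey at `prey` is captured.

    wolves: dict mapping a wolf's name to its (x, y) position.
    capture_radius and reward_per_wolf are illustrative values,
    not figures taken from the research.
    """
    # Wolves close enough to the capture share in the reward.
    in_radius = [
        name for name, pos in wolves.items()
        if math.dist(pos, prey) <= capture_radius
    ]
    # Assumption: the payout scales linearly with the number of wolves present.
    team_reward = reward_per_wolf * len(in_radius)
    return {
        name: (team_reward if name in in_radius else 0.0)
        for name in wolves
    }


# A lone wolf makes the capture.
print(wolfpack_rewards({"a": (0, 0)}, prey=(0, 1)))
# Two wolves are near the capture, a third is too far away to share.
print(wolfpack_rewards({"a": (0, 0), "b": (1, 1), "c": (5, 5)}, prey=(0, 1)))
```

Under these assumed numbers, each of the two nearby wolves earns twice what the lone hunter does, while the distant wolf earns nothing.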

DeepMind's programmers make the AI play a game called Wolfpack to test its cooperative skills, which was also the premise of the movie "The Hangover." Talk about life imitating art, huh?

In this game, the agents generally learned to cooperate. Unlike in the apple-gathering game, in Wolfpack, the more cognitively advanced the agent was, the better it learned to cooperate. The researchers postulate that this is because in the apple-gathering game, zapping was the more complex behavior, since it required aiming the beam at the opponent, while in Wolfpack, cooperation was the more complex behavior.

The researchers speculated that the less sophisticated artificial intelligence systems had more difficulty mastering these complex behaviors, and so could not learn to use them effectively.

DeepMind, which Google purchased in 2014, is best known for having created an artificial intelligence that can beat the world’s top human players in the ancient Asian strategy game Go. In November, DeepMind announced it was working with Blizzard Entertainment Inc., the division of Activision Blizzard that makes the video game StarCraft II, to turn that game into a platform for AI research.

DeepMind’s AI program, AlphaGo, challenged and beat world champion Lee Se-dol in March 2016.

Leibo said that the agents used in the apple-gathering and Wolfpack experiments had no short-term memory, and as a result, could not make any inferences about the intent of the other agent. “Going forward it would be interesting to equip agents with the ability to reason about other agents’ beliefs and goals,” he said.

In the meantime, it might be wise to keep a few spare apples around.