Robots Learn Deception

The Wired blog yesterday reported on a recent experiment in which robots “learned” deception autonomously, without specific instructions from their programmers:

Two robots — one black and one red — were taught to play hide and seek. The black, hider, robot chose from three different hiding places, and the red, seeker, robot had to find him using clues left by knocked-over coloured markers positioned along the paths to the hiding places.

However, unbeknownst to the poor red seeker, the black robot had a trick up its sleeve. Once it had passed the coloured markers, it shifted direction and hid in an entirely different location, leaving behind it a false trail that managed to fool the red robot in 75 percent of the 20 trials that the researchers ran. The five failed trials resulted from the black robot's difficulty in knocking over the correct markers.

The significant thing here is that the robots weren’t programmed to use a deceptive strategy. They “evolved” it on their own through a process resembling natural selection:

The robots — soccer ball-sized assemblages of wheels, sensors and flashing light signals, coordinated by a digital neural network — were placed by their designers in an arena, with paper discs signifying “food” and “poison” at opposite ends. Finding and staying beside the food earned the robots points.

At first, the robots moved and emitted light randomly. But their innocence didn’t last. After each iteration of the trial, researchers picked the most successful robots, copied their digital brains and used them to program a new robot generation, with a dash of random change thrown in for mutation.

Soon the robots learned to follow the signals of others who’d gathered at the food. But there wasn’t enough space for all of them to feed, and the robots bumped and jostled for position. As before, only a few made it through the bottleneck of selection. And before long, they’d evolved to mute their signals, thus concealing their location.
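To make the mechanism concrete, the loop the researchers describe looks roughly like the sketch below. This is a toy Python version, not their actual code; the population size, mutation rate, and fitness function are all invented for illustration.

```python
import random

# Toy version of the selection-and-mutation loop described above.
# A "genome" stands in for the weights of a robot's neural network.
POPULATION_SIZE = 50   # invented for illustration
GENOME_LENGTH = 30     # number of network weights (invented)
MUTATION_RATE = 0.05   # the "dash of random change thrown in for mutation"

def random_genome():
    return [random.uniform(-1, 1) for _ in range(GENOME_LENGTH)]

def mutate(genome):
    """Copy a successful robot's digital brain, with small random changes."""
    return [w + random.gauss(0, 0.1) if random.random() < MUTATION_RATE else w
            for w in genome]

def evolve(fitness, generations=100):
    population = [random_genome() for _ in range(POPULATION_SIZE)]
    for _ in range(generations):
        # Run the trial and rank robots by points earned (e.g., time at the food).
        ranked = sorted(population, key=fitness, reverse=True)
        # Keep only the most successful robots and breed the next generation.
        elite = ranked[:POPULATION_SIZE // 5]
        population = [mutate(random.choice(elite)) for _ in range(POPULATION_SIZE)]
    return max(population, key=fitness)

# Toy usage: a stand-in fitness function that just rewards large weights.
best = evolve(fitness=sum, generations=20)
```

Nothing in this loop codes for deception; muting signals emerges simply because it scores well under selection.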

The claim that computers will never be able to play poker as well as humans seems more asinine to me every time I read about an experiment like this. The argument generally boils down to the idea that people are capable of higher-level, holistic reasoning that enables them to vary their play and to pick up on opponents' exploitable tendencies better than a computer could.

It seems to me that, especially in online poker, where there are few subtle physical tells for a human to pick up on, a computer is in principle just as capable as a human of gathering all the relevant information. Why can't a computer learn to recognize and adapt to signs of tilt in an opponent: quickened response times, an increase in the number and size of pots played, a rising WTSD% (went-to-showdown percentage), one or more large pots recently lost? Over time, why can't a computer build a profile of an opponent that recognizes he is more or less likely to bluff after recently failing in a big bluff? And surely computers are far more capable of randomizing their play than mere humans are.
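To be concrete about what "recognizing tilt" might look like in code, here is a toy heuristic built from exactly the signals listed above. Every field name and threshold is invented; none of this comes from any actual bot.

```python
from dataclasses import dataclass

# Toy tilt heuristic built from the signals listed above. All field names
# and thresholds are invented for illustration.
@dataclass
class OpponentStats:
    avg_response_time: float     # long-run baseline, in seconds
    recent_response_time: float  # over the last few orbits
    baseline_wtsd: float         # long-run went-to-showdown frequency
    recent_wtsd: float           # went-to-showdown frequency, recent hands
    recent_big_losses: int       # large pots lost in the last N hands

def tilt_score(s: OpponentStats) -> float:
    """Crude weighted score: higher means more likely on tilt."""
    score = 0.0
    if s.recent_response_time < 0.5 * s.avg_response_time:
        score += 1.0                                         # snap decisions
    score += max(0.0, s.recent_wtsd - s.baseline_wtsd) * 10  # calling too wide
    score += 0.5 * s.recent_big_losses                       # stinging losses
    return score

stats = OpponentStats(avg_response_time=8.0, recent_response_time=2.5,
                      baseline_wtsd=0.24, recent_wtsd=0.35, recent_big_losses=2)
if tilt_score(stats) > 2.0:
    print("opponent may be tilting: value bet thinner, bluff less")
```

A real bot would fit those weights from data rather than hard-coding them, but every input here is something it can observe at least as reliably as a human can.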

The really important thing is that none of this has to be programmed into the computer up front. The computer just needs to collect all of the relevant data, i.e., not just individual betting lines but also things like timing, bet sizing, etc. Then it needs a way of searching that data for patterns and making predictions: that a player tends to open up his raising range after a few orbits of tight play, say, or tends to bluff after losing a large pot. I really don't see any reason why a computer couldn't be better at all of this than a human player.
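Here's a minimal sketch of what that kind of profiling could look like: count how often an opponent's big bets turn out to be bluffs, conditioned on context, with some smoothing so a handful of hands doesn't swing the estimate. The context labels and the 0.3 prior are invented for illustration.

```python
from collections import defaultdict

# Toy opponent profile: tally how often big bets turn out to be bluffs,
# conditioned on context. Context labels and the prior are invented.
class OpponentProfile:
    def __init__(self):
        self.bluffs = defaultdict(int)   # context -> bluffs seen at showdown
        self.samples = defaultdict(int)  # context -> big bets seen at showdown

    def record_showdown(self, context, was_bluff):
        self.samples[context] += 1
        if was_bluff:
            self.bluffs[context] += 1

    def bluff_frequency(self, context, prior=0.3, prior_weight=10):
        """Smoothed estimate so a small sample doesn't swing the profile."""
        n = self.samples[context]
        return (self.bluffs[context] + prior * prior_weight) / (n + prior_weight)

profile = OpponentProfile()
profile.record_showdown("after_losing_big_pot", was_bluff=True)
profile.record_showdown("after_losing_big_pot", was_bluff=True)
print(profile.bluff_frequency("after_losing_big_pot"))  # drifts above the prior
```

The smoothing prior keeps a two-hand sample from dictating strategy; as showdowns accumulate, the estimate follows the opponent's actual tendencies.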

Of course if the human knows all of the things that the computer looks for, then he may be able to stay one step ahead. But that’s a pretty unfair advantage and would probably be true in a human vs. human match as well.

3 thoughts on “Robots Learn Deception”

  1. Andrew –
Have you followed the literature on poker bots at all? It's pretty cool…I can send you some papers if you want. Basically there are a couple of approaches – one approach tries to simplify the game into something manageable and then solve an equilibrium (as an example of how to simplify: assume AcAs is the same as AcAh preflop, as in the sketch after these comments…not all of the assumptions are that innocuous though). Another approach does something more similar to the one in this article, starting out with some naive strategy and then adapting based on experience (there are a few different methods for doing this, but that general method describes all of them). Some of them have done quite well against "experts" (OK, it was Phil Laak, but still…).

    • I have a friend who was at the first Alberta conference where Laak and the other guy played HU vs. the AI. The friend is a computer science guy, not a poker guy, but he filled me in on it at the time. I am interested in the subject, but I doubt my ability to follow along with a technical paper on the subject. I’d be interested to read more if you have something that you think would be intelligible to a lay person.

  2. All I know is that I wish I were still coaching debate, because I’d be totally cutting this card to add to the Robot Wars DA.
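A toy version of the preflop simplification mentioned in the first comment: before the flop only ranks and suitedness matter, so AcAs and AcAh collapse into the same class. The representation below is invented, and real abstraction schemes go much further.

```python
# Toy preflop bucketing: only ranks and suitedness matter before the flop,
# so AcAs and AcAh collapse into the same class. Representation is invented.
RANK_ORDER = "23456789TJQKA"

def preflop_bucket(card1, card2):
    """Map a hand like ('Ac', 'As') to a class like 'AA', 'KQs', or 'T9o'."""
    r1, s1 = card1[0], card1[1]
    r2, s2 = card2[0], card2[1]
    hi, lo = sorted([r1, r2], key=RANK_ORDER.index, reverse=True)
    if r1 == r2:
        return hi + lo                        # pairs: suits are irrelevant
    return hi + lo + ("s" if s1 == s2 else "o")

assert preflop_bucket("Ac", "As") == preflop_bucket("Ac", "Ah") == "AA"
assert preflop_bucket("Kd", "Qd") == "KQs"
```

That one line of bookkeeping already collapses the 1,326 two-card combos into 169 classes; the harder, lossier abstractions come postflop.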

Comments are closed.