Battling DeepStack

Nate and I had the great privilege of participating in the premiere broadcast of a series of matches between DeepStack, a state-of-the-art heads up no-limit hold ’em Artificial Intelligence, and human professionals. We found DeepStack to be a really tough competitor that left us questioning our play in both large and small pots. I’m sure we didn’t play nearly as well as heads up specialists would have, but it was great fun to try, and hopefully we did a good job of sharing the experience with the audience on Twitch. If you missed it, here’s a link to the replay!

Next week, Terrence Chan and Adam Schwartz of the 2+2 Pokercast will play DeepStack. I wanted to share some of my thoughts from the match with both them and the Thinking Poker community anyway, so I figure I might as well just collect my thoughts here.

  1. Bet Sizes. I haven’t discussed this with the Computer Poker Research Group, but it seems like there are only a few bet sizes that DeepStack considers for its own actions (though, as I understand it, its ability to respond to diverse bet sizes is one of its chief advances over previous NLHE AIs). For instance, into a pot of 1600, it might bet 800, 1600, or 3200, but it would never choose 2291 as a bet size unless that were its exact stack size.

    This strikes me as the best opportunity to exploit DeepStack, though Terrence and Adam are probably more capable than I of determining how exactly to take advantage of that (it wasn’t something I actively tried to do during my match). Considering the range of bet sizes DeepStack does use, I suspect that generally it doesn’t lose much by not considering “weirder” amounts. However, this might be somewhat more problematic with shallow stacks, where never betting less than half pot (if that is even a constraint) might prevent it from having a bet-folding range at all.

  2. Threat of a Check-Raise. These were the spots where I felt I had the most difficulty setting aside my “feel” based on how human opponents tend to play and constructing minimally exploitable ranges. There are a lot of spots where (non-elite) human opponents don’t check-raise often. This is for a variety of reasons: lack of “obvious” bluffing candidates, difficulty of checking a strong hand multiple times, etc. As a result, I think I ended up with betting ranges that were sometimes too depolarized (getting raised off of strong draws or very-possibly-best made hands sucks) or simply too wide.

    Example: There was one hand where I turned 84 into a bluff on AJ2Q4, and it check-raised me with 85o!

  3. Board Coverage. Nate and I talked a bit about this on stream. This is something you see when working with solvers as well, and is probably related to (2). There are subtle things that DeepStack seems to do when making what might seem like arbitrary choices about candidates for floating or bluffing on early streets. The end result is a less predictable range on future streets.

    For instance, I know that I want to have some Kx in my three-betting range when deep, and I typically choose some combination of KTs – KAs for this purpose. DeepStack almost certainly does a better job of getting the exact frequency right, but even if we miraculously had the same amount of Kx in our three-betting ranges, it probably builds its range by three-betting all combinations of Kxs at relatively low frequencies. This means it ends up connecting with boards like Q74 in three-bet pots in ways that I don’t. Likewise, its candidates for peeling or bluff-raising flop can seem surprising when the truth is that the choice is arbitrary in a vacuum but there is incentive to reach turns and rivers with a wider variety of holdings than most humans do. Consequently, it’s harder (though still not impossible) to recognize a particular run out as good or bad for DeepStack based on its play on earlier streets.

  4. Surprising Play. DeepStack did more than a few things that surprised us. For the most part, we were willing to believe that it “knew” better and could, after the fact, wrap our heads around why it may have done what it did. But it made one play against me that I have a really hard time believing could possibly be correct.

    At 200/400, I opened to 1200 with QTo, and DeepStack jammed 18,250 effective with 85o. When we’re talking about moving all in pre-flop, board coverage isn’t going to be a consideration, and although shoving ranges won’t be strictly linear because there will exist hands where calling > shoving > folding, it’s hard to imagine how folding any stronger hand could ever be correct if shoving 85o is +EV here.

    It’s worth adding here that one feature of an equilibrium strategy is that it will not include “advertising” or “balancing” plays, even at a low frequency, that have a negative expected value. Now admittedly, DeepStack does not claim to have an equilibrium strategy, but the point is that shoving, even at a low frequency, can’t be justified simply by saying it’s a balancing play. It would have to have EV not less than 0 for shoving to be correct at any non-zero frequency.
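To put rough numbers on the 85o shove, here is a minimal sketch of the EV arithmetic. It measures EV relative to folding (so folding is 0 and the posted blind is treated as sunk). It assumes DeepStack was in the big blind, so the pot facing the 1200 open is 1200 + 400 = 1600, and that "18,250 effective" is the total stack including that blind. The equity-when-called figures are hypothetical placeholders, not computed equities for 85o against any real calling range, and the function names are made up for illustration.

```python
def shove_ev(pot, risk, fold_freq, equity_when_called, final_pot):
    """EV of shoving, measured relative to folding (folding = 0)."""
    ev_when_folds = pot                                       # pick up what is in the middle
    ev_when_called = equity_when_called * final_pot - risk    # share of final pot minus chips risked
    return fold_freq * ev_when_folds + (1 - fold_freq) * ev_when_called


def breakeven_fold_freq(pot, risk, equity_when_called, final_pot):
    """Fold frequency at which the shove has exactly zero EV."""
    ev_when_called = equity_when_called * final_pot - risk
    if ev_when_called >= 0:
        return 0.0  # profitable even if always called
    return -ev_when_called / (pot - ev_when_called)


# Figures from the hand, under the assumptions stated above.
STACK = 18_250
BIG_BLIND = 400
OPEN_RAISE = 1_200

pot = OPEN_RAISE + BIG_BLIND   # 1600 in the middle when DeepStack acts
risk = STACK - BIG_BLIND       # additional chips DeepStack puts in by jamming
final_pot = 2 * STACK          # pot size if the jam is called

# Hypothetical equities when called (assumptions for illustration only).
for equity in (0.30, 0.35, 0.40):
    needed = breakeven_fold_freq(pot, risk, equity, final_pot)
    print(f"equity when called {equity:.0%}: need folds at least {needed:.1%} of the time")
```

For a fixed fold frequency, the shove's EV only goes up as equity when called goes up. That is the sense in which a break-even 85o jam would seem to imply that slightly stronger hands should never prefer folding to jamming.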

14 thoughts on “Battling DeepStack”

  1. > It’s worth adding here that one feature of an equilibrium strategy is that it will not include “advertising” or “balancing” plays…It would have to have EV not less than 0 for shoving to be correct at any non-zero frequency.

    I’m committing this sanity check to memory.
    —-
    In the stream I asked about varying preflop raise sizes vs. DeepStack. I’ve got a couple followup questions about exploit opportunities made possible by the absence of sizing tells:

    1. Do you have an idea for DS’s range when facing an open shove pre-flop? Open shoving the top of my range was the first potential (admittedly unsophisticated) exploit I thought about. While point 4 above is the reciprocal, it does make the idea more intriguing.

    2. For all hands DS would open raise pre, would it be more likely to flat or 3-bet a min raise? If it always narrows its range when 3-betting, I wonder if there would be value in controlling the pot preflop with min raises.

    • David,

      Short answer is that an equilibrium strategy (Which again, is not necessarily what DS actually plays, but is what it aspires to) will not enable you to do any of these things. Even if it doesn’t *exploit* sizing tells, it also doesn’t “reward” an opponent for taking a particular line. For example, if you open shove your strongest hands, it will call with some hands you beat in order to prevent you from profitably making the same play with a weaker hand. However, you will no longer have those strong hands in your range when you take other lines, and even though DS doesn’t “know” that, it does play in a way that will profit from your failure to show up with those hands. And many of the hands that will call off vs your shove are hands that would have put lots of money into the pot regardless of the line you took.

      • Boom! That makes a lot of sense. I’ll try to use that line of thinking to jump start reasoning about my min raise question.

        Missed mentioning it on the first post, but the stream was *awesome*.

  2. EDIT: I think my #2 would be more valuable as a means to cap DS’s range (flatting) or expose strength (3-betting), rather than preflop pot control. Granted, the entire idea is likely reductive.

  3. The thing about (near-)optimal poker is that mixed strategies can mean even the craziest looking plays can be part of the GTO solution **if they are done at low enough frequencies**.
    When they are building pre-flop ranges, humans will say something like “I want to have about 12 bluffing combos to balance my value hands in this spot. I think I’ll use KQs, A5s, A4s”, but a computer can make up the ‘equivalent’ of 12 bluff combos, by picking hands from all over the matrix (hence the “board coverage” concept), such that it might bluff with KQs 25%, A5s 25%, A4s 25%, AJo 25%, 72o 5%, 85o 2%, 93o 2% etc etc… until the [frequencies * combos] adds up to the optimal amount.

    With 85o facing a 3x open, the 45bb 3-bet jam is likely done at a *very* low frequency, ‘balanced’ by monsters in such a polarized way that 85o becomes at least breakeven, due to the huge amount of fold equity. Libratus made similar plays with total air. You can’t call off profitably with QT when it’s going to show up with AA/AKs a lot more often than 85.

    • @ArtyMcFly – I’m struggling with some of these points, and would be interested in your feedback:

      1. From Andrew “But it made one play against me” – if it only did this once vs. Andrew (possibly only Andrew), and its frequency for this play is very low, it seems like a candidate for Bayes Theorem (how I understand it anyways). Basically, what’s greater: DS’s error rate, or the low frequency you’re assigning to this type of play? I’m inclined to believe the former. If the frequency is not as low as I’m presuming, why did we not see any other similar plays (DS always showed its cards)?

      2. “due to the huge amount of fold equity” – my mind went here first too, but 18,250 is 11.5x a pot of 1,600. Andrew’s calling range doesn’t have to be very wide to have an edge.

      3. I can imagine an “advertisement” play being profitable exploitively, but you are both talking about a pure GTO/equilibrium strategy. Andrew is claiming an axiom for that style of play. If you have a source that offers an alternative, I’d earnestly be interested in checking it out.

    • Arty, if I understand what you’re saying here, I believe I already addressed these arguments in the original post. The choice of “bluffs” when you jam 45BBs is not arbitrary, as some will have better equity when called than others. And an equilibrium strategy will not involve deliberately choosing an option with a lower EV than another option (and folding is always a 0EV option, so there are no -EV plays in an equilibrium strategy). So either jamming 85o is not -EV, in which case jamming even slightly stronger hands in the same spot would be +EV and something DS would always prefer to folding those hands, or jamming 85o is -EV and not something DS “intends” to be doing.

  4. My understanding is that part of DeepStack’s strategy is derived from a neural net. If that is the case then it is going to have some strange behaviors if they didn’t go through and explicitly filter them.

    It is learning how to play from scratch. Maybe it ‘got lucky’ with some strange plays and then didn’t get back to similar situations often enough to unlearn the mistake.

  5. It was possibly just a sample-size glitch, like the type Rant described, but I presume DeepStack is “range-splitting”, or whatever the cool kids call it. (I assume that DeepStack is different to Snowie in this regard. Snowie picks one bet-size that it thinks is best for its range as a whole, but I’ve been persuaded that that is not actually optimal, and a smidgeon of additional EV can be gained by mixing sizes as well as frequencies). If DS has the ability to use two raise sizes (e.g. a pot-sized 3-bet and a shove), total EV will be maximised by putting various combos into each range at different frequencies, such that each range/size is balanced/unexploitable and max EV.
    I presume it’s not often jamming some of the better hands than 85, even though they would have a higher EV than 85 itself as jams, because they have an even higher EV as calls or standard 3-bets. In short, I suspect DS has a balanced range for making “standard 3-bets” and it also has a balanced (but much more polarized) range for jamming, and some combos (e.g. AA) appear in both ranges, whereas some only appear in one or the other (e.g. A5s likely does best as a call or a standard 3-bet, and doesn’t make quite as much money as a shove). 85o wouldn’t fit into a calling or small 3b range as neither of those lines is +EV, but it fits into the super-polarized shoving range at a low frequency.
    It would be really interesting to know exactly what DeepStack’s strategy looked like in that spot, to find out which hands it jams, which it 3-bets or calls, and at what frequencies. I’d expect there to be a whole lot of mixing going on, and I think you just happened to find one of the very rare spots where the RNG landed on “jam 85o”.

  6. > I’d expect there to be a whole lot of mixing going on, and I think you just happened to find one of the very rare spots where the RNG landed on “jam 85o”.

    You are absolutely right: when we analyzed that hand, the probability of the action was only about 1%. Since DeepStack played a few hundred hands during the Twitch stream, it’s not surprising to see one or two actions with such a small probability due to noise/under-convergence.

    • Thanks for commenting, Martin. I think, though, that you may actually be saying something slightly different than Arty. If I understand correctly, Arty is suggesting this might just be one of many hands with which an optimal strategy would raise all in at some frequency. I think what you are saying is that, if DeepStack were to optimize its strategy perfectly, this is not a hand that it would play this way. The fact that it did so during the match is essentially just noise as a result of imperfect convergence?

  7. > I think what you are saying is that, if DeepStack were to optimize its strategy perfectly, this is not a hand that it would play this way. The fact that it did so during the match is essentially just noise as a result of imperfect convergence?

    Yes – interestingly, there is no action that DeepStack plays with zero probability, so it is not surprising that if you play a lot of hands, you will see a few “weird” actions. Overall, since such actions are played with a very small probability, they are not costly.

  8. “There is no action that DeepStack plays with zero probability”.
    Wow. It’s pretty amazing that even though it hasn’t (yet) completely ruled out some pre-flop options, it still manages to crush souls.
    Thanks for providing the “about 1%” information, Martin. I wish I could get my error rate as low as that!
