Episode 52: One Billion Hands with Dave Thornton

Dave Thornton is one of the principals at SkillInGames and OneBillionHands, a recently-launched site with access to, well, roughly one billion hands’ worth of anonymized hole card data from real money online poker games. We talk about where these hands came from (Dave asked us to clarify that, contrary to what he may have inadvertently implied during the interview, most of the hands did not come from the partner at the data consultancy where he used to work) and the many interesting questions that can be studied with them. One early application is a more complete method of quantifying luck than has previously been available to poker players. We also discuss Mark Newhouse’s fold to Anton Morganstern’s river check-raise deep in the WSOP main event.

You can follow Dave on Twitter @dave_j_thornton and one of his partners, Jay Greenspan, @mizdflop. If you have suggestions for problems you’d like to see them research, please leave them here or email us.

The neural network project that Andrew mentioned is PokerSnowie.

Timestamps

0:30 – Hello and welcome
4:14 – Mailbag: What exactly is an angle shot?
16:21 – Strategy: A big hand and a big draw
37:08 – Interview: Dave Thornton

Strategy

$300 NLHE tournament. Blinds 100/200, Hero has 13K, Villain has 11K.

Villain raises to 450 UTG, 5 players call, and Hero flats in the BB with Ac Tc.

3150 in pot, flop is Ad 6c 4c. Hero checks, Villain bets 450, folds to button who calls. Hero raises to 2000, Villain calls, button folds.

7550 in pot, 8500 in effective stacks. Turn is 10. Hero bets ~3800. Villain tanks for a long time and calls.

River is an offsuit 7. Hero moves all in.

Edited to clarify that OneBillionHands’ hand histories are anonymized. 

9 thoughts on “Episode 52: One Billion Hands with Dave Thornton”

  1. Excellent interview. One question I have relates to the statistical power of the billion hand data (BHD). I’d have liked David to address what modeling and simulation work (if any) they performed to confirm that, once parsed, the database at their disposal is appropriately powered to yield valid results. He sort of alluded to this when he mentioned that he would widen certain criteria (e.g. from an exactly-100bb scenario to a range such as 100-102bb), but he didn’t mention how many hands the conclusions were based on. Thus, we are left to take his word that the number of events is sufficient to support the observations.

    Also, I resent not getting a shout out by Nate as likely being among the listeners who know the back story of who proved that it is difficult to anonymize genomic data. Along those lines, if you can calculate VPIP and other HUD stats from the BHD data, how is it not possible to re-identify users by comparing those data to data in their own hand histories? If I have a db of 100,000 hands, I should have hh’s of hundreds of events from a number of players, and I would imagine that I could compare those to the HUD stats generated from the players in the db to out a few of them, or am I missing something?

    • Indeed, you probably know that genomic-data story better than I do.

      To be honest I’m not 100% sure of the details here, but I think that players are identified across hands only within fairly small groups. That is, a given person might have played 10,000 hands of the billion, but he will be “Mr. X” for one subset of them, “Mr. Y” for another, and so on.

      That said, I was full of questions to ask Dave, and I would have liked to get clearer on that issue but simply wanted to ask other stuff more.

  2. One question I was left with after the interview concerned players’ actions based on a large sample size (but within a tight time frame). He seemed to imply that the hands came from a time period of about a year?

    Not really concerned with the exact amount of time the hands were gathered from, rather the impact ‘trends’ might have. I remember Baluga Whale mentioning that he used to 3b mercilessly knowing he could Cbet the flop and virtually print money. Then the player base caught on, and then the bluff 4 bet came into vogue as the new printing press.

    So, how would player tendencies as a group impact the stats that are gathered in the BHD? Would this type of rock-paper-scissors groupthink impact it significantly, or at all?

    Also, good luck on monetizing this. Seems sort of useful, but who will pick up the tab?

    PS I thought Andrew and Nate had an excellent suggestion about datamining final tables.

  3. Very interesting one. I had to stop everything to listen to the episode (and I’m a huge math/statistics donk). I can’t imagine how you guys are able to find some of the interviewees for this show–I mean, where the hell did this guy come from?! Pretty nice to see how uninformed I am about what is actually being researched and studied about the game. I hope there’s a follow-up on his project a year or so from now. Keep ’em coming!

  4. I am busy catching up with old episodes of the podcast and must say, this was the most enjoyable 90 mins of podcast listening. I am a Maths teacher and poker player so I found the subject inherently interesting. I can’t begin to imagine the sort of excitement a project like this must elicit for those involved and wish I could be party to the data and analysis. Will watch the website with bated breath. Keep up the amazing work guys, love listening to the show.
