Using duplicated statistical data to increase the accuracy of a result?

Discussion in 'Science' started by elementalelf, Sep 23, 2012.

  1. elementalelf

    elementalelf Member

    Feb 11, 2005
    Newcastle, warnersbay
    Just something I thought of today. I'm a professional poker player, so this could have a huge impact on my job if I'm able to use it.

    In online poker you can get software to record player statistics, e.g. how often they have bet, folded, raised, etc.

    What I want to do is get a sample over a day, let's say 1000 hands, then find out how something affects my profitability.

    I'm worried that a sample size of 1000 will be far too small to give any accurate information if I just graph:

    Hands 1-100, 101-200, 201-300, etc.

    However, if I use:

    Hands 1-100, 2-101, 3-102, etc.

    This should give me a pretty accurate idea, shouldn't it?

    I want to find potential problems based on short-term changes, so I can't just collect multiple days' worth of hands, as that would add a huge number of variables that I can't account for, e.g. mood, number of tables open, etc.
  2. HobartTas

    HobartTas Member

    Jun 22, 2006

    There are a couple of methods you can use. The first is probably not relevant, but I'll give it to you anyway as you may use it for something else. It's called bootstrapping. Take the case of, say, 1000 university entrance scores, of which you were given a hundred at random, and you were then asked:

    Given the 100 scores you were provided with, what estimate can you make of the distribution of the other 900 scores you don't know anything about?

    AFAIK bootstrapping is basically taking at random, say, 10 of these scores out of the hundred provided, doing this say a million times or so, and then working out the distribution based on those million samples of 10 choices. The math is complex, but have a read of the Scientific American article about this in the May 1983 issue. Kindly note that in this particular instance each score is a separate person and there is no connection or correlation with the other 999 individuals and their scores.

    http://web.cecs.pdx.edu/~cgshirl/Do...ive Methods in Statistics Sci Am May 1983.pdf
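
    For anyone who wants to try it, here is a minimal sketch of a bootstrap in Python. It follows the common form of the method, which draws each resample with replacement and at the same size as the original sample, rather than subsets of 10. The scores are simulated, the function names are made up for illustration, and 2000 resamples are used instead of a million just to keep it quick.

```python
import random

def bootstrap_means(sample, n_resamples=2000, seed=0):
    """Resample with replacement and return the mean of each resample."""
    rng = random.Random(seed)
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        resample = rng.choices(sample, k=n)  # drawn with replacement
        means.append(sum(resample) / n)
    return means

def percentile_interval(values, lower=0.025, upper=0.975):
    """Rough percentile interval read off the sorted bootstrap values."""
    ordered = sorted(values)
    last = len(ordered) - 1
    return ordered[int(lower * last)], ordered[int(upper * last)]

# Simulated stand-in for the 100 known entrance scores (assumed data).
data_rng = random.Random(42)
scores = [data_rng.gauss(70.0, 10.0) for _ in range(100)]

means = bootstrap_means(scores)
low, high = percentile_interval(means)
print("sample mean:", round(sum(scores) / len(scores), 2))
print("approx. 95% interval for the mean:", round(low, 2), "to", round(high, 2))
```

    The spread of the resampled means gives a rough idea of how far the true average could be from the one you measured, without assuming the scores follow any particular distribution.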

    Now to get back on track with what you asked for: basically what you need to do is calculate the Hurst exponent, which looks at subdivided groups of data like you suggested. Because you sometimes can't tell whether there are weekly or monthly trends (or trends of any other length, for that matter), such as in stock markets, you also tend to analyze other lengths. You have chosen a length of 100 (1-100, 101-200, 201-300 and also 1-100, 2-101, 3-102); you would also choose lengths such as 99, 98, 97, 96 ... 2 and also 101, 102, 103, 104 ... 500.

    The only problem I can imagine you might have is that this analysis tends to show up underlying trends that, I think, are assumed to be consistent and unchanging throughout the entire sample period (in your case 1000 hands), whereas you stated that you are assuming these may change with the passage of time.
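
    As a rough illustration, the sketch below estimates a Hurst exponent with the classic rescaled range (R/S) approach: split the series into chunks of several lengths, compute R/S for each chunk, and fit a line to log(R/S) against log(length). This is only one of several estimators, the function names and window lengths are my own choices, and the data is simulated. Independent results should come out somewhere around 0.5, although short series tend to read a bit higher.

```python
import math
import random

def rescaled_range(chunk):
    """R/S for one chunk: range of cumulative deviations over std dev."""
    n = len(chunk)
    mean = sum(chunk) / n
    running = 0.0
    lowest = highest = 0.0
    for x in chunk:
        running += x - mean
        lowest = min(lowest, running)
        highest = max(highest, running)
    std = math.sqrt(sum((x - mean) ** 2 for x in chunk) / n)
    if std == 0.0:
        return None  # a constant chunk has no defined R/S
    return (highest - lowest) / std

def hurst_exponent(series, window_sizes):
    """Slope of log(mean R/S) against log(window length)."""
    xs, ys = [], []
    for n in window_sizes:
        ratios = []
        for start in range(0, len(series) - n + 1, n):
            rs = rescaled_range(series[start:start + n])
            if rs is not None:
                ratios.append(rs)
        if ratios:
            xs.append(math.log(n))
            ys.append(math.log(sum(ratios) / len(ratios)))
    if len(xs) < 2:
        raise ValueError("need at least two usable window sizes")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope_num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope_den = sum((x - mx) ** 2 for x in xs)
    return slope_num / slope_den

# Simulated stand-in for 1000 independent hand results (assumed data).
rng = random.Random(3)
noise = [rng.gauss(0.0, 1.0) for _ in range(1000)]
sizes = [10, 20, 25, 40, 50, 100, 125, 200, 250]

print("estimated Hurst exponent:", round(hurst_exponent(noise, sizes), 3))
```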

    Some further reading on this issue.






    Anyway, other than my knowledge of the existence of these analytical methods, there's not much more I can help you with. I think we can both appreciate that having a very good understanding of mathematics would be extremely useful in a situation such as this. This should put you on the right track though, and I suggest you talk to someone who has a university degree in mathematics or statistics; they may suggest other, more appropriate techniques.

  3. OP

    elementalelf Member

    Feb 11, 2005
    Newcastle, warnersbay
    My brother did a pretty in-depth statistics course whilst doing his Ph.D. in biotech. We found that I actually have a much better instinctual understanding of statistics than him due to my experience with poker.

    I'll ask him about this, because I figure this is a more complex theory than I'll be able to work out on my own.
  4. Lucifers Mentor

    Lucifers Mentor Member

    Feb 10, 2003
    Depends on what you're trying to do. Can you provide a bit more info, even via PM?

    Edit: But from the sound of things, going 1-100, 2-101, 3-102 won't give you usable results.
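
    A quick way to see the issue with 1-100, 2-101, 3-102: each window shares 99 of its 100 hands with the next one, so the 901 windows you get from 1000 hands are nearly copies of each other and carry no more independent information than the 10 separate blocks. The sketch below checks this on made-up data (1000 simulated hand results in big blinds; the numbers and function names are assumptions for illustration only).

```python
import math
import random

def block_means(results, size):
    """Means of non-overlapping blocks: hands 1-100, 101-200, ..."""
    return [sum(results[i:i + size]) / size
            for i in range(0, len(results) - size + 1, size)]

def sliding_means(results, size):
    """Means of overlapping windows: hands 1-100, 2-101, 3-102, ..."""
    return [sum(results[i:i + size]) / size
            for i in range(len(results) - size + 1)]

def lag1_correlation(values):
    """Pearson correlation between each value and the one after it."""
    xs, ys = values[:-1], values[1:]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up data: 1000 independent hand results in big blinds.
rng = random.Random(1)
hands = [rng.gauss(0.05, 1.0) for _ in range(1000)]

blocks = block_means(hands, 100)
windows = sliding_means(hands, 100)

print(len(blocks), "blocks,", len(windows), "sliding windows")
print("lag-1 correlation of sliding windows:",
      round(lag1_correlation(windows), 3))
print("mean of all hands:      ", round(sum(hands) / len(hands), 4))
print("mean of sliding windows:", round(sum(windows) / len(windows), 4))
```

    The sliding graph will look smoother, but that smoothness comes from reusing the same hands, not from extra data.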
    Last edited: Sep 29, 2012
  5. karn1911

    karn1911 Member

    Sep 23, 2002
    Depending on the type of results you have, it might be a better idea to compare these data sets with each other using a comparative test to determine whether there is a significant difference.
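
    As one possible example of such a test (my pick, not the only option), here is a minimal permutation test sketch in Python. It asks how often a difference in average result at least as large as the observed one turns up when the hands are shuffled between the two groups at random. The two data sets are simulated and the function name is made up for illustration.

```python
import random

def permutation_test(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        first, second = pooled[:len(a)], pooled[len(a):]
        diff = abs(sum(first) / len(first) - sum(second) / len(second))
        if diff >= observed:
            extreme += 1
    # Add one to both counts so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)

# Simulated hand results under two conditions (assumed data).
data_rng = random.Random(5)
condition_a = [data_rng.gauss(0.10, 1.0) for _ in range(500)]
condition_b = [data_rng.gauss(0.00, 1.0) for _ in range(500)]

p_value = permutation_test(condition_a, condition_b)
print("permutation p-value:", round(p_value, 4))
```

    A small p-value suggests the two sets of hands really differ; a large one means a gap that size could easily be down to luck alone.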
