Wednesday, November 30, 2016

Check your lotto chances: what do the numbers say?

This post is a part 2 of using probability to determine our chances to win the lottery. In the previous post we checked the probability of drawing the numbers.

Oh the lottery win, could it be? We have determined that our chances are very slim, but we also know that the lotto people have to use a machine or a bunch of balls in a rolling drum to get the numbers. Maybe there could be an issue with the balls or the machine? or the environment its in? Maybe its not really random? Maybe there will be some patterns to how they get drawn.

Lets get all the numbers from previous draws and see what comes out of that. In Canada the lottery published the list of winning numbers, so we are going to get the files, and find out if there is (just maybe) a way to predict the winning numbers. Maybe one number is more likely to be drawn then the others.

To find out, lets load up the numbers from a CSV file and find out whats there:

import os, csv
from collections import Counter
class LottoStats(object):
    def __init__(self, file_name='Lotto649.csv'):
        We want to get the numbers out of a spreadsheet, its colums are like so:
        Draw Date,No1,No2,No3,No4,No5,No6,Bonus,
        lotto_file = open(os.path.abspath(file_name), 'rb')
        reader = csv.reader(lotto_file, dialect = csv.excel)
        self.lotto_numbers = [] 
        self.draws = 0
        for row in reader:  
            self.draws += 1 
            for i, cell in enumerate(row):
                if (i != 0 and i != 7 and cell):
        Python has the handy Counter object that 
        will return the elements and their counts
        self.counts = Counter(self.lotto_numbers)
    def get_numbers(self):
        return self.lotto_numbers

    def get_counts(self):
        return self.counts.items()

    def count_most_common(self, n=9):
        return self.counts.most_common(n)

    def count_least_common(self, n=9):
        return self.counts.most_common(len(self.lotto_numbers))[:-n-1:-1]

    def get_draws(self):
        return self.draws

stats = LottoStats('Lotto649.csv') 
# lets validate the expected counts with a test
assert len(stats.get_counts()) == 49

print stats.get_draws() # 2950
print stats.count_most_common(6) #[('34', 406), ('31', 405), ('45', 397), ('43', 394), ('40', 393), ('47', 386)]
print stats.count_least_common(6) #[('28', 331), ('14', 333), ('15', 339), ('16', 340), ('22', 341), ('13', 342)]

counts = stats.get_counts()
occurances = [occ[1] for occ in counts]

From our observation by just checking the common and least common, we see that some numbers seem to be drawn more than others. Is that a big deal?

From the occurrences, we have the number 34 appearing 406 times, and we have 28 in there 331 times. That seems like it could be a big spread, but remember that we had 2950 draws.

We need to determine the chances that the number can be drawn at all. Out of 2950 draws, it's a 406/2950 or .1376 or 13% chance of happening and 28 has .1122.

Whats a good tool to add some insight here? What does this data tell us? There are two really valuable statistical tools to give us some understanding.

One is sample standard deviation which just indicates how far apart the values are. This isn't really useful on its own, but because our data has the properties it does it can be combined with the average to find.the coefficient of variance, which tells us how close all the values are to the average.

A set like [10,10,10] has a cv of 0; there is no difference. What does the set of our probabilities tell us?

from stats import standard_deviation

num_chances = [(num / float(2950)) for num in occurances]
print (num_chances)
sd = standard_deviation(num_chances, False) #The standard deviation is 0.00614129698134.
mean = sum(num_chances) / len(num_chances)
print("Mean "+str(mean))
variance = sd / float(mean) 
print("coefficient of variance: "+str(variance)) #0.0501539253476

The standard deviation is 0.00614129698134, and the Mean (or average) is 0.122448979592, so the coefficient of variance: 0.0501539253476

The 0.05 is extremely low, getting close to 0 and that tells us the values of the numbers are all very, very close together. This means the actual drawing of the numbers seems to have no meaningful effect on the odds. It pretty much a random draw, so we can conclude that the spinning drum with the balls flying around do a pretty good job.


So we have described the situation, so lets try some Predictive or Prescriptive analytics on this issue.

Can we predict the lottery? With the long odds, and the measured fairness of the process, it's just not accurately predictable enough to pursue specific numbers. The prescriptive then follows that you just shouldn't buy a lottery ticket expecting to make money doing it. But what are the odds of winning over 10 or 20 years, even just once? They are the same each week, they don't change.

Statistics tell a story about data. There has to be some understanding about the data itself, and
the types of questions you have understood to apply the methods and tools of stats properly.


There is a way to make money from the lottery. Let's calculate the cost if played over 10 years, every week playing 100 draws a year. I think its $2. Lets say I bought an investment that gave 5% every year.

year_cost = 2 * 52 * 2 # $208 per year
total_gain = 0
for y in range(10):
    total_gain += year_cost * 1.05

If you kept that under your mattress,  you would have $2080 or $2184 if you bought the investment and the best part is that you have 100% chance of getting that money, instead of the 0.00000000715112% chance of the lotto.

Maybe think of it this way: what are the chances that you will need a spare $2000 in the future?

No comments:

Post a Comment