Binomial, Poisson and hypergeometric distributions | mathXplain
 

Contents of this Probability theory episode:

Random variable, Binomial distribution, Hypergeometric distribution, Poisson distribution, Probability, Average, Random variable with limit, Random variable without limit, Expected value, Standard deviation.

Text of slideshow

FAMOUS DISCRETE AND CONTINUOUS DISTRIBUTIONS

HERE IS A PROBLEM

We know the total number of elements: N

We know the number of defective elements: K

We only know the percentage, the expected (average) value, or the probability

It is time to see how the three most important discrete distributions, namely the hypergeometric, the binomial and the Poisson distributions work.

Let's see a story for each of them.

This is in essence the story where we have 30 balls in a box and 12 of them are red.

If we take out 7 balls, what is the probability that 2 of them are red?

This is a slightly different situation, because crashes do not occur on exactly 12 days, but on 12 days on average.

This Poisson will be even more exciting.

The question is the same in all three stories: what is the probability P(X=2)?

The answer, however, will be different for each story.

In the first two stories, X is the number of days when crashes occurred. In the third story, X is the number of crashes.

These two stories are similar in terms of not knowing the exact number of crashes that occur during the 30 days. We only know the expected number.

They are different, however, in terms of what X represents. In one story, it is the number of days with crashes, in the other it is the number of crashes. This is a fundamental difference.

So this λ is the expected value of the Poisson distribution.

We could take a look at the expected values of the other two distributions as well.

There are separate formulas for that.

Let's see the standard deviations, too.

There are separate formulas for this for each distribution.
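As a rough sketch of what those formulas look like, the snippet below evaluates the standard textbook expressions for the expected value and standard deviation of each distribution. The function names are made up for this illustration, and the inputs are one reading of the three stories: N=30, K=12, n=7 for the hypergeometric case, p=12/30 for the binomial case, and a weekly rate of 7·12/30=2.8 for the Poisson case.

```python
import math

def hypergeometric_mean_sd(N, K, n):
    """Mean and standard deviation for a sample of n drawn from N items, K of them marked."""
    p = K / N
    mean = n * p
    # (N - n) / (N - 1) is the finite-population correction.
    sd = math.sqrt(n * p * (1 - p) * (N - n) / (N - 1))
    return mean, sd

def binomial_mean_sd(n, p):
    """Mean and standard deviation for n independent trials with success probability p."""
    return n * p, math.sqrt(n * p * (1 - p))

def poisson_mean_sd(lam):
    """Mean and standard deviation for a Poisson variable with expected value lam."""
    return lam, math.sqrt(lam)

# Assumed inputs taken from the three crash stories.
print("hypergeometric:", hypergeometric_mean_sd(N=30, K=12, n=7))
print("binomial:      ", binomial_mean_sd(n=7, p=12 / 30))
print("Poisson:       ", poisson_mean_sd(lam=2.8))
```

With these inputs all three expected values come out as 2.8; only the standard deviations differ.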

And now let's see the probabilities.

On a certain road, crashes occurred on 12 out of 30 days. We pick one week out of this 30-day period. What is the probability that there were 2 days with crashes in that week?

X = days with crashes

The total number of elements is N=30 days, and the days with crashes are the "defective" ones, K=12.

The sample is n=7, and here we want k=2 days with crashes.
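A minimal sketch of this calculation, assuming the usual hypergeometric formula P(X=k) = C(K,k)·C(N−K,n−k)/C(N,n). The function name is hypothetical and chosen only for this example.

```python
from math import comb

def hypergeometric_pmf(k, N, K, n):
    """P(X = k): exactly k marked items in a sample of n drawn without replacement."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

# N = 30 days in total, K = 12 days with crashes, sample of n = 7 days, k = 2 wanted.
p = hypergeometric_pmf(k=2, N=30, K=12, n=7)
print(f"P(X=2) = {p:.4f}")
```

For these numbers the result is roughly 0.278.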

On a certain road, crashes occur on 12 days out of 30 on average. What is the probability that in a given week there are 2 days with crashes?

X = days with crashes

Even with exceptionally bad luck, there cannot be more than 7 days with crashes in a week, so here

X HAS A LIMIT, MAX 7.

n=7, because we pick 7 days

days with crashes
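A small sketch of the binomial calculation for this story. It assumes that "12 days out of 30 on average" is read as a probability p=12/30=0.4 that any given day has a crash, independently of the other days; the function name is made up for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k): exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# n = 7 days in the week, assumed daily crash probability p = 12/30, k = 2 days wanted.
p_two_days = binomial_pmf(k=2, n=7, p=12 / 30)
print(f"P(X=2) = {p_two_days:.4f}")
```

Since X cannot exceed n=7, summing `binomial_pmf(k, 7, 0.4)` over k=0,...,7 gives 1, which is a quick sanity check.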

On a certain road, there are 12 crashes on average in a 30-day period. What is the probability that in a given week there are 2 crashes?

X = crashes

There can be any number of crashes - there are 12 on average in every 30 days. But who says there couldn't be 1000 crashes? So here

X HAS NO LIMIT

λ is the expected value

λ = the number of crashes expected to occur in a week

There are 12 crashes in 30 days, so the number of crashes per day is 12/30=0.4.

Seven times 0.4 is 2.8, so 2.8 crashes are expected in one week.
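The Poisson probability for this story can be sketched as follows, assuming the standard formula P(X=k) = e^(−λ)·λ^k/k! with λ=2.8 crashes per week. The function name is hypothetical.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with expected value lam."""
    return exp(-lam) * lam**k / factorial(k)

lam_week = 7 * (12 / 30)  # expected crashes in one week, about 2.8
p_two_crashes = poisson_pmf(k=2, lam=lam_week)
print(f"lambda = {lam_week:.1f}, P(X=2) = {p_two_crashes:.4f}")
```

For λ=2.8 this comes out at roughly 0.238.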

In the previous slideshow, we started looking at the three most important discrete distributions, so now we should look at some exercises.

On average, 24 customers arrive at the bank in an hour.

What is the probability of exactly 2 people arriving within 7 minutes?
What is the probability of at most 2 people arriving within 7 minutes?
What is the probability of at least 2 people arriving within 5 minutes?

On average, 24 customers arrive in an hour, but this is only the average. It can happen that nobody comes in one hour, and 50 customers show up in the next. The number of customers has no limit; it could be anything. It is not likely that 7 billion customers come in during the next 7 minutes, but who knows.

Recall the example about road crashes: there can be at most 7 days in a week when crashes occur, but there can be 7 billion crashes in a week. So the number of customers is like the number of crashes.

So, this is a POISSON DISTRIBUTION, which means we need the expected value.

If there are 24 customers arriving every hour, then it is 24/60=0.4 per minute. In 7 minutes it is seven times that: 2.8.

We would not expect the same number of customers in a period of 5 minutes and in a period of 7 minutes, so the expected values will be different.

If there are 24 customers arriving every hour, then it is 24/60=0.4 per minute, and in 5 minutes it is five times that number, namely 2.
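As an illustration of how the expected value scales with the length of the time window, the sketch below derives λ for 7 and for 5 minutes from the hourly rate, then evaluates the two 7-minute questions ("exactly 2" and "at most 2"). The helper names are assumptions made for this example, and the Poisson formula P(X=k) = e^(−λ)·λ^k/k! is the standard one.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with expected value lam."""
    return exp(-lam) * lam**k / factorial(k)

def expected_arrivals(per_hour, minutes):
    """Scale an hourly average to a window of the given length in minutes."""
    return per_hour / 60 * minutes

lam_7 = expected_arrivals(24, 7)  # about 2.8
lam_5 = expected_arrivals(24, 5)  # about 2.0

p_exactly_2 = poisson_pmf(2, lam_7)
# "At most 2" means X is 0, 1 or 2, so the three probabilities are added.
p_at_most_2 = sum(poisson_pmf(k, lam_7) for k in range(3))

print(f"lambda(7 min) = {lam_7:.1f}, lambda(5 min) = {lam_5:.1f}")
print(f"P(X=2) = {p_exactly_2:.4f}, P(X<=2) = {p_at_most_2:.4f}")
```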

___ means that ____

This is a bit too much, so let's use the complement for the calculation.
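The complement idea can be sketched like this for the 5-minute question, where λ=2: instead of adding P(X=2), P(X=3), P(X=4) and so on without end, subtract the two excluded cases P(X=0) and P(X=1) from 1. The function name is made up for illustration.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson variable with expected value lam."""
    return exp(-lam) * lam**k / factorial(k)

lam_5 = 2.0  # expected customers in 5 minutes (24 per hour)

# P(X >= 2) = 1 - P(X = 0) - P(X = 1)
p_at_least_2 = 1 - poisson_pmf(0, lam_5) - poisson_pmf(1, lam_5)
print(f"P(X>=2) = {p_at_least_2:.4f}")
```

For λ=2 this simplifies to 1 − 3e^(−2), roughly 0.594.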