Bayesian Statistics 101: Inference of population proportions

This is the first part of a collection of snippets, written in Python and PyMC, that show the Bayesian counterparts of the classical frequentist methods typically taught in a first statistics class.

Inference on a population proportion
In the following snippet we estimate the distribution of the probability of success p of a Binomial random variable, given a sequence of success counts out of n trials each. An uninformative Beta distribution serves as the prior for the unknown parameter p, while the Binomial distribution plays the role of the likelihood, in Bayesian parlance.

import pymc as pm
from pymc import Beta, Binomial, Model, MCMC

n = 8                          # number of trials per observation
data = [4, 5, 6, 5, 5, 4, 3]   # observed successes in each batch of n trials

p = Beta("p", alpha=1, beta=1)                          # uninformative prior
y = Binomial("y", n=n, p=p, value=data, observed=True)  # likelihood

model = Model([y, p])
mcmc = MCMC(model)
mcmc.sample(40000, 10000, 1)   # 40000 iterations, 10000 burn-in, no thinning

pm.Matplot.plot(mcmc)

[Figure: MCMC trace and posterior distribution of p]

Note that the dotted lines in the right-hand graph delimit the 95% credible interval. Stack Exchange has a wonderful explanation of the difference between a confidence interval and a credible interval.
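The interval can also be computed directly from the posterior samples instead of being read off the plot. Below is a minimal sketch using the equal-tailed 95% interval; since the Beta(1, 1) prior is conjugate to the Binomial likelihood, the analytic posterior Beta(1 + 32, 1 + 24) gives a cross-check (32 successes and 24 failures in the 56 trials above):

import numpy as np
from scipy.stats import beta

trace = mcmc.trace("p")[:]                 # posterior samples of p, burn-in discarded
print(np.percentile(trace, [2.5, 97.5]))   # equal-tailed 95% credible interval

# conjugate cross-check: Beta(1 + 32, 1 + 24) = Beta(33, 25)
print(beta.ppf([0.025, 0.975], 33, 25))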

Inference on the difference of two population proportions

The difference of two population proportions comes in handy when we want to ascertain whether one proportion is, statistically speaking, different from another. If the credible interval of the difference doesn’t contain 0, then we can make the claim that the proportions are different.

from pymc import deterministic

n1 = 8
data1 = [4, 5, 6, 5, 5, 4, 3]

n2 = 20
data2 = [16, 15, 17, 11, 18, 16, 18]

p1 = Beta("p1", alpha=1, beta=1)
y1 = Binomial("y1", n=n1, p=p1, value=data1, observed=True)

p2 = Beta("p2", alpha=1, beta=1)
y2 = Binomial("y2", n=n2, p=p2, value=data2, observed=True)

@deterministic
def diff(p1=p1, p2=p2):
    # tracked deterministic node: the difference of the two proportions
    return p2 - p1

model = Model([p1, y1, p2, y2, diff])
mcmc = MCMC(model)
mcmc.sample(40000, 10000, 1)

pm.Matplot.plot(mcmc)
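Rather than eyeballing the plot, we can check directly whether 0 falls inside the 95% credible interval of diff. A minimal sketch using the sampled trace:

import numpy as np

diff_trace = mcmc.trace("diff")[:]               # posterior samples of p2 - p1
lo, hi = np.percentile(diff_trace, [2.5, 97.5])
print(lo, hi)
print(not lo <= 0 <= hi)   # True -> the credible interval excludes 0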

[Figure: trace and posterior distribution of diff]

Bayesian priors regularize the model. In the previous examples we used uninformative priors. Say we had, from the past literature, a strong belief that the proportions are equal and very close to 0.5. We can express that belief with e.g. a Beta(500, 500) prior:

...
p1 = Beta("p1", alpha=500, beta=500)   # strong prior concentrated around 0.5
y1 = Binomial("y1", n=n1, p=p1, value=data1, observed=True)

p2 = Beta("p2", alpha=500, beta=500)   # strong prior concentrated around 0.5
y2 = Binomial("y2", n=n2, p=p2, value=data2, observed=True)
...

[Figure: traces and posterior distributions under the strong priors]

In this case we can’t argue that the proportions are different, because of our strong priors. If we collected more data and fed it to the model, the evidence would eventually win over the prior. Also note that the variance of the deterministic variable diff has been drastically reduced; this is another side effect of the strong priors.
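The shrinkage can be quantified without rerunning the sampler: the Beta prior is conjugate to the Binomial likelihood, so each posterior is itself a Beta, and since p1 and p2 are independent in this model the variances of the difference add. A quick sketch (diff_std is a throwaway helper, not part of PyMC):

from scipy.stats import beta

s1, t1 = sum(data1), len(data1) * n1   # 32 successes out of 56 trials
s2, t2 = sum(data2), len(data2) * n2   # 111 successes out of 140 trials

def diff_std(a0, b0):
    # posterior of each proportion: Beta(a0 + successes, b0 + failures)
    v1 = beta.var(a0 + s1, b0 + t1 - s1)
    v2 = beta.var(a0 + s2, b0 + t2 - s2)
    return (v1 + v2) ** 0.5

print(diff_std(1, 1))      # uninformative priors: std of diff ~ 0.07
print(diff_std(500, 500))  # strong priors: std of diff ~ 0.02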

It’s also interesting to have a look at the posterior of p2. We started with a prior peaked at 0.5. As the graph below shows, the evidence shifted our belief only slightly, from 0.5 to roughly 0.54:

[Figure: posterior distribution of p2 under the Beta(500, 500) prior]
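That ~0.54 can be double-checked analytically. By conjugacy, the posterior of p2 is Beta(500 + successes, 500 + failures):

from scipy.stats import beta

a_post = 500 + sum(data2)                    # 500 + 111 successes
b_post = 500 + len(data2) * n2 - sum(data2)  # 500 + 29 failures
print(beta.mean(a_post, b_post))             # ~0.536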
