Bayesian Statistics 101: BANOVA

In this part we are going to explore the Bayesian equivalent of ANOVA, i.e. BANOVA. ANOVA is a statistical test used to ascertain whether the means of several groups are equal, and therefore generalizes the t-test to more than two groups. In fact, when there are only two groups to compare, the F-statistic is equivalent to the square of the t-statistic, where the former is the ratio of the variance between group means to the variance within the groups.
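To make that relationship concrete, here is a minimal sketch (it assumes scipy, which the rest of this post does not use) comparing the two statistics on two small samples:

import numpy
from scipy import stats

numpy.random.seed(0)
a = numpy.random.normal(0.0, 1.0, 50)
b = numpy.random.normal(0.5, 1.0, 50)

t, _ = stats.ttest_ind(a, b) # two-sample t-test, equal variances assumed
f, _ = stats.f_oneway(a, b)  # one-way ANOVA with just two groups

print(f, t**2) # the two values agree up to floating point error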

For simplicity, we will explore the Bayesian counterpart of ANOVA using just two groups, but it’s important to realize that the method generalizes to an arbitrary number of them.

Let’s formulate the problem. What we are trying to do here is to predict a dependent numerical variable given an independent nominal one with 2 levels (groups). If the levels of the independent variable truly influence the dependent one, then we expect one or more means of the groups to differ significantly from each other. In other words, it’s just another form of linear regression.

Let’s rephrase the model mathematically by dummy coding the nominal predictor and using the mean value of the predicted variable as the baseline:

\mu_i = \beta_0 + \sum\limits_{j} \beta_j x_{ji} (1)

with \sum\limits_{j} \beta_j = 0 (2)

\beta_0 is the mean value of the predicted variable, while \beta_j is the quantity by which the mean of the predicted variable changes for level j . Equation (2) constrains the baseline so that the sum of all deviations from the mean is zero. A careful reader might have noticed that the way we defined the dummy variables, i.e. using the sum-to-zero constraint (STZ), is different from the one commonly seen in the wild, known as the corner constraint (CR). With CR, \beta_0 represents the mean of the reference group, while with STZ it represents the overall mean across all groups.
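To make the difference between the two parametrizations concrete, here is a small sketch with made-up group means (177.5 and 181.8, the same values used in the simulation below); both codings reproduce the same group means, they just pick a different baseline:

m1, m2 = 177.5, 181.8 # hypothetical group means, for illustration only

# Corner constraint (CR): group 1 is the reference
cr_b0 = m1      # mean of the reference group
cr_b1 = m2 - m1 # deviation of group 2 from the reference

# Sum-to-zero constraint (STZ): the baseline is the grand mean
stz_b0 = (m1 + m2) / 2 # overall mean
stz_b1 = m1 - stz_b0   # deflection of group 1
stz_b2 = m2 - stz_b0   # deflection of group 2; note stz_b1 + stz_b2 == 0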

Let’s start by simulating some data from 2 hypothetical height distributions. Simulating a real phenomenon comes in handy to verify that the model actually does what it’s supposed to do.

import numpy

sd = 5.3
n = 100
numpy.random.seed(42)
grp1 = numpy.random.normal(177.5, sd, n) # group 1: mean height 177.5
grp2 = numpy.random.normal(181.8, sd, n) # group 2: mean height 181.8
data = numpy.concatenate((grp1, grp2))

origin_mean = data.mean()
origin_std = data.std(ddof=1)
data = (data - origin_mean)/origin_std # standardize
grp = n*[0] + n*[1] # group membership of each observation

As you can see, we are making the assumption that both groups have the same variance. That’s not always the case, and the beauty of probabilistic programming is that models can easily be adapted to the process that best describes the data. Standardizing the data helps with specifying general, unit-independent priors:

import pymc

alpha_sd = pymc.distributions.Gamma("alpha_sd", alpha=1.01005, beta=0.1005) # mode=0.1, sd=10.0

alpha0 = pymc.distributions.Normal("alpha0", mu=0, tau=0.001) # baseline; y values are assumed to be standardized
alpha = pymc.distributions.Normal("alpha", mu=0, tau=1/alpha_sd**2, size=len(numpy.bincount(grp))) # one deflection per group

sigma = pymc.distributions.Uniform("sigma", 0, 10) # y values are assumed to be standardized
likelihood = pymc.distributions.Normal("likelihood", mu=alpha0 + alpha[grp], tau=1/sigma**2, value=data, observed=True)

model = pymc.Model([alpha0, alpha, sigma, alpha_sd, likelihood])

# The MAP estimate provides a sensible starting point for the sampler.
map_ = pymc.MAP(model)
map_.fit()

thin = 750
burn = 10000
steps = 5000*thin + burn # keep 5000 samples after burn-in and thinning

mcmc = pymc.MCMC(model)
mcmc.sample(iter=steps, burn=burn, thin=thin)

pymc.Matplot.plot(mcmc)

Note that our coefficients are prefixed with alpha instead of beta and that Equation (2) is nowhere to be found. The reason for the former follows from the latter: to get the constrained beta values we have to transform the alpha values so that they satisfy Equation (2), and finally unstandardize the coefficients, as shown in the snippet below.

Also, notice that the variance, or precision, of the alpha coefficients depends on the same hyperprior (i.e. a prior of a prior) alpha_sd. The hyperprior acts as a regularizer that keeps the estimates from overfitting the data. The regularizer comes into play when many coefficients have to be estimated: if many groups have only a small deflection from the overall mean, then the variance estimated by the hyperprior will be small, which in turn will shrink the estimates of the coefficients.

Finally, notice that we had to use extensive thinning to battle the effects of autocorrelation.
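The composite figures produced by pymc.Matplot.plot already include an autocorrelation panel for each variable; as an extra numeric check (a small sketch, not part of the original snippet), the lag-1 autocorrelation of the thinned chain can be computed directly:

trace = mcmc.trace("alpha0")[:]
lag1 = numpy.corrcoef(trace[:-1], trace[1:])[0, 1] # correlation between consecutive retained samples
print("lag-1 autocorrelation of alpha0: %.3f" % lag1) # should be close to 0 after thinning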

alpha_samples = mcmc.trace("alpha")[:]
alpha0_samples = mcmc.trace("alpha0")[:]
alpha1 = alpha_samples[:, 0]
alpha2 = alpha_samples[:, 1]

# Convert to zero-centered b-values (Equation 2)
m1 = alpha0_samples + alpha1 # predicted (standardized) mean of group 1
m2 = alpha0_samples + alpha2 # predicted (standardized) mean of group 2
merged = numpy.vstack((m1, m2)).T

b0 = merged.mean(axis=1) # per-sample grand mean
b1 = m1 - b0             # per-sample deflection of group 1
b2 = m2 - b0             # per-sample deflection of group 2

# Unstandardize
b0 = b0*origin_std + origin_mean
b1 = b1*origin_std
b2 = b2*origin_std
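As a quick sanity check (a short sketch, not part of the original snippet), the reconstructed group means should land close to the values used to simulate the data, 177.5 and 181.8:

print("posterior mean of group 1: %.1f" % (b0 + b1).mean()) # should be close to 177.5
print("posterior mean of group 2: %.1f" % (b0 + b2).mean()) # should be close to 181.8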

Once we have the estimates of our coefficients, we can easily perform contrasts between them without worrying about multiple comparison corrections as in ANOVA.

import matplotlib.pyplot as plt

# Contrast between b1 and b2
plt.hist(b2 - b1, 50, normed=1, facecolor='g', alpha=0.75)
hpd = pymc.utils.hpd(b2 - b1, 1. - 0.95) # 95% highest posterior density interval
plt.axvline(x=hpd[0], linewidth=2, color='r')
plt.axvline(x=hpd[1], linewidth=2, color='r')
plt.show()
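The same posterior samples can also be summarized as a probability; the following sketch (not part of the original snippet) simply counts the fraction of samples in which group 2 is taller than group 1:

prob = (b2 - b1 > 0).mean() # fraction of posterior samples where group 2 exceeds group 1
print("P(group 2 > group 1) = %.3f" % prob)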

As the plot shows, we are confident that the two groups differ, and the result does indeed reflect the simulated data.

We are just barely scratching the surface here. Some great books to learn more about BANOVA and Bayesian statistics are Bayesian Data Analysis and Doing Bayesian Data Analysis: A Tutorial with R and BUGS.
