# Confidence intervals and hypothesis tests for engineers

I wrote a small parallel library for python that implements the permutation test and the bias corrected version of the bootstrap, which gives non-statisticians the ability to exploit confidence intervals and hypothesis tests for arbitrary statistics at the expense of some CPU cycles.

Modern hardware allows us to understand and compute statistics in ways that were not possible when the field was born. Resampling methods allow to quantify uncertainty with fewer assumptions and greater accuracy, at a higher computational cost, a paradigm shift in the mindset of modern statistics.

Confidence intervals are based on the idea of the sampling distribution of a statistics, that is the distribution of values of the statistic over all possible samples of the same size. Given such sampling distribution, it’s easy to build a confidence interval. As we usually have a single sample, statisticians devised formulas to compute a confidence interval assuming that the sampling distribution has a certain well-known shape.

As a concrete example, the sampling distribution of the sample mean has a normal distribution which follows from the central limit theorem. If you are looking for the sampling distribution for say, the trimmed mean or the median, things are considerably harder. Exotic formula do exist in some cases but the bootstrap provides a computational way of approximating the sampling distribution without any assumption on its shape and spread.

The bootstrap of a statistic draws thousands of resamples with replacement from the original sample and computes the distribution of the statistic of those samples. This distribution approximates the shape, spread and bias of the real sampling distribution but is centered at the statistic of the of sample in the best case and can be affected by a considerable bias in the worst case. There are techniques though to remove the bias from the bootstrap distribution. If the sample is a good approximation of the population, the bootstrap method will provide a good approximation of the sampling distribution. As a rule of thumb you should have at least 50 independent data points before applying the method with at least 1000 bootstrap samples. Also, trying to apply the bootstrap on some very weird statistics that depend on few values of the sample, like the maximum, is a recipe for disaster.

I wrote in the past about the permutation test and how I used it to implement a hypothesis test for Telemetry histograms, so I am not going to reiterate its core ideas here. What’s important to understand though is that it assumes that the observations are exchangeable under the null hypothesis. This implies that the observations viewed individually must be identically distributed.