Statisticians and applied mathematicians are interested in building models of real world processes from experimental data. Quite often the experiments which lead to this data are both expensive and time consuming. Fortunately the Bayesian method [4] provides an intuitive way for us to fill the gaps left by small or incomplete data sets. One such example where the Bayesian method might be useful is in the design of car engines.

#### Example

Suppose a car engine manufacturer has designed a new engine and would like to know the lifetime of an already used drive belt on this new engine. To determine the duration of the belt the car must be run 24/7 for many weeks until the belt gives up: a highly time consuming experiment. Suppose the company is willing to perform at most five repeats to reduce the delay in production and limit expenses. The sample of results is prohibitively small, and we cannot expect to reach accurate, informative conclusions using a classical approach.

Assume the company already has some prior knowledge of the lifetime of the belt. The Bayesian method makes use of this “prior information” and combines it with the experimental results to create a *Bayesian predictive distribution*.

Figure 1: Probability distributions predicting the lifetime of a drive belt in a new car engine.

We can see in Figure 1 that the Bayesian predictive distribution is somewhere between the prior and the classical estimate. By balancing the classical estimate with the prior knowledge we have obtained a more peaked distribution. This distribution has a smaller variance, suggesting we can have more confidence in the prediction of the expected lifetime of the drive belt.

To calculate the Bayesian predictive distribution, , given some data, we simply multiply the density function of the classical solution, , with the density function produced by our prior knowledge, . This is a direct application of Bayes’ theorem [6]. Unfortunately, this product will not integrate to one, a necessary condition for probability density functions. To overcome this, we multiply the density function by a constant , which rescales the density so that it does integrate to one. The resulting Bayesian distribution defined over the -dimensional parameter space is

In one dimension it is easy to use numerical quadrature to calculate . However as the dimension becomes large ( variables), this method quickly becomes impractical. Here we turn to a class of statistical algorithms known as Markov chain Monte Carlo (MCMC) methods [1], which can tackle these high dimensional parameter spaces.

Monte Carlo methods are algorithms which produce samples from probability distributions. These samples can be used to estimate statistics such as the mean and variance. MCMC is a Monte Carlo algorithm class which cleverly targets sample locations using Markov chains to achieve faster convergence [3,5].

Such methods are notoriously computationally intensive. It is surprising therefore that MCMC is thought to be one of the most widely used classes of algorithms in computational science, with applications common in computational physics, statistics, biology, computer science and even linguistics. An interesting discussion on further applications of MCMC is available at [2].

[1] S. Brooks, A. Gelman, G. Jones, and X. Meng. Handbook of Markov Chain Monte Carlo. CRC press, 2011.

[2] P. Diaconis. The Markov chain Monte Carlo Revolution. Bulletin of the American Mathematical Society, 2009.

[3] W. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 1970.

[4] P. Lee. Bayesian statistics: an introduction. John Wiley & Sons, 2012.

[5] N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller. Equation of state calculations by fast computing machines. The journal of chemical physics, 1953.

[6] Wikipedia. Bayes’ theorem | Wikipedia, The Free Encyclopedia, 2016. [Online; accessed 14-April-2016].

Author: Paul Russell, paul.russell@manchester.ac.uk