On functional data analysis

Functional data analysis (FDA) is a collection of methods applied to a dataset consisting of a collection of curves, continuous functions, or at least observations of such curves at discrete points. It includes different topics of statistics such as supervised and unsupervised classification, factor analysis, inference, regression, and more. FDA is particularly interesting since the methods can incorporate information on the rates of change or derivatives of the curves, which can be extremely useful when modelling and analysing results from physical phenomena.  Hence, there has been a recent increase in popularity of these methods within a large number of fields including bioscience, system engineering and meteorology, with the two main references being monographs by Ramsay and Silverman [1], and Ferraty and Vieu [2].

Datasets in the functional data literature

In the FDA literature, there are many publicly available datasets. Canadian weather, the poblenou data and the tecator dataset are among the most popular ones. These functional datasets stem from real phenomena, and are extensively useful for the nonparametric methods for functional data analysis developed by Ferraty and Vieu [3]. For example, the Canadian weather dataset consists of the daily temperature and precipitation at 35 different locations in Canada, whereas the poblenou data groups the NOx levels measured every hour by a control station in Barcelona. The tecator dataset, which appeared in the paper by Borggaard and Thodberg [4], consists of a collection of 215 finely chopped meat pieces with different moisture, fat and protein contents. We observe one spectrometric curve which corresponds to the absorbance measured at 100 wavelengths. The pieces are split into two different classes: with small (<20%) and large fat content obtained by an analytical chemical processing.  The spectrometric curves are shown in Figure 1.

Figure 1:  Spectrometric curves for the tecator dataset.

With the huge advances in technology and the constant production of data, more sophisticated structures will be produced in the future. Therefore, constructing statistical methods for such data allows us to anticipate new kinds of datasets as well as the technologies that will produce them.

Challenges for the future

There are many open problems in the field of functional data, for instance, Bayesian approaches, spatial functional statistics and differential equation models as suggested by Ramsay, Hooker, and Graves [5]. In addition, there is a need to develop suitable mathematical models to explore nonlinear structures in high dimensional spaces, and the challenge of determining the dimension for principal component representation in infinite dimensional spaces to define a density for functional data. Due to the large class of problems in which FDA can already be applied, advancements in this area are likely to have high impact on applied sciences such as environmetrics, chemometrics and econometrics, among many others.

References

[1] Ramsay, James O., and Silverman, Bernard W. (2006), Functional Data Analysis, 2nd ed., Springer, New York.

[2] Ferraty, F.  and  Vieu, P.  (2006). Nonparametric Functional Data Analysis: Theory and Practice. Springer Series in Statistics, Springer-Verlag, New York.

[3] Ferraty, F. and Vieu, P. (2003). Curves Discrimination: A Nonparametric Functional Approach. Computational Statistics and Data Analysis, 44, 161-173.

[4] Borggaard, C. and Thodberg, H.H. (1992).  Optimal minimal neural interpretation of spectra. Anal. Chem. 64, 545-551.

[5] Ramsay, James O., Hooker, G., and Graves, S. (2009), Functional Data Analysis in R and Matlab, Springer, New York.

Author: Diego Andres Perez Ruiz, diego.perezruiz@manchester.ac.uk

Bayesian Modelling and Markov chain Monte Carlo

Statisticians and applied mathematicians are interested in building models of real world processes from experimental data. Quite often the experiments which lead to this data are both expensive and time consuming. Fortunately the Bayesian method [4] provides an intuitive way for us to fill the gaps left by small or incomplete data sets. One such example where the Bayesian method might be useful is in the design of car engines.

Example

Suppose a car engine manufacturer has designed a new engine and would like to know the lifetime of an already used drive belt on this new engine. To determine the duration of the belt the car must be run 24/7 for many weeks until the belt gives up: a highly time consuming experiment. Suppose the company is willing to perform at most five repeats to reduce the delay in production and limit expenses. The sample of results is prohibitively small, and we cannot expect to reach accurate, informative conclusions using a classical approach.

Assume the company already has some prior knowledge of the lifetime of the belt. The Bayesian method makes use of this “prior information” and combines it with the experimental results to create a Bayesian predictive distribution.

Figure 1: Probability distributions predicting the lifetime of a drive belt in a new car engine.

We can see in Figure 1 that the Bayesian predictive distribution is somewhere between the prior and the classical estimate. By balancing the classical estimate with the prior knowledge we have obtained a more peaked distribution. This distribution has a smaller variance, suggesting we can have more confidence in the prediction of the expected lifetime of the drive belt.

To calculate the Bayesian predictive distribution, $\pi(x|D)$, given some data, $D,$ we simply multiply the density function of the classical solution, $\ell(D|x)$, with the density function produced by our prior knowledge, $\pi(x)$. This is a direct application of Bayes’ theorem [6]. Unfortunately, this product will not integrate to one, a necessary condition for probability density functions. To overcome this, we multiply the density function by a constant $Z$, which rescales the density so that it does integrate to one. The resulting Bayesian distribution defined over the $n$-dimensional parameter space $S \subset \mathbb{R}^n$ is

$\pi(x|D) = \frac{1}{Z}\pi(x)\ell(D|x), \quad Z = \int_{S} \! \pi(x)\ell(D|x) \, \text{d}x.$

In one dimension it is easy to use numerical quadrature to calculate $Z$. However as the dimension becomes large ($10^9$ variables), this method quickly becomes impractical. Here we turn to a class of statistical algorithms known as Markov chain Monte Carlo (MCMC) methods [1], which can tackle these high dimensional parameter spaces.

Monte Carlo methods are algorithms which produce samples from probability distributions. These samples can be used to estimate statistics such as the mean and variance. MCMC is a Monte Carlo algorithm class which cleverly targets sample locations using Markov chains to achieve faster convergence [3,5].

Such methods are notoriously computationally intensive. It is surprising therefore that MCMC is thought to be one of the most widely used classes of algorithms in computational science, with applications common in computational physics, statistics, biology, computer science and even linguistics. An interesting discussion on further applications of MCMC is available at [2].

[1] S. Brooks, A. Gelman, G. Jones, and X. Meng. Handbook of Markov Chain Monte Carlo. CRC press, 2011.

[2] P. Diaconis. The Markov chain Monte Carlo Revolution. Bulletin of the American Mathematical Society, 2009.

[3] W. Hastings. Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 1970.

[4] P. Lee. Bayesian statistics: an introduction. John Wiley & Sons, 2012.

[5] N. Metropolis, A. Rosenbluth, M. Rosenbluth, A. Teller, and E. Teller. Equation of state calculations by fast computing machines. The journal of chemical physics, 1953.

[6] Wikipedia. Bayes’ theorem | Wikipedia, The Free Encyclopedia, 2016. [Online; accessed 14-April-2016].

Author: Paul Russell, paul.russell@manchester.ac.uk

Public Key Cryptography: Merkle’s Puzzles

Up until the invention of public key cryptography in the 1970s, symmetric ciphers were the only encryption scheme available. A symmetric cipher refers to a cryptographic model where the communicating parties share a secret key to ensure the privacy of communications. Depending on the key, a symmetric encryption algorithm performs various permutations and substitutions to transform a given message into ciphertext. To recover the original message the decryption algorithm would perform the same operations in reverse, using the secret key. However, the key had to be sent through a secure channel, e.g. a trusted courier, registered mail, or best of all – agreed verbally in advance. There did not seem to be any other way for secure information exchange.

In 1974 Ralph Merkle, a computer science student at UC Berkeley attended a course in computer security. As part of the course he submitted a proposal addressing the issue of the distribution of a secret key [1]. The proposal was rejected several times, which led Merkle to drop the course and continue working on his idea independently [2]. Assuming that only an insecure channel is available for transmitting information, Merkle suggested that communications be based on a set of ‘puzzles’, solution to which would contain a unique puzzle ID and a unique puzzle key. If user $B$ wishes to set up a connection with user $A$, they require $A$ to generate a large number, say $N=1,000,000$, of puzzles and transmit them to $B$. After picking at random and solving one of the puzzles, $B$ transmits only the puzzle ID to $A$, and keeps the puzzle key private, which can then be used for subsequent message encryption with a symmetric cipher. Since the puzzle ID is sent through an insecure channel, it is available to an adversary, however, only  $A$ can match the puzzle ID to the puzzle key. The only way for the adversary to obtain the key is to solve every puzzle until a matching ID is found. This would on average require to solve $\frac{1}{2} N$ puzzles.

Figure: First $A$ transmits a large number of puzzles $\text{X}_1, \text{X}_2, \ldots, \text{X}_N$ to $B.$ Next, $B$ randomly selects and solves one puzzle $\text{X}_j$ in order to obtain an $(\text{ID}_j,\text{key}_j)$ pair. Finally, $B$ sends only the puzzle $ID=ID_j$ back to $A.$

A puzzle could be a symmetric encryption of the puzzle key and ID, solvable by brute-force, i.e. by trying all possible keys. If one puzzle takes 2 minutes to solve, with $1,000,000$ puzzles it would take an adversary an average of around a year to obtain the secret key. Merkle’s proposal was a new paradigm of which the puzzles were just an example. Indeed, soon after the proposal, cryptographic systems based on hard mathematical problems such as discrete logarithm [3] and integer factorization [4] were developed. The model is now known as public key cryptography, and it revolutionised the field by eliminating the need for a secure channel to transmit the key.

References

[1] Merkle, R. C. Secure communications over insecure channels. Communications of the ACM 21.4 (1978): 294-299.

[2] Diffie, W. The first ten years of public-key cryptography. Proceedings of the IEEE 76.5 (1988): 560-577.

[3] Diffie, W. and Hellman, M. E. New directions in cryptography. Information Theory, IEEE Transactions on 22.6 (1976): 644-654.

[4] Rivest, R. L., Shamir, A. and Adleman, L. A method for obtaining digital signatures and public-key cryptosystems. Communications of the ACM 21.2 (1978): 120-126.

Author: Mante Zemaityte, mante.zemaityte@manchester.ac.uk

On the Leja method…

…or how we replaced the heuristics of a code by a rigorous backward error analysis.
We all know the procedure of developing a numerical method/algorithm:

1. get it running,
2. perform experiments,
3. optimize the algorithm/code.

The emphasis is always on step 1. Unfortunately, step 3 is often not pursued because <<the PhD student left>, <the focus shifted>, <funding ended>,… >. In 2009 there existed a code for computing the action of the matrix exponential based on Leja method, see [3], [4], [5]. However, the code was based on some heuristics and had room for optimization.

The main idea of the Leja method is to compute $\exp(A)v$ without explicitly forming $\exp(A)$ for possibly large (sparse) matrices. The action of the matrix exponential has various applications, e.g. evolution equations (exponential integrators), network analysis, and many other areas. For the computation a sequence of Leja points in an interval is used (see Figure 1 for an example), as this allows for an iterative selection of the degree of interpolation to reduce computation cost.

Figure 1: 15 Leja points in the interval [-2,2].

In order to allow matrices with a large norm one takes advantage of the functional equation of the exponential to introduce scaling steps. In the end the interpolation essentially depends on three parameters, the number of scaling steps, the location/sequence of the interpolation points, and the degree of interpolation. The remaining question is: How do you select these parameters for a specific input?

The code from 2009 had the following heuristics included

% three magic numbers...
maxm = 150; minm = 30;
m = max (2.6 * ceil (gamma), minm);
nsteps = ceil (m / maxm);

These magic numbers from above determined all of the necessary parameters to perform the interpolation. The gamma in the formula describes an ellipse that encloses the field of values of the input and nsteps corresponds to the number of scaling steps. The first attempt to get rid of the heuristics replaced the above selection procedure, see [1]. The selection of the parameters was then based on the interpolation behaviour of a 2-by-2 matrix, with entries reflecting a bound of the extrema of the field of values of $A$. Nevertheless, it still had some heuristics included.

In the preprint [2], the heuristics were finally eliminated. Yet again, the code was redesigned, based on a rigorous backward error analysis. The main idea is to interpret the result of the interpolation as the exact solution of a perturbed problem. The backward error analysis gives rise to a selection procedure of the essential parameters that can minimize the cost of the algorithm.

As a result, the code is simpler and has no longer any restrictions on the input matrix. Nevertheless, there is still room for optimization. Numerical experiments show that the method can have performance losses for nonnormal matrices. We are currently working to fix this glitch but that is for another blog entry.

The code is available at the Numerical Analysis Innsbruck project page.

References

[1] Caliari, M., Kandolf, P., Ostermann, A., Rainer, S., 2014. Comparison of methods for computing the action of the matrix exponential. BIT Numer. Math. 52 (1), 113-128.

[2] Caliari, M., Kandolf, P., Ostermann, A., Rainer, S., 2015. The Leja method revisited: backward error analysis for the matrix exponential. Preprint, arXiv:1506.08665.

[3] Caliari, M., Vianello, M., Bergamaschi, L., 2004. Interpolating discrete advection-diffusion propagators at Leja sequences. J. Comput. Appl. Math. 172 (1), 79-99.

[4] Huisinga, W., Pesce, L., Kosloff, R., Saalfrank, P., 1999. Faber and Newton polynomial integrators for open-system density matrix propagation. J. Chem. Phy., 110, 5538-5547.

[5] Reichel, L., 1990. Newton interpolation at Leja points. BIT Numer. Math. 30 (2), 332-346.

Author: Peter Kandolf, peter.kandolf@manchester.ac.uk

Linear Algebra in Sports Analytics

Recently, due to the large amount of data available and the “Big data” buzz, there has been a surge of activity in applying maths to analyse sport. In addition to keeping tabs of the number of points scored and by whom, companies now collect player tracking data which can locate players to within a few centimetres multiple times per second! All of this new data opens many possibilities for interesting ways to analyse performance.

Sports analytics is growing fast in football.

Measuring the defensive capability of players is a difficult problem. There are plenty of ways to assess offensive capability but, since defenders stop points being scored, it is harder to measure their effect on the game. With access to player tracking data from the NBA, Kirk Goldsberry has designed a model to see which defender is responsible for each attacker at any given time so that their defensive ability can be measured against the league average. Some more information on this work can be found at the following links.

The main use of linear algebra here is in fitting a linear mixed model to an extremely large dataset, which means solving a large least squares problem. This document describes how the lme4 package in R fits linear mixed models using a Cholesky decomposition of the normal equations which squares the condition number! Perhaps they should consider using the QR method instead.

Expected Goals in Football

In a low scoring game such as football (a couple of goals per game) it often makes sense to talk about how many goals a team “should” have scored based upon the quality of the shots they generated. Access to player tracking data has opened up a host of new ways in which we can calculate this quantity: One possibility is to extract features of play from the data and feed them into a machine learning framework. Such features might include the distance from the goal, the number of attackers running forwards, or the current score (losing teams might try harder).

Here are two articles on the subject.

The mathematics involved depends upon the method used but most of them involve solving a (possibly nonlinear) optimization problem which could be solved via the (nonlinear) conjugate gradient method or stochastic gradient descent. The key to solving all these problems in a reasonable amount of time is to use extremely efficient linear algebra.

A Personalizable Test Matrix Collection for Julia

Test matrices are widely used for measuring the performance of an algorithm with respect to accuracy, stability or speed; testing the correctness and robustness of an algorithm on a wide range of problems; and comparing competing algorithms.

Various collections of test matrices have been made available in software, including:

This blog entry is about Matrix Depot, a test matrix collection for the Julia language. Matrix Depot coalesces parameterized test matrices, regularization test problems, and real-life sparse matrix data into a single framework. It not only provides a diverse collection of test matrices but is also extensible and personalizable.

1 Extending the Collection

When Matrix Depot (v0.5.x) is first installed, we have 55 test matrices available to use. Matrices can be easily generated using the interface matrixdepot("name", p1, p2,...), where name is the matrix name and p1, p2,... are input parameters depending on name. For example, the following command will generate a 12-by-12 Wilkinson matrix and then display a 2-D color image of the matrix using PyPlot.

julia> using MatrixDepot
julia> A = matrixdepot("wilkinson", 12)
julia> using PyPlot
julia> matshow(A)

Figure 1: A 12-by-12 Wilkinson matrix.

We can extend the collection by downloading matrices from the University of Florida Sparse Matrix Collection using the interface matrixdepot(name, :get). For example, we can download the matrix web-Google from SNAP (Stanford Network Analysis Platform) by

julia> matrixdepot("SNAP/web-Google", :get)

In addition, we can easily add new matrix generators to Matrix Depot. See http://matrixdepotjl.readthedocs.org/en/latest/user.html for more details.

2 Defining Personalized Groups

Matrix Depot has 10 predefined groups. The function matrixdepot(group) returns a list of matrices in a given group. For example, here a list of random matrices in Matrix Depot.

julia> matrixdepot("random")
8-element Array{ASCIIString,1}:
"golub"
"oscillate"
"randcorr"
"rando"
"randsvd"
"rohess"
"rosser"
"wathen"


Every test matrix in Matrix Depot is assigned a unique number. We can access matrices by number and test an algorithm on a group of matrices in a simple loop.

julia> matrixdepot(2, 4, 22:24)
5-element Array{AbstractString,1}:
"binomial"
"cauchy"
"hilb"
"invhilb"
"invol"

julia> for name in matrixdepot(2, 4, 22:24)
A = matrixdepot(name, 4)
@printf "%9s has 2-norm %0.3f\n" name norm(A)
end

binomial has 2-norm 4.576
cauchy has 2-norm 0.978
hilb has 2-norm 1.500
invhilb has 2-norm 10341.015
invol has 2-norm 313.525

A user can define new groups with the macro @addgroup and use them just as the predefined groups.

julia> @addgroup testforAlg1 = matrixdepot(2, 4, 22:24)


By extending the collection and defining new groups, Matrix Depot allows you to design your personal test collection.

Author: Weijian Zhang, weijian.zhang@manchester.ac.uk

Topological Graph Theory and Fullerenes 1

This entry is the first in a series of two posts on applications of topological graph theory to chemistry.

What is a Fullerene?

The geometry of a (pre-2006) football is familiar to most: it is a truncated icosahedron with 12 pentagonal and 20 hexagonal faces. What is maybe less familiar is that the truncated icosahedron appears naturally as a carbon molecule, the C60 Buckminster-fullerene.

Figure 1. A C60 fullerene (the Buckminster-fullerene). Author: Benjah-bmm27.

A fullerene is a 2-dimensional lattice of carbon atoms folded into a 3-dimensional shape like that of a ball or a cylinder. The cylindrical fullerenes are better known as carbon nanotubes.

We have additional restrictions on the lattice: every atom of carbon has to bond with exactly 3 other atoms, and the rings formed by the carbon atoms can only be pentagonal or hexagonal.

Figure 2. A lattice for a C26 fullerene. This lattice is folded over a sphere in such a way that the “outer” face forms a pentagon.

The C60 Buckminster-fullerene was the first fullerene to be discovered; its discovery is due to Kroto, Heath, O’Brien, Curl and Smalley, in 1985. In 1996, Kroto, Curl and Smalley were awarded the Nobel Prize in chemistry for their study of fullerenes.

Combinatorial Properties of Fullerenes

As mathematicians we probably cannot give much insight into the chemistry of fullerenes, however we can say a lot about their combinatorial properties.

The spherical fullerenes with V atoms, E bonds and F=P+H pentagonal and hexagonal faces satisfy the relation

V-E+F=2,

known as the Euler’s polyhedron formula. This formula already tells us a lot about fullerenes: suppose that a spherical fullerene has P>0 pentagonal and H>0 hexagonal faces. Then the number of atoms must be V=(5P+6H)/3 since we’re counting every atom three times; the number of bonds must be E=(5P+6H)/2 since every bond is counted twice, and the number of faces is clearly F=P+H. Furthermore, the Euler’s polyhedron formula tells us that

2=V-E+F=P/6.

Hence spherical fullerenes must have exactly 12 pentagonal faces!

This result greatly simplifies the enumeration of all spherical fullerenes. Using additional concepts from topological graph theory, Brinkmann and Dress devised a sufficiently fast algorithm for enumerating spherical fullerenes; their results are presented in Table 1.

Note that the C60 fullerene has 1812 chemical isomeres; the one discovered by Kroto et al. is special among those: it is the only isomer in which the pentagonal faces are not adjacent.

Such fullerenes are called IPR-fullerenes, where IPR stands for Isolated Pentagon Rule, and are chemically more stable. The algorithm of Brinkmann and Dress enumerates IPR-fullerenes as well; their results are presented in Table 2.

References

[1] G. Brinkmann, A. Dress. A Constructive Enumeration of Fullerenes. Journal of Algorithms 23, 345-358 (1997).

[2] E.A. Lord, A.L. Mackay, S. Ranganathan. New Geometries for New Materials. Cambridge University Press, Cambridge, 2006.

Author: Goran Malic, goran.malic@manchester.ac.uk