On functional data analysis

Functional data analysis (FDA) is a collection of methods for datasets made up of curves, continuous functions, or at least observations of such curves at discrete points. It spans many areas of statistics, including supervised and unsupervised classification, factor analysis, inference, and regression. FDA is particularly interesting because its methods can incorporate information on the rates of change, or derivatives, of the curves, which can be extremely useful when modelling and analysing results from physical phenomena. As a result, these methods have recently grown in popularity across a large number of fields, including bioscience, systems engineering and meteorology. The two main references are the monographs by Ramsay and Silverman [1] and by Ferraty and Vieu [2].
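To make the remark about derivatives concrete, here is a minimal sketch of one way a discretely observed curve can be turned into a smooth function whose derivative is available in closed form. It assumes a least-squares fit to a small Fourier basis on [0, 1]; the test curve, the noise level, the number of harmonics and the helper names `fourier_basis` and `fourier_basis_deriv` are all illustrative choices, not something prescribed by the references above.

```python
import numpy as np


def fourier_basis(t, n_harmonics):
    """Design matrix with columns 1, sin(2*pi*k*t), cos(2*pi*k*t) for t in [0, 1]."""
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.sin(2 * np.pi * k * t))
        cols.append(np.cos(2 * np.pi * k * t))
    return np.column_stack(cols)


def fourier_basis_deriv(t, n_harmonics):
    """First derivative (with respect to t) of each column of fourier_basis."""
    cols = [np.zeros_like(t)]
    for k in range(1, n_harmonics + 1):
        cols.append(2 * np.pi * k * np.cos(2 * np.pi * k * t))
        cols.append(-2 * np.pi * k * np.sin(2 * np.pi * k * t))
    return np.column_stack(cols)


# Synthetic example (an assumption for illustration): a noisy sine curve
# observed at 101 equally spaced points.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 101)
y = np.sin(2 * np.pi * t) + 0.02 * rng.standard_normal(t.size)

# Least-squares basis coefficients turn the discrete observations into a function.
n_harmonics = 3
coef, *_ = np.linalg.lstsq(fourier_basis(t, n_harmonics), y, rcond=None)

smooth = fourier_basis(t, n_harmonics) @ coef        # fitted curve on the grid
deriv = fourier_basis_deriv(t, n_harmonics) @ coef   # estimated rate of change

# Compare with the known derivative of the noise-free curve.
true_deriv = 2 * np.pi * np.cos(2 * np.pi * t)
max_err = np.max(np.abs(deriv - true_deriv))
```

Differentiating the fitted basis expansion, rather than differencing the raw observations, keeps the measurement noise from being amplified in the derivative estimate. In practice the basis (for example B-splines) and its size would be chosen to suit the data.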

Datasets in the functional data literature

Many datasets are publicly available in the FDA literature. The Canadian weather, poblenou and tecator datasets are among the most popular. These functional datasets stem from real phenomena and are widely used to illustrate methods such as the nonparametric methods for functional data analysis developed by Ferraty and Vieu [3]. For example, the Canadian weather dataset consists of the daily temperature and precipitation at 35 different locations in Canada, whereas the poblenou data collects the NOx levels measured every hour by a control station in Barcelona. The tecator dataset, which appeared in the paper by Borggaard and Thodberg [4], consists of 215 pieces of finely chopped meat with different moisture, fat and protein contents. For each piece, we observe one spectrometric curve, which corresponds to the absorbance measured at 100 wavelengths. The pieces are split into two classes, with small (<20%) and large fat content, as determined by analytical chemical processing. The spectrometric curves are shown in Figure 1.

Figure 1: Spectrometric curves for the tecator dataset.
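As a rough illustration of how a dataset with the shape of tecator is typically laid out for analysis, the sketch below builds a matrix with one row per meat piece and one column per wavelength, and derives the two fat classes from the 20% threshold. The numerical values are synthetic placeholders generated here, not the real tecator measurements, and the wavelength grid is a plain index because the actual wavelengths are not given in this article.

```python
import numpy as np

N_CURVES = 215         # number of meat pieces, as described above
N_WAVELENGTHS = 100    # absorbance values per spectrometric curve
FAT_THRESHOLD = 20.0   # percent; pieces below this form the "small" fat class

rng = np.random.default_rng(1)

# Placeholder grid: a rescaled index standing in for the real wavelengths.
u = np.linspace(0.0, 1.0, N_WAVELENGTHS)

# Synthetic fat contents in percent (NOT the real measurements).
fat = rng.uniform(0.0, 50.0, size=N_CURVES)

# Synthetic curves: a smooth baseline plus a bump whose height grows with fat.
bump = np.exp(-0.5 * ((u - 0.4) / 0.1) ** 2)
absorbance = 2.5 + u[None, :] + (fat[:, None] / 50.0) * bump[None, :]
absorbance += 0.01 * rng.standard_normal(absorbance.shape)

# Class labels from the fat threshold: one label per curve.
labels = np.where(fat < FAT_THRESHOLD, "small", "large")

# Pointwise mean curve of each class, a common first summary of functional data.
mean_small = absorbance[labels == "small"].mean(axis=0)
mean_large = absorbance[labels == "large"].mean(axis=0)
```

Each row of `absorbance` is one discretised curve, so curve-wise summaries are computed along axis 1 and pointwise summaries across curves along axis 0.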

With the huge advances in technology and the constant production of data, more sophisticated data structures are likely to emerge in the future. Constructing statistical methods for such data therefore prepares us for new kinds of datasets and for the technologies that will produce them.

Challenges for the future

There are many open problems in the field of functional data, for instance Bayesian approaches, spatial functional statistics and differential equation models, as suggested by Ramsay, Hooker, and Graves [5]. In addition, suitable mathematical models are needed to explore nonlinear structures in high-dimensional spaces, and it remains a challenge to determine the dimension of the principal component representation in infinite-dimensional spaces in order to define a density for functional data. Because FDA can already be applied to a large class of problems, advances in this area are likely to have a high impact on applied sciences such as environmetrics, chemometrics and econometrics, among many others.
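To show where the question of dimension arises in practice, the following is a minimal sketch of principal component analysis on discretised curves, with the number of components picked by a cumulative explained-variance threshold. This is only one common heuristic and does not resolve the open problem mentioned above. The synthetic curves, the 95% threshold and the helper names `fpca` and `n_components_for` are assumptions made for illustration.

```python
import numpy as np


def fpca(curves):
    """PCA of discretised curves (rows = curves, columns = grid points).

    Returns the principal directions (rows of vt) and the proportion of
    variance explained by each component.
    """
    centred = curves - curves.mean(axis=0)
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    variances = s ** 2 / (curves.shape[0] - 1)
    return vt, variances / variances.sum()


def n_components_for(explained_ratio, threshold=0.95):
    """Smallest number of components whose cumulative ratio reaches the threshold."""
    cumulative = np.cumsum(explained_ratio)
    return int(np.searchsorted(cumulative, threshold) + 1)


# Synthetic curves (an assumption): three smooth modes of variation plus noise.
rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 50)
modes = np.vstack([
    np.sin(2 * np.pi * t),
    np.cos(2 * np.pi * t),
    np.sin(4 * np.pi * t),
])
scores = rng.standard_normal((200, 3)) * np.array([3.0, 2.0, 1.5])
curves = scores @ modes + 0.05 * rng.standard_normal((200, t.size))

components, explained_ratio = fpca(curves)
k = n_components_for(explained_ratio, threshold=0.95)
```

The chosen `k` depends directly on the threshold, which is exactly the arbitrariness that makes dimension selection difficult when a density for functional data is to be defined on the truncated representation.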

 

References

[1] Ramsay, J. O. and Silverman, B. W. (2006). Functional Data Analysis, 2nd ed. Springer, New York.

[2] Ferraty, F. and Vieu, P. (2006). Nonparametric Functional Data Analysis: Theory and Practice. Springer Series in Statistics, Springer-Verlag, New York.

[3] Ferraty, F. and Vieu, P. (2003). Curves Discrimination: A Nonparametric Functional Approach. Computational Statistics and Data Analysis, 44, 161-173.

[4] Borggaard, C. and Thodberg, H. H. (1992). Optimal minimal neural interpretation of spectra. Analytical Chemistry, 64, 545-551.

[5] Ramsay, J. O., Hooker, G. and Graves, S. (2009). Functional Data Analysis in R and Matlab. Springer, New York.

Author: Diego Andres Perez Ruiz, diego.perezruiz@manchester.ac.uk