Linear Algebra in Sports Analytics

Recently, due to the large amount of data available and the “Big data” buzz, there has been a surge of activity in applying maths to analyse sport. In addition to keeping tabs of the number of points scored and by whom, companies now collect player tracking data which can locate players to within a few centimetres multiple times per second! All of this new data opens many possibilities for interesting ways to analyse performance.

football_ground_201516_compressed.png
Sports analytics is growing fast in football.

Defensive Responsibility in Basketball

Measuring the defensive capability of players is a difficult problem. There are plenty of ways to assess offensive capability but, since defenders stop points being scored, it is harder to measure their effect on the game. With access to player tracking data from the NBA, Kirk Goldsberry has designed a model to see which defender is responsible for each attacker at any given time so that their defensive ability can be measured against the league average. Some more information on this work can be found at the following links.

The main use of linear algebra here is in fitting a linear mixed model to an extremely large dataset, which means solving a large least squares problem. This document describes how the lme4 package in R fits linear mixed models using a Cholesky decomposition of the normal equations which squares the condition number! Perhaps they should consider using the QR method instead.

Expected Goals in Football

In a low scoring game such as football (a couple of goals per game) it often makes sense to talk about how many goals a team “should” have scored based upon the quality of the shots they generated. Access to player tracking data has opened up a host of new ways in which we can calculate this quantity: One possibility is to extract features of play from the data and feed them into a machine learning framework. Such features might include the distance from the goal, the number of attackers running forwards, or the current score (losing teams might try harder).

Here are two articles on the subject.

The mathematics involved depends upon the method used but most of them involve solving a (possibly nonlinear) optimization problem which could be solved via the (nonlinear) conjugate gradient method or stochastic gradient descent. The key to solving all these problems in a reasonable amount of time is to use extremely efficient linear algebra.

Advertisements