NBA drafts Big Data


Harvard researchers have used Odyssey to dig deep into NBA player data, creating a new statistical framework for basketball analytics. The research was led by Kirk Goldsberry, Visiting Scholar at the Center for Geographic Analysis; Luke Bornn, Assistant Professor in the Department of Statistics; and Dan Cervone and Alex D’Amour, both PhD students in the Department of Statistics. It uses player data from the 2012-2013 NBA season. The dataset, known as SportVU, was collected at 14 NBA arenas and contains 800 million recorded locations of NBA players on the court.

To make sense of this data, Cervone and D’Amour proposed assigning a value to each basketball possession. If every possession could be valued, a model could be built from the SportVU data using inputs such as player locations on the court, scoring ability, ball possession, and ball handling. Running this type of statistical model would give analysts a quantitative estimate of “expected possession value,” or EPV. Player performance could then be statistically quantified at any point in the game, and coaches could use this information to adopt specific strategies for specific players at specific times.
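The idea of valuing a possession can be illustrated with a minimal sketch. It is not the researchers' model: it simply treats the value of a possession at one moment as a probability-weighted average over what might happen next. The `Outcome` class, the function name, and every number below are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Outcome:
    """One hypothetical thing the ball handler might do next."""
    name: str
    probability: float       # assumed chance of this outcome
    expected_points: float   # assumed points the team scores if it happens


def expected_possession_value(outcomes):
    """Return the probability-weighted average of expected points."""
    total_probability = sum(o.probability for o in outcomes)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(o.probability * o.expected_points for o in outcomes)


# Illustrative numbers only; they are not taken from the SportVU data.
outcomes = [
    Outcome("shoot", 0.25, 0.9),
    Outcome("pass", 0.50, 1.1),
    Outcome("turnover", 0.25, 0.0),
]

epv = expected_possession_value(outcomes)
print(f"Expected possession value: {epv:.3f} points")
```

In this toy version the probabilities and point values are typed in by hand. In a model like the one described above, they would have to be estimated from the tracking data and updated as the players and ball move.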

With the model in place, researchers turned to Research Computing’s Odyssey cluster for computation. The database the researchers built totaled 93 gigabytes. A full analysis of this database required 500 parallel processors and two terabytes of memory. Analyzing a dataset this large would not have been feasible without the computational power of the Odyssey cluster.

The results from the computational run were what most NBA fans would expect. Chris Paul, point guard for the Los Angeles Clippers, had the highest EPV with 3.48 points added per game. According to the researchers, this meant the Clippers were expected to score 3.48 more points per game because Paul controlled the ball on offense. Ricky Rubio, point guard for the Minnesota Timberwolves, had the lowest EPV with -3.33 points “added” per game. Because Rubio is a poor shooter, each time he takes a shot it would be statistically preferable for a teammate to take the shot instead. While Rubio’s ball handling skills do add value, his overall EPV is reduced by his shooting weakness.

As datasets grow in size, complexity, and importance, the NBA will not be the only organization looking to high performance computing as a way to measure and model value. What the Harvard researchers essentially showed is that, with the right model and enough useful data points, a wide range of phenomena can be quantified scientifically, potentially transforming our understanding of the world around us.


The academic paper is titled “POINTWISE: Predicting Points and Valuing Decisions in Real Time with NBA Optical Tracking Data,” and can be found here.

The article “DataBall” by Kirk Goldsberry, which the above draws from, can be found at Grantland.

Copyright © 2014. All Rights Reserved.