Analyzing NFL Concussion Data for a Kaggle Data Science Competition


Recently, I entered the NFL’s punt-return concussion contest for data scientists. It wasn’t the usual machine learning problem. In fact, it isn’t machine learning at all. It is closer to true data analysis, which, to be honest, was much more fun.

For the 2018 season, the NFL revised their kickoff rules in an effort to reduce the risk of injury during those plays. By examining injury reports, player position and velocity data, and game video, they were able to understand the game-play circumstances that may exacerbate the risk of injury to players.

This comprehensive review showed that over the course of all games during the 2015-2017 seasons, the kickoff represented only six percent of plays but 12 percent of concussions. Players had approximately four times the risk of concussion on returned kickoffs compared to running or passing plays. The changes to the kickoff rule aim to address the components that posed the most risk, like the use of a two-man wedge.
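To make the two percentages above concrete, here is a quick back-of-the-envelope sketch. It uses only the six percent and 12 percent figures quoted above; the function and variable names are my own. Note that the "four times" figure compares returned kickoffs with running or passing plays specifically, which is a different comparison and cannot be reproduced from these two numbers alone.

```python
def relative_risk(play_share, injury_share):
    """Injury rate per play on one play type divided by the rate on all other plays.

    Both arguments are fractions of the league-wide totals (0 < share < 1).
    """
    rate_on_play_type = injury_share / play_share
    rate_elsewhere = (1 - injury_share) / (1 - play_share)
    return rate_on_play_type / rate_elsewhere


# Figures quoted in the text for the 2015-2017 seasons.
kickoff_play_share = 0.06        # kickoffs as a share of all plays
kickoff_concussion_share = 0.12  # kickoffs as a share of all concussions

# How over-represented kickoffs are among concussions.
over_representation = kickoff_concussion_share / kickoff_play_share

# Risk on kickoffs relative to every other play type combined.
rr_vs_all_other_plays = relative_risk(kickoff_play_share, kickoff_concussion_share)

print(f"Over-representation: {over_representation:.1f}x")
print(f"Relative risk vs. all other plays: {rr_vs_all_other_plays:.2f}x")
```

So kickoffs show up among concussions at about twice the rate their share of plays would suggest, and slightly more than twice when compared with all other plays combined.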

Now, the NFL is challenging data scientists to examine and make recommendations for punt play rules. They have provided data for all punt plays from the 2016 and 2017 NFL seasons that includes player rosters, on-field position data, and video data, including the plays in which a player suffered a concussion.

My data analysis showed some interesting results. I won’t say what my recommendations are until after the contest has closed, but you can view my notebook with the results of my analysis.

The most interesting finding is that linebackers and wide receivers are involved in most of the concussions. I posed many questions in the notebook where information was missing. For example, it would be great to know the players’ weights and whether they were full-time players, meaning they didn’t come off the bench just for punts.

One of the hardest parts is that the player data covers only two years, so the numbers are small.
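To show why small numbers are a problem, here is a minimal sketch of a Wilson score interval for a proportion. This is not taken from my notebook, and the counts in the example (5 out of 20) are made up purely for illustration. They are not contest data.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0 <= successes <= n:
        raise ValueError("successes must be between 0 and n")
    p = successes / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half_width, center + half_width


# Hypothetical counts, for illustration only: 5 of 20 events in some category.
low, high = wilson_interval(5, 20)
print(f"Observed share: {5 / 20:.0%}, plausible range: {low:.0%} to {high:.0%}")

# The same observed share with 100 times as many events gives a much tighter range.
low_big, high_big = wilson_interval(500, 2000)
print(f"With 2000 events: {low_big:.0%} to {high_big:.0%}")
```

With only a handful of events, the plausible range around any observed share is wide, so differences between positions or play types should be read with caution.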