How to fail and still succeed – a little data science fun


This past week has been brutal on my sinuses. We’ve had pretty epic thunderstorms and rain roll through Northeast Ohio, and every time one got close my head turned into an epic congested cluster. Being the type who has to find some science in nearly everything, I decided to try to figure out if the weather had an impact on my sinuses.

After some research, I learned that a Japanese study had found that a drop in barometric pressure resulted in migraines. I didn’t have access to medical data, but Google does provide search data and trends.

Google Trends gave me access to data for my area, with searches ranked by day from Jan 1, 2019 to the present. Their scale runs from 0 to 100 and shows how strongly a term is trending on a particular day. I ran this data against weather data for the same timeframe: temperature, humidity, dew point, wind, precipitation, and barometric pressure.

Each of those weather measures came with the daily high, the daily low, and the difference between the two. One key measure would be the drop in barometric pressure.
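Here is a minimal sketch of how that within-day difference could be derived. The `DailyWeather` class, its field names, and the sample readings are all made up for illustration; they are not the columns or values from my notebook, and I'm assuming pressure in inches of mercury.

```python
from dataclasses import dataclass


@dataclass
class DailyWeather:
    """One day of weather readings (hypothetical field names)."""
    date: str
    pressure_high: float  # highest barometric pressure of the day, inHg (assumed unit)
    pressure_low: float   # lowest barometric pressure of the day, inHg (assumed unit)


def pressure_swing(day: DailyWeather) -> float:
    """Difference between the day's highest and lowest pressure."""
    return round(day.pressure_high - day.pressure_low, 2)


# Made-up sample values, not real observations.
days = [
    DailyWeather("2019-01-01", 30.10, 29.95),
    DailyWeather("2019-01-02", 30.05, 29.60),
    DailyWeather("2019-01-03", 29.80, 29.75),
]

# Map each date to its within-day pressure swing.
swings = {day.date: pressure_swing(day) for day in days}

# The day with the largest swing is the most interesting one to compare
# against that day's search score.
biggest_swing_day = max(swings, key=swings.get)
print(swings)
print("Largest swing:", biggest_swing_day)
```

In a real notebook this would more likely be a single pandas column subtraction, but the idea is the same: one derived number per day, ready to line up against the search score for that date.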

What did my models show? Well, as soon as I ran my correlation matrix, I knew this was going to be a fail. Nothing showed a correlation with the Google search trends. The best number I got was for minimum temperature, and even that was close to zero. The rest were negative, and none of them pointed to a meaningful relationship.

Yes, my correlation matrix left me very confused.
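The check described above boils down to computing a correlation coefficient between each weather column and the search score. Below is a small sketch of a Pearson correlation written out by hand so it needs nothing beyond the standard library. The `pearson` helper, the column names, and every number are illustrative assumptions, not my actual data or results.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up daily values: a 0-100 search score and two weather columns.
search_score = [40, 55, 35, 60, 50]
weather = {
    "min_temp": [28, 31, 25, 30, 33],
    "pressure_swing": [0.15, 0.45, 0.05, 0.30, 0.20],
}

# One coefficient per weather column, each between -1 and +1.
correlations = {name: pearson(values, search_score) for name, values in weather.items()}
for name, r in correlations.items():
    print(f"{name}: {r:+.3f}")
```

When reading the output, a value near zero means no linear relationship. A negative value is not bad on its own; it means the two series move in opposite directions, and only its distance from zero says how strong that is.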

Just to please the little data scientist inside me, I ran a linear regression and checked its RMSE and R squared. Yikes.

Linear Regression R squared: -0.3083

Linear Regression RMSE: 17.7130

And just for more salt in the wound, random forest was also an epic fail.

Random Forest R squared: -0.0268
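For anyone wondering how an R squared can go below zero, here is a short sketch that computes RMSE and R squared by hand. It uses the standard definition (one minus the ratio of the model's squared error to the squared error of always predicting the mean), which is also the definition scikit-learn's `r2_score` uses. The data and the two sets of predictions are made up for illustration and have nothing to do with the scores above.

```python
from math import sqrt


def rmse(actual, predicted):
    """Root mean squared error between actual and predicted values."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return sqrt(sum(squared_errors) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Made-up search scores for five days (their mean is 48).
actual = [40, 55, 35, 60, 50]

# Baseline: always guess the mean. By definition this scores exactly 0.
baseline = [48] * len(actual)

# A model whose guesses are further off than the mean scores below 0.
poor_model = [55, 40, 50, 45, 60]

print("baseline R squared:", r_squared(actual, baseline))
print("poor model R squared:", r_squared(actual, poor_model))
print("poor model RMSE:", rmse(actual, poor_model))
```

So a negative score is the metric's way of saying that guessing the average search score every day would have beaten the model.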

If you want to define bad, well, just look at those numbers. A negative R squared means the model did worse than simply guessing the average every time. No correlation. So, how do you explain this? Well, I tested again with searches for sinus infection, sinus headache, and many more, with worse and worse results. My conclusion, after talking it out with friends, was that people may already know how to treat their headaches. Chances are the headaches aren’t new to them, so they have no reason to search. There are probably a bunch of other reasons, but clearly searches aren’t correlating with weather of any kind. If you want to see my notebook, have a look at Kaggle.

One other fun note: I learned how to query Google BigQuery and pull tons of public datasets into notebooks. One was the historical NOAA weather data going back to 1900. That’s pretty cool. Sometimes failing can be fun and a great lesson.

