Machine Learning: How to pull Google Sheets data into Colabs


If you are using Google Colabs it would make sense to want to pull in a CSV file for a machine learning project. It would further make sense that you would have some of this data in a Google Sheet. 

Since I started working on my Ohio Crime Data project, I started with inputting my data into a Google Sheet for the cleanup project. Once that was done, I hoped, that there was a way to pull that directly into a Google Colab in order to start running my machine learning project. The good news, there is a really easy way to do this. I have the code that you can take as your own and get running within seconds!

Depending on what version of Python you are running you’ll want to use pip to install Google’s connector. I’m running Python 3, so I used:

!pip3 install --upgrade -q gspread

If you are running Python 2 you’d use:

!pip install --upgrade -q gspread

Next you need to authenticate your Google Drive to allow access via Colabs with the following code:

from google.colab import auth
import gspread
from oauth2client.client import GoogleCredentials
gc = gspread.authorize(GoogleCredentials.get_application_default())

Within Colabs you will be given a link, click on that, and login and authenticate which will give you a very long code. You must post that code into your Colabs document and hit enter. 

Next, pull the data from your sheet with the following code, just change the information between the quotes that says ‘change spreadsheet name’ to the actual name of the file. Just double-click within the name of the spreadsheet that you are in and copy that name and paste it.

worksheet ='change spreadsheet name').sheet1
rows = worksheet.get_all_values()
import pandas as pd

And just like magic, you should see the data pulled into a dataframe for you to access. If you want to see the Colab file to copy my code just click here