Data Wrangling: Cleaning up Ohio Crime Data for Machine Learning

By Charlie September 12, 2018

Often it seems like the biggest part of machine learning is actually acquiring and cleaning up data. The state of Ohio provides crime data in CSV format; however, the data cannot be used out of the box. I’m sure it is useful to someone, but in its current state it is not suited to running predictions or even to loading into BI tools. So cleaning the data and formatting it into something usable is a daunting task.

Below is an example of the original data (I clipped off the other crime columns, as they are not needed to show the cleanup and the changes required). First, the data is in separate files by year. You could load those files one at a time and join them, but a full-scale cleanup is better in the long run.
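As a rough illustration of the "pull all the files in and join" option, here is a minimal Python sketch using only the standard library. The column names (`Agency`, `Violent crime`), the town name, and the numbers are made-up placeholders, not the real Ohio layout or figures, and the per-year CSV text is held in memory so the example runs on its own.

```python
import csv
import io

# Made-up placeholder data in a hypothetical layout; the real yearly
# files have different columns and values.
YEARLY_CSV = {
    2015: "Agency,Violent crime\nExampleville,8\n",
    2016: "Agency,Violent crime\nExampleville,10\n",
}


def combine_years(csv_by_year):
    """Stack per-year CSV text into one list of rows, tagging each row with its year."""
    combined = []
    for year, text in sorted(csv_by_year.items()):
        for row in csv.DictReader(io.StringIO(text)):
            row["year"] = year  # the year only exists in the file name, so add it here
            combined.append(row)
    return combined


rows = combine_years(YEARLY_CSV)
```

With real files you would open each one from disk instead of using in-memory strings. This only stacks the years together; it does nothing about blank lines or total rows, which is why a fuller cleanup is still needed.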

Initial data for 2016:

The cleaned-up version removes empty lines and totals, along with some general housekeeping. The added columns are town, year, and county.
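I did this cleanup by hand, but the same steps could be scripted. The sketch below is one possible way to do it in Python, not the process I actually used. The raw layout, the `Agency` and `Violent crime` column names, the town names, and the `COUNTY_BY_TOWN` lookup are all hypothetical placeholders with made-up values.

```python
import csv
import io

# Made-up sample in a hypothetical layout; the real export differs.
RAW_2016 = """Agency,Violent crime
Exampleville,10

Sampletown,4
Total,14
"""

# Hypothetical town-to-county lookup; in practice this would be built by hand
# or from a separate reference file.
COUNTY_BY_TOWN = {
    "Exampleville": "Example County",
    "Sampletown": "Sample County",
}


def clean_rows(raw_text, year, county_by_town):
    """Drop empty and total rows, then add town, year, and county columns."""
    cleaned = []
    for row in csv.DictReader(io.StringIO(raw_text)):
        agency = (row.get("Agency") or "").strip()
        # DictReader already skips fully blank lines; this check also catches
        # rows with an empty agency cell and the summary "Total" rows.
        if not agency or agency.lower().startswith("total"):
            continue
        cleaned.append({
            "town": agency,
            "year": year,
            "county": county_by_town.get(agency, ""),
            "violent_crime": int(row["Violent crime"]),
        })
    return cleaned


cleaned = clean_rows(RAW_2016, 2016, COUNTY_BY_TOWN)
```

Towns missing from the lookup get an empty county here rather than raising an error, which makes gaps easy to spot and fill in afterwards.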

In the end, the changes weren’t monumental, but cleaning up five years of data manually was time-consuming. It was worth the work. Next, I’ll start showing some predictions based on the cleaned-up data.

Last Update: February 20, 2026

