Feature Engineering: LabelEncoder sklearn example

October 25, 2019

A big part of machine learning is data cleanup and feature engineering. A good tool in your data science toolkit will be label encoding. Text fields can be valuable for natural language processing, but not when you want to run something numeric against them. For example, what if your answers in the data are “yes”, “no” and “maybe”? You may only care about a yes or no? Ideally, you’d want to change these to a 1 or 0.

Often categorical variables need to be converted into integers so our models can use the data. For this, we can use sklearn’s LabelEncoder. This assigns an integer to each value of the categorical feature and replaces those values with the integers.

The documentation for sklearn is here, but frankly if you are new to machine learning it’s not very helpful as a way to learn.

We are going to convert category, currency and country in the following data example:

For the categorical variables category, currency, and country let’s convert them into integers so our model can use the data. For this, I’ll use sklearn’s LabelEncoder. This assigns an integer to each value of the categorical feature and replaces those values with the integers.

[code lang=”python”]
from sklearn.preprocessing import LabelEncoder

cat_features = [‘category’, ‘currency’, ‘country’]
encoder = LabelEncoder()

# Apply the label encoder to each column
encoded = ks[cat_features].apply(encoder.fit_transform)
encoded.head(10)
[/code]

And your head will be below:

What you will notice is an integer for each item. Clearly there are many categories since the numbers range from 39 to 123. The goal is to be able to look at categories that we can predict to be successful. Similarly with country or currency–might even find interesting wins if the country is Germany but use the American Dollar or a Euro. This also allows us to easily visualize the data!

Feature Engineering: LabelEncoder sklearn example

LEAVE A REPLY

EDITOR PICKS

Use Google Colab and Kaggle Data with bonus: fastai2

What is the Python sorted function? An example of how to...

How to create Python class variables

POPULAR POSTS

What is Wifi Assist and why you want to turn it...

How to learn R programming

So you got a monopoly huh? Guess again

POPULAR CATEGORY

Data Wrangling Part 2: Cleaning up Ohio Crime Data for Machine...

Analyzing NFL Concussion data for Kaggle Data Science Competition

What is the difference between data science vs data analytics?