We are going to show you how to use fillna in pandas. No dataset arrives perfect and ready to go. There may be issues such as bad data or missing fields, and you will often find NaN values in your dataset. With pandas you can fill those in with the fillna function.
But how do you use it, and what do you fill the gaps with? You usually don’t want to fill them with zeros, because that distorts the statistics of your data. Imagine you have 100 age entries and 25 are NaN. If you set those to zero, think about what happens to your mean: it would look like 25% of your audience hasn’t been born yet, and the mean would skew very young.
The fix is to fill in the NaN values with the mean of the column. That keeps your mean the same and essentially makes those data points a wash.
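To make the effect on the mean concrete, here is a minimal sketch using a tiny made-up series of ages (the values and variable names are illustrative assumptions, not Titanic data). It compares the mean after a zero fill with the mean after a mean fill.

```python
import numpy as np
import pandas as pd

# Hypothetical ages: three known values and one missing value.
ages = pd.Series([20.0, 30.0, 40.0, np.nan])

# mean() skips NaN by default, so this is the mean of the known values only.
known_mean = ages.mean()

# Filling with zero adds a fake "age 0" entry and drags the mean down.
zero_filled_mean = ages.fillna(0).mean()

# Filling with the mean leaves the mean unchanged.
mean_filled_mean = ages.fillna(known_mean).mean()

print(known_mean, zero_filled_mean, mean_filled_mean)  # 30.0 22.5 30.0
```

One NaN out of four values is enough to pull the zero-filled mean from 30.0 down to 22.5, while the mean-filled series stays at 30.0.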
Let’s look at an example with Titanic data and how to fillna in Pandas.
As you can see, the cabin column has many NaN values.
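One way to see how many values are missing in each column is to combine isna() with sum(). The sketch below assumes a small hand-built DataFrame as a stand-in for the Titanic file. The column names mirror the ones discussed here, but the rows are invented for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Titanic data; the rows are made up.
df = pd.DataFrame({
    "Name": ["A", "B", "C", "D"],
    "Age": [22.0, np.nan, 26.0, 35.0],
    "Cabin": [np.nan, "C85", np.nan, np.nan],
})

# isna() marks each missing cell as True; sum() counts them per column.
missing_counts = df.isna().sum()
print(missing_counts)
```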
The simplest way to fill NaN data is with zeros.
Which results in:
Full code to fillna with zeros in pandas:
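A minimal sketch of the zero fill, assuming a small made-up DataFrame with two numeric columns in place of the Titanic file (the values and the name zero_filled are illustrative):

```python
import numpy as np
import pandas as pd

# Hypothetical numeric columns with some missing values.
df = pd.DataFrame({
    "Age": [22.0, np.nan, 26.0, np.nan],
    "Fare": [7.25, 71.28, np.nan, 8.05],
})

# fillna returns a new DataFrame; the original df is left unchanged.
zero_filled = df.fillna(0)
print(zero_filled)
```

Because fillna returns a copy by default, assign the result to a variable (or back to df) to keep the filled values.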
A better solution is to fill numeric NaN values with the column mean rather than with zeros, which would distort your data.
Initially, the age column has 177 missing values. Instead of filling them with zeros, which would imply that those passengers weren’t born yet, we will fill them with the mean age.
Run your code to check that fillna has cleaned up your data.
Full code to fillna in pandas with the mean:
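A minimal sketch of the mean fill, again assuming a small made-up Age column in place of the Titanic file (the values and variable names are illustrative). It counts the missing values before and after, so you can confirm that the fill worked.

```python
import numpy as np
import pandas as pd

# Hypothetical Age column with two missing values.
df = pd.DataFrame({"Age": [22.0, np.nan, 26.0, np.nan, 36.0]})

# Count the missing values before filling.
missing_before = df["Age"].isna().sum()

# mean() ignores NaN, so this is the mean of the three known ages.
mean_age = df["Age"].mean()

# Replace each NaN with the mean and assign the result back to the column.
df["Age"] = df["Age"].fillna(mean_age)

# Count again to confirm nothing is missing.
missing_after = df["Age"].isna().sum()
print(missing_before, missing_after, mean_age)
```

The mean only makes sense for numeric columns such as age. A text column such as cabin needs a different fill value.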