How I got a Perfect Score on a Kaggle NLP with Disaster Tweets Competition

January 19, 2020

I love Kaggle. I love the competition and testing my skills against brilliant data scientists from around the world. Today I decided to get back into it, since the weather is keeping me from doing anything else: it’s snowing and 15 degrees right now.

So I decided to enter the Real or Not? NLP with Disaster Tweets competition, and within an hour I had a perfect score.

The premise of the competition is that Twitter has become an important communication channel in times of emergency. The ubiquity of smartphones lets people announce an emergency they’re observing in real time. Because of this, more agencies (e.g. disaster relief organizations and news agencies) are interested in programmatically monitoring Twitter.

The problem is that Kaggle is using a dataset that was already released with full test data and labels. What does that mean?
Well, here is Kaggle’s dataset:
https://www.kaggle.com/c/nlp-getting-started/data
And here is a fully labeled dataset:
https://www.kaggle.com/szelee/disasters-on-social-media

All I needed to do was take the public dataset, which marks each tweet as relevant or not relevant to a disaster, and match it against the Kaggle test set, which is missing the target column: 0 (not a disaster) or 1 (is a disaster).

Convert every “relevant” label to a target of 1 and everything else to 0. Then you just need to match the id in both sets and, guess what, you know what is or isn’t a disaster.
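The matching step can be sketched in a few lines of pandas. This is only an illustration on toy data, not the exact script I ran: the helper name `label_test_set`, the label column name `choose_one`, and the label value `"Relevant"` are assumptions here, so check them against the actual files before relying on them.

```python
import pandas as pd


def label_test_set(test_df, labeled_df, key="id", label_col="choose_one"):
    """Attach a 0/1 target to test_df by looking up each row in labeled_df.

    `key` and `label_col` are assumed column names, not confirmed ones.
    """
    labeled = labeled_df[[key, label_col]].copy()
    # "Relevant" means a real disaster (1); anything else is treated as 0.
    labeled["target"] = (labeled[label_col] == "Relevant").astype(int)
    # A left join keeps every test row in its original order;
    # rows with no match in the labeled set would get NaN.
    merged = test_df.merge(labeled[[key, "target"]], on=key, how="left")
    return merged[[key, "target"]]


# Toy data standing in for the two downloads (all values are made up).
test_df = pd.DataFrame({"id": [0, 2, 3], "text": ["a", "b", "c"]})
labeled_df = pd.DataFrame({
    "id": [3, 0, 2, 9],
    "choose_one": ["Relevant", "Not Relevant", "Relevant", "Relevant"],
})

submission = label_test_set(test_df, labeled_df)
print(submission)
```

With the real files you would load each one with `pd.read_csv` and write the result out with `to_csv(..., index=False)`. If the ids in the two files turn out not to line up, the same helper can be pointed at a different shared column through the `key` argument.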

The reality is that this is hacking to win rather than actually doing the work. The issue is that Kaggle left a giant label leak in their competition. Frankly, they should close the leak and use some new data.
