Predicting the Winner of the US Presidential Election Using Machine Learning
This article discusses Exploratory Data Analysis (EDA) and Supervised Learning applied to the 2020 US Presidential Election.

The battle for the 2020 US presidential election has begun. This year the sitting president, Donald Trump of the Republican Party, faces a challenge from the Democratic candidate, Joe Biden. With the election under way, we will run a sentiment analysis: first we extract tweets from Twitter that are associated with either Trump or Biden, and then we use that data to analyze which candidate people favor more. So let’s get started…
Our analysis is based entirely on people’s tweets, and so is our outcome. The first step is therefore to get the dataset and store it in a CSV file for further analysis. Before moving on, let’s take a look at the 2016 election and see what the results were back then.
Figure: 2016 US presidential election results (Source: statista.com)
As the 2016 results show, Donald Trump won the election, carrying most of the states in the middle and south-east of the United States. The win looks decisive on a map, although that reflects states won under the Electoral College rather than the nationwide vote count, where Hillary Clinton was actually ahead. The interesting part is the 2016 forecasts: many US news outlets predicted that Hillary Clinton would win comfortably. See the forecast from The NY Times below:-
Figure: The NY Times forecast for the 2016 election (Source: nytimes.com)
Isn’t it striking that the media were predicting a Clinton win, and The NY Times even gave Clinton an “85% chance of winning”, yet the outcome was the opposite and Trump won the presidency?
So much for the 2016 forecast. Now let’s see the forecast for the 2020 election.
Figure: Forecast for the 2020 election (Source: statmodeling.stat.columbia.edu)
Well, the numbers are clearly on Biden’s side this year, and according to this forecast Biden is ahead of Donald Trump. But as we saw in the last election, we can’t entirely trust this figure, can we?
So we will now do our own analysis, using the extracted tweets to estimate which candidate has the higher probability of winning the election. Let’s go through it step by step.
Step 1:- Importing the libraries & loading the dataset:-
Now let’s load our dataset:-
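The loading step could look roughly like the sketch below. It is a minimal sketch, not the exact code behind this analysis: it assumes each dataset is a CSV file with a `text` column holding the tweet, and the helper name `load_tweets` is made up for illustration. A tiny in-memory CSV stands in for a real file so the snippet runs as-is.

```python
import io

import pandas as pd


def load_tweets(source, text_col="text"):
    """Read a tweet CSV and drop rows whose tweet text is missing."""
    df = pd.read_csv(source)
    # Keep only rows that actually contain tweet text.
    df = df.dropna(subset=[text_col]).reset_index(drop=True)
    # Make sure every tweet is a string before any text processing.
    df[text_col] = df[text_col].astype(str)
    return df


# Tiny in-memory stand-in for a real file (assumed column names).
sample_csv = io.StringIO(
    "user,text\n"
    "alice,Great rally today\n"
    "bob,\n"
    "carol,Not impressed by the debate\n"
)
sample = load_tweets(sample_csv)
print(sample.shape)  # (2, 2)
```

With real files you would call it once per candidate, for example `trump = load_tweets("trump_tweets.csv")` and `biden = load_tweets("biden_tweets.csv")` (hypothetical file names), and then inspect each with `.head()`.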
As you can see, we have two datasets here: the first contains tweets relating to Trump and the second contains tweets relating to Biden. So let’s explore both datasets in detail.
Let’s first take a look at the dataset containing the Trump-related tweets.
Now let’s take a look at the dataset containing the Biden-related tweets.
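Now, based on the tweet text, we will create a new column named “polarity” that holds a sentiment score for each tweet: below zero for negative tweets and above zero for positive ones. From these scores we can later work out how many tweets are positive or negative. Let’s see the code for it:-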
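One way to compute such a score is sketched below. It assumes TextBlob as the sentiment library (the text here does not name one), whose `sentiment.polarity` ranges from -1.0 to 1.0, and a tweet column called `text`. The helper names are illustrative, and the scorer is passed in as a parameter so a different library can be swapped in.

```python
import pandas as pd


def textblob_polarity(text):
    """Score one tweet with TextBlob: -1.0 (negative) to 1.0 (positive)."""
    # Imported lazily so the rest of the sketch runs without TextBlob installed.
    from textblob import TextBlob

    return TextBlob(text).sentiment.polarity


def add_polarity(df, scorer=textblob_polarity, text_col="text"):
    """Return a copy of df with a 'polarity' column computed from the tweet text."""
    out = df.copy()
    out["polarity"] = out[text_col].astype(str).apply(scorer)
    return out
```

Usage would be along the lines of `trump = add_polarity(trump)` and `biden = add_polarity(biden)`, with `trump` and `biden` being the two loaded DataFrames.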
Now, based on the polarity column, we will create a new column “Expression” that labels each tweet according to its polarity. This is easier to understand with the help of the code, so let’s check it out.
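The labelling could be sketched as follows. The cut-offs are an assumption: scores above zero are labelled positive, scores below zero negative, and everything else neutral.

```python
import numpy as np
import pandas as pd


def add_expression(df):
    """Return a copy of df with an 'Expression' label derived from 'polarity'."""
    out = df.copy()
    conditions = [out["polarity"] > 0, out["polarity"] < 0]
    labels = ["positive", "negative"]
    # Anything that is neither above nor below zero is treated as neutral.
    out["Expression"] = np.select(conditions, labels, default="neutral")
    return out


demo = add_expression(pd.DataFrame({"polarity": [0.5, -0.2, 0.0]}))
print(demo["Expression"].tolist())  # ['positive', 'negative', 'neutral']
```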
Now we will use seaborn to visualize the polarity scores. First we’ll look at the plot for the Biden dataset.
Now let’s take a look at the polarity plot for the Trump dataset.
If you take a closer look, you’ll see that in Trump’s dataset the curve sits a bit higher on the negative side, i.e. the side below zero. This indicates that Trump’s dataset contains more negative tweets than Biden’s.
Now let’s take a look at a bar graph showing the number of positive and negative tweets for both presidential candidates. Let’s see the code for it first:-
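The counts behind such a bar graph can be put together as in this sketch. It assumes each DataFrame has the “Expression” column described earlier; the helper name and the toy data are illustrative only.

```python
import pandas as pd


def expression_counts(frames):
    """Count tweets per Expression label for each candidate.

    `frames` maps a candidate name to a DataFrame with an 'Expression' column.
    """
    counts = {name: df["Expression"].value_counts() for name, df in frames.items()}
    # Rows are labels, columns are candidates; missing labels count as zero.
    return pd.DataFrame(counts).fillna(0).astype(int)


# Toy labels only, to show the shape of the result.
toy = {
    "Trump": pd.DataFrame({"Expression": ["positive", "negative", "negative"]}),
    "Biden": pd.DataFrame({"Expression": ["positive", "positive", "negative"]}),
}
table = expression_counts(toy)
print(table.loc[["positive", "negative"]])
```

With matplotlib installed, `table.loc[["positive", "negative"]].plot.bar(rot=0)` turns the table into a grouped bar chart with one bar per candidate for each label.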
As we can see, according to our analysis Joe Biden has received more positive tweets and fewer negative tweets than Trump. So the analysis suggests that Twitter users in our data lean towards Biden over Trump.
Now let’s do a final analysis where we plot public opinion and see which candidate the online voter supports more. So let’s see the code for it first.
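A minimal sketch of this step, assuming the same “Expression” labels as before, keeps only the positive tweets and counts them per candidate. The percentage share is an extra column added here for convenience, and the data is a toy example.

```python
import pandas as pd


def positive_support(frames):
    """Count positive tweets per candidate and their share of all positive tweets."""
    positives = pd.Series(
        {name: int((df["Expression"] == "positive").sum()) for name, df in frames.items()},
        name="positive_tweets",
    )
    summary = positives.to_frame()
    # Share of the combined positive tweets, in percent.
    summary["share_pct"] = (100 * positives / positives.sum()).round(1)
    return summary


# Toy labels only, to show the call.
toy = {
    "Trump": pd.DataFrame({"Expression": ["positive", "negative", "neutral", "negative"]}),
    "Biden": pd.DataFrame({"Expression": ["positive", "positive", "positive", "negative"]}),
}
support = positive_support(toy)
print(support)
```

Plotting `support["positive_tweets"]` as a bar chart gives one bar per candidate, which is the kind of graph discussed next.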
You can see from the code that we have taken only the positive tweets for both candidates. Now we’ll look at the output bar graph, which shows the number of supporters for each candidate. So let’s see the output:-
It’s clear from the above bar graph that, in the Twitter data we have, Biden has more positive tweets than Trump. This suggests that public opinion on Twitter favors Biden and that, according to our analysis, Biden has a higher chance of winning the election.
So that’s it for today. If you liked the blog, do hit the like button and stay tuned for more blogs related to Data Science and AI.