Selecting the best features using a correlation matrix to improve the accuracy of your machine learning model

Image by Seksak Kerdkanno from Pixabay

Ever felt lost in an ice cream shop while choosing your favourite flavour? Or ever imagined the plight of football club owners while choosing international players?

Well, today I would like to share my experience of a similar situation:

How to select the best features (or flavours) from the many available for your machine learning model, with only one goal: to improve prediction accuracy.

Which dataset will be used?

We will use the Predict Heart Failure Dataset, which can be downloaded from Kaggle.
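The preview stops before the method itself. Below is a minimal sketch of one common way to use a correlation matrix for feature selection. It is written in plain Python so it runs without extra libraries; with pandas, `DataFrame.corr()` produces the same Pearson matrix. The sketch first keeps features whose absolute correlation with the target passes a threshold. It then drops any feature that is highly correlated with a stronger feature already kept. The thresholds `min_target_corr` and `max_mutual_corr`, the function names and the toy columns are assumptions for illustration. They are not taken from the article or from the Kaggle dataset.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)

def correlation_matrix(data):
    """Full correlation matrix as a nested dict: matrix[a][b] = r(a, b)."""
    return {a: {b: pearson(data[a], data[b]) for b in data} for a in data}

def select_features(data, target, min_target_corr=0.1, max_mutual_corr=0.8):
    """Greedy correlation-based selection (thresholds are hypothetical defaults).

    1. Keep features whose |r| with the target is at least min_target_corr.
    2. Visit them from strongest to weakest and skip any feature whose |r|
       with an already-selected feature reaches max_mutual_corr (redundant).
    """
    corr = correlation_matrix(data)
    candidates = [f for f in data
                  if f != target and abs(corr[f][target]) >= min_target_corr]
    candidates.sort(key=lambda f: abs(corr[f][target]), reverse=True)
    selected = []
    for f in candidates:
        if all(abs(corr[f][s]) < max_mutual_corr for s in selected):
            selected.append(f)
    return selected

# Toy data invented for illustration (not from the Kaggle dataset).
data = {
    "age":          [40, 45, 50, 60, 70, 80],
    "age_reported": [42, 44, 52, 58, 66, 82],  # near-duplicate of "age"
    "smoker":       [1, 0, 0, 1, 1, 0],        # hypothetical binary flag
    "noise":        [1, -1, 0, 0, -1, 1],      # unrelated to the target
    "target":       [0, 0, 0, 1, 1, 1],
}
print(select_features(data, "target"))  # ['age', 'smoker']
```

In this toy run, `noise` fails the target threshold. `age_reported` is dropped because it is almost a copy of `age`. Two nearly identical columns add little new information, so keeping only the one more strongly correlated with the target usually simplifies the model without losing signal. Correlation only captures linear relationships, so this is a first filter rather than a guarantee of better accuracy.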

Step-by-step approach to predicting sentiment (positive or negative) for Amazon Alexa products based on customer reviews

Image by Kevin King(Chandana Perera) from Pixabay

I was always fascinated by the idea of how a machine can read reviews given by customers and classify them as positive or negative, and I had many other questions, such as:

What kind of pre-processing is required to clean the data so that the machine can understand it?

How do you remove punctuation marks and HTML tags?

Will a mixture of lowercase and uppercase letters confuse the machine learning algorithm?

How do you remove common words (stop words) like ‘a’, ‘an’, ‘and’, ‘the’ that do not add any value?

How do you remove emoticons 😅?

All of the above questions were answered when I solved an ‘Amazon Alexa Sentiment Analysis’ problem to predict whether a review's sentiment is positive or negative.
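The questions above map onto a short cleaning pipeline. The following is a minimal sketch that uses only Python's standard library. The tiny `STOP_WORDS` set and the emoji/emoticon regular expression are illustrative assumptions; real projects often use a fuller stop-word list such as NLTK's. This is not necessarily how the original solution did it.

```python
import re
import string

# A tiny illustrative stop-word list (assumption); fuller lists exist, e.g. NLTK's.
STOP_WORDS = {"a", "an", "and", "the", "is", "it", "to", "of", "i", "this"}

# Rough pattern: common Unicode emoji blocks plus a few ASCII smileys like :) ;-( :D
EMOJI_PATTERN = re.compile(
    "[\U0001F300-\U0001FAFF\u2600-\u27BF]"
    r"|[:;=][-']?[)(DPp]"
)
HTML_TAG = re.compile(r"<[^>]+>")

def clean_review(text):
    """Normalise one customer review into space-separated lowercase tokens."""
    text = HTML_TAG.sub(" ", text)        # 1. strip HTML tags such as <br/>
    text = EMOJI_PATTERN.sub(" ", text)   # 2. strip emoji and simple emoticons
    text = text.lower()                   # 3. treat "LOVE" and "love" as one word
    text = text.translate(str.maketrans("", "", string.punctuation))  # 4. punctuation
    tokens = [w for w in text.split() if w not in STOP_WORDS]         # 5. stop words
    return " ".join(tokens)

print(clean_review("I LOVE this Echo!<br/>The sound is great \U0001F605 :)"))
# love echo sound great
```

HTML tags and emoticons are removed before punctuation. If punctuation were stripped first, `<br/>` would turn into a stray `br` token and `:D` into a leftover `d`.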

Complete solution to the Machine Learning Hackathon “Loan Approval Prediction” hosted by AnalyticsVidhya
Photo by Kids Circus on Unsplash

Here is the story of my first Machine Learning Hackathon, the “Loan Prediction Practice Problem” hosted by AnalyticsVidhya.

The complete Python code is available in my GitHub repository.


My final submission earned 80.55% accuracy and a rank of 122 (about the top 2%) out of 5,250 participants as of 20th Feb 2021.

Feature engineering raised the accuracy from 77% to 80.55% by adding new features.

However, adding new features can sometimes reduce accuracy, so they need to be chosen carefully.
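The preview does not say which features were engineered, so the sketch below is hypothetical. It assumes the column names of the AnalyticsVidhya loan dataset (`ApplicantIncome`, `CoapplicantIncome`, `LoanAmount`, `Loan_Amount_Term`, `Credit_History`). It adds a few derived features that are commonly tried on this kind of problem. The `choose_features` helper illustrates the "choose carefully" point: a candidate feature is kept only if it improves a validation score. Here `score` stands in for whatever train-and-evaluate routine you use, and the scores in the usage example are made up.

```python
import math

def add_engineered_features(row):
    """Return a copy of one loan application (a dict) with hypothetical derived features."""
    out = dict(row)
    total_income = row["ApplicantIncome"] + row["CoapplicantIncome"]
    out["TotalIncome"] = total_income
    # Log transform compresses the long right tail that income columns usually have.
    out["TotalIncome_log"] = math.log(total_income)
    # Rough monthly instalment: loan amount spread evenly over the term, ignoring interest.
    out["EMI"] = row["LoanAmount"] / row["Loan_Amount_Term"]
    return out

def choose_features(base, candidates, score):
    """Forward selection: keep a candidate only if it raises the validation score."""
    chosen = list(base)
    best = score(chosen)
    for feature in candidates:
        trial = score(chosen + [feature])
        if trial > best:  # a feature that lowers or ties the score is dropped
            chosen.append(feature)
            best = trial
    return chosen, best

# Usage with a hypothetical applicant and made-up validation scores.
applicant = {"ApplicantIncome": 5000, "CoapplicantIncome": 1500,
             "LoanAmount": 130, "Loan_Amount_Term": 360}
print(add_engineered_features(applicant)["TotalIncome"])  # 6500

made_up_scores = {
    frozenset({"Credit_History"}): 0.70,
    frozenset({"Credit_History", "TotalIncome"}): 0.72,
    frozenset({"Credit_History", "TotalIncome", "EMI"}): 0.71,
}
chosen, best = choose_features(["Credit_History"], ["TotalIncome", "EMI"],
                               lambda feats: made_up_scores[frozenset(feats)])
print(chosen, best)  # ['Credit_History', 'TotalIncome'] 0.72
```

In this made-up run `EMI` lowers the score, so it is left out. That mirrors the observation that a new feature can take accuracy down. The sketch assumes there are no missing values; real loan data would need imputation before these calculations.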

So let's dive straight into the Hackathon arena 😃

Rahul Pednekar

I am passionate about new technologies, especially Data Science, AI and Machine Learning, and I am interested in developing software that solves real-world problems.
