Module 8#

Project#

By now, you have hopefully had a taste of the various Python libraries that you can use for your day-to-day work in data science and machine learning.
To conclude the course, we will be using all these libraries and their different capabilities to solve a data science problem.

IBM HR Employee Attrition Analysis and Prediction#

Dataset: We are using a dataset published by IBM for our analysis. The dataset contains 35 features along with the Attrition target variable. This is a fictional dataset created by IBM data scientists. It can be downloaded from this link.

Objective: To predict whether an employee is going to resign, uncover the factors that lead to employee attrition, and explore how each feature is correlated with attrition.
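As a starting point for the correlation part of the objective, here is a minimal sketch using pandas. It runs on a tiny invented DataFrame rather than the real file, and the column names `Attrition`, `MonthlyIncome` and `YearsAtCompany` are assumptions about how the downloaded CSV is laid out, so check them against your copy before reusing the code.

```python
import pandas as pd

# Toy stand-in for the real data; every value here is invented for illustration.
# Column names are assumed, not taken from the course material.
df = pd.DataFrame({
    "Attrition": ["Yes", "No", "No", "Yes", "No", "No"],
    "MonthlyIncome": [2000, 6000, 5500, 2500, 7000, 4000],
    "YearsAtCompany": [1, 8, 6, 2, 10, 5],
})

# Encode the Yes/No target as 1/0 so it can take part in a correlation.
df["AttritionFlag"] = (df["Attrition"] == "Yes").astype(int)

# Pearson correlation of each numeric feature with the encoded target.
correlations = (
    df.select_dtypes("number")
      .corr()["AttritionFlag"]
      .drop("AttritionFlag")
      .sort_values()
)
print(correlations)
```

Correlation only captures linear relationships with numeric features, so categorical columns would need a different treatment, such as comparing attrition rates per category.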

I would urge you to download the dataset and try out a few ideas of your own. We will then solve the problem in an online class. Here is the solution presented in class, walking through the different steps of analysis: https://www.youtube.com/watch?v=7tTSrrmlcDY

Bonus Projects#

For the sample project above, we asked a simple question: “Given these characteristics of an employee, will they resign or not?”
One can take this further and ask various other questions, such as:

  • How experienced are the employees that are leaving (are more experienced employees leaving or new recruits)?

  • Which job roles or education fields are more likely to lose employees?

You can think of more questions to ask of the dataset and try your own analysis for each of them.
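Questions like these usually reduce to a group-by. The sketch below shows one possible way to approach them with pandas on a small invented DataFrame. The column names `JobRole`, `TotalWorkingYears` and `Attrition` are assumptions about the dataset's layout, and the numbers are made up.

```python
import pandas as pd

# Invented rows for illustration only; column names are assumed.
df = pd.DataFrame({
    "JobRole": ["Sales Executive", "Sales Executive", "Research Scientist",
                "Research Scientist", "Manager", "Manager"],
    "TotalWorkingYears": [2, 10, 1, 7, 15, 20],
    "Attrition": ["Yes", "No", "Yes", "No", "No", "No"],
})

# Boolean flag: True for employees who left.
left = df["Attrition"] == "Yes"

# Share of employees who left, per job role (mean of a boolean is a proportion).
attrition_rate_by_role = (
    left.groupby(df["JobRole"]).mean().sort_values(ascending=False)
)

# Median experience of those who left versus those who stayed.
median_experience = df.groupby("Attrition")["TotalWorkingYears"].median()

print(attrition_rate_by_role)
print(median_experience)
```

The same pattern should work for education fields or any other categorical column by swapping the grouping key.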

Here are two bonus projects that you can try out on your own. I will try to put up a set of solutions for both of them after the course ends, for your reference.

House Prices Prediction#

Dataset: For this project, the dataset used will be the Ames Housing dataset compiled by Dean De Cock for use in data science education. It is an expanded and modernized version of the often-used Boston Housing dataset. You can download the dataset from here. There are 79 explanatory variables describing (almost) every aspect of residential homes in Ames, Iowa.

Objective: To predict the final price of each home given the explanatory variables.
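Before building anything elaborate, it can help to fit a simple baseline to compare against. The following is a rough sketch of a one-feature linear regression solved by ordinary least squares with NumPy. The living-area and price values are invented, and using living area as the single predictor is an assumption made purely for illustration.

```python
import numpy as np

# Invented toy data: living area (square feet) and sale price (dollars).
living_area = np.array([850.0, 1200.0, 1500.0, 1800.0, 2400.0])
sale_price = np.array([98000.0, 128000.0, 163000.0, 188000.0, 251000.0])

# Design matrix with an intercept column, solved by ordinary least squares.
X = np.column_stack([np.ones_like(living_area), living_area])
coef, *_ = np.linalg.lstsq(X, sale_price, rcond=None)

# In-sample predictions and root mean squared error.
predicted = X @ coef
rmse = float(np.sqrt(np.mean((predicted - sale_price) ** 2)))

print("intercept and slope:", coef)
print("training RMSE:", rmse)
```

The error here is measured on the same rows used for fitting, so it is optimistic. On the real dataset you would want to hold out part of the data, or use cross-validation, before trusting any score.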

Titanic#

Dataset: For this project, the dataset used is the Titanic passenger details dataset. It can be found here.

Backdrop: The sinking of the Titanic is one of the most infamous shipwrecks in history. On April 15, 1912, during her maiden voyage, the widely considered “unsinkable” RMS Titanic sank after colliding with an iceberg. Unfortunately, there weren’t enough lifeboats for everyone onboard, resulting in the death of 1502 out of 2224 passengers and crew. While there was some element of luck involved in surviving, it seems some groups of people were more likely to survive than others.

Objective: To find what sorts of people (based on their name, age, gender, socio-economic class, etc.) were more likely to survive the sinking of the Titanic.
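A natural first step is to compare survival rates across groups. The sketch below does this with pandas on a handful of invented rows. The column names `Sex`, `Pclass` and `Survived` are assumptions about the file's layout, and the toy values say nothing about the real passengers.

```python
import pandas as pd

# Invented rows for illustration only; column names are assumed.
df = pd.DataFrame({
    "Sex": ["female", "male", "female", "male", "male", "female"],
    "Pclass": [1, 3, 3, 1, 3, 2],
    "Survived": [1, 0, 1, 1, 0, 1],
})

# Survival rate per group: the mean of a 0/1 column is a proportion.
survival_by_sex = df.groupby("Sex")["Survived"].mean()

# Two-way breakdown: rows are passenger classes, columns are sexes.
# Combinations with no passengers show up as NaN.
survival_by_class_and_sex = df.pivot_table(
    index="Pclass", columns="Sex", values="Survived", aggfunc="mean"
)

print(survival_by_sex)
print(survival_by_class_and_sex)
```

Features such as name or age would need some preparation first, for example extracting a title from the name or binning ages into ranges, before they can be grouped in the same way.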