13 Graduate Data
Introduction
The Singapore government maintains and publicly shares various datasets at Data.gov.sg. In this question, we will analyse the median salaries of graduates from the different universities in Singapore.
You need Pandas
This question heavily utilises the package called Pandas. Pandas is the de facto package for all things related to data analysis. You must familiarise yourself (see the Knowledge Lake) with how to use Pandas to work on this problem.
Tasks
Download data
Download the data ‘Graduate Employment Survey - NTU, NUS, SIT, SMU & SUTD’ from the Data.gov.sg website.
Remember to read the description of the dataset so that you understand how the data is organised.
Read the data
Load the data into a Pandas dataframe. You might have to specify encoding='latin-1' when reading the file.
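A minimal sketch of the loading step, assuming the download is a CSV file. To keep the example self-contained, a few made-up rows held in memory stand in for the real file; all values and every column name other than basic_monthly_median are assumptions for illustration only.

```python
import io

import pandas as pd

# Stand-in for the downloaded CSV: made-up rows, encoded as latin-1.
# The character é is a single byte in latin-1 that is not valid UTF-8 by itself,
# which is the kind of content that makes the default encoding fail.
raw_bytes = (
    "year,university,degree,basic_monthly_median\n"
    "2020,University A,Caf\u00e9 Management,3000\n"
    "2020,University B,Physics,na\n"
).encode("latin-1")

# With the real download, pass the file path instead of the BytesIO object.
df = pd.read_csv(io.BytesIO(raw_bytes), encoding="latin-1")
print(df)
```

The text 'na' is not one of the strings that read_csv treats as missing by default, so it arrives as ordinary text and has to be handled during cleaning.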
A quick peek
Quickly explore your dataset using the typical Pandas options (e.g. info(), head(), shape).
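The three options mentioned above can be tried on any dataframe. The sketch below uses a small made-up dataframe in place of the real data; its values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the loaded dataset.
df = pd.DataFrame({
    "year": [2019, 2020, 2020],
    "university": ["University A", "University A", "University B"],
    "basic_monthly_median": ["3000", "na", "3500"],
})

df.info()          # prints column names, non-null counts and dtypes
print(df.head(2))  # first two rows (head() alone shows the first five)
print(df.shape)    # (rows, columns); shape is an attribute, so no parentheses
```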
Data Cleaning
Clean up the dataset by:
1. Using shorter names for the universities.
2. Replacing na with np.nan.
3. Changing basic_monthly_median to type float.
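One possible way to carry out the three cleaning steps is sketched below on made-up rows. The column name university, the full university names, and the chosen abbreviations are assumptions; check them against the actual file before reusing the mapping.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; the real file has more columns and more universities.
df = pd.DataFrame({
    "university": [
        "Nanyang Technological University",
        "National University of Singapore",
        "Singapore Management University",
    ],
    "basic_monthly_median": ["3000", "na", "3500"],
})

# 1. Shorter names for the universities (this mapping is an assumption).
short_names = {
    "Nanyang Technological University": "NTU",
    "National University of Singapore": "NUS",
    "Singapore Management University": "SMU",
}
df["university"] = df["university"].replace(short_names)

# 2. Replace the text 'na' with a proper missing value everywhere.
df = df.replace("na", np.nan)

# 3. Convert the salary column from text to float.
df["basic_monthly_median"] = df["basic_monthly_median"].astype(float)

print(df)
```

The order matters here: converting to float before replacing 'na' would raise an error, because 'na' cannot be parsed as a number.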
Analyse the Data
Create a table that shows the minimum, maximum and mean of basic_monthly_median, as shown below.
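A summary table of this kind could be built with groupby() and agg(). The sketch below groups by university only, which is an assumption about the intended layout; the rows are made up.

```python
import pandas as pd

# Hypothetical cleaned data.
df = pd.DataFrame({
    "university": ["NTU", "NTU", "NUS", "NUS"],
    "basic_monthly_median": [3000.0, 3400.0, 3200.0, 3600.0],
})

# One row per university, one column per statistic.
summary = (
    df.groupby("university")["basic_monthly_median"]
    .agg(["min", "max", "mean"])
)
print(summary)
```

Missing values (np.nan) are skipped by these aggregations by default, so rows without a reported salary do not distort the statistics.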
Time to plot
Use the data you generated in the previous task to produce the plot shown above.
You might have to learn how to use the method pivot_table().
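As a rough illustration of pivot_table(), the sketch below reshapes made-up data so that each university becomes a column and each year a row. The choice of year as the index and mean as the aggregation are assumptions, not necessarily the layout of the intended plot.

```python
import pandas as pd

# Hypothetical cleaned data with a year column (assumed name).
df = pd.DataFrame({
    "year": [2019, 2019, 2020, 2020, 2020],
    "university": ["NTU", "NUS", "NTU", "NUS", "NUS"],
    "basic_monthly_median": [3000.0, 3200.0, 3100.0, 3300.0, 3500.0],
})

# Rows: years. Columns: universities. Cells: mean of the median salaries.
table = df.pivot_table(
    index="year",
    columns="university",
    values="basic_monthly_median",
    aggfunc="mean",
)
print(table)

# If Matplotlib is installed, table.plot() draws one line per column.
```

A wide table like this is convenient for plotting because Pandas treats each column as a separate series.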