4.6 Exercise 2: Graduate Data
Once you know how to use Pandas you can have lots of fun analysing various sets of data. Download the data ‘Graduate Employment Survey - NTU, NUS, SIT, SMU & SUTD’ from the Data.gov.sg website.
Use your knowledge of Python to:
- Determine the gross monthly salary (averaged for each university) for graduates from each of the institutions.
- Do this for the years 2013 to 2017.
- Display this data as a bar chart.
- Remember to clean your data appropriately.
# gross_monthly_mean
# count min max mean std
# university
# NTU 167 2577.0 5225.0 3354.838323 334.284748
# NUS 174 2500.0 5247.0 3544.293103 517.651813
# SIT 103 2344.0 4131.0 3209.912621 378.086413
# SMU 60 3150.0 5617.0 4053.083333 636.466312
# SUTD 9 3565.0 4232.0 3866.777778 231.946534
4.6.1 A Solution
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
= pd.read_csv('graduate-employment-survey-ntu-nus-sit-smu-sutd.csv', encoding='latin-1')
df
= {'Nanyang Technological University': 'NTU',
new_names 'National University of Singapore': 'NUS',
'Singapore Management University': 'SMU',
'Singapore Institute of Technology': 'SIT',
'Singapore University of Technology and Design': 'SUTD'}
=True)
df.replace(new_names, inplace'na', np.nan, inplace=True)
df.replace(
# I want convert all these columns to number types
= ['employment_rate_overall', 'employment_rate_ft_perm',
number_columns 'basic_monthly_mean', 'basic_monthly_median',
'gross_monthly_mean', 'gross_monthly_median',
'gross_mthly_25_percentile', 'gross_mthly_75_percentile']
= len(number_columns)
n = dict(zip(number_columns, ['float']*n))
my_type_convert_dict = df.astype(my_type_convert_dict)
df # df.info()
# df.describe()
= df.groupby(['university'], as_index=True).agg(
df_uni_monthly_mean 'gross_monthly_mean': ['count', 'min', 'max', 'mean', 'std']}
{
)
# Over years, over programmes
df_uni_monthly_mean.head()
= df.groupby(['year', 'university'], as_index=True).agg(
df_year_uni_monthly_mean 'gross_monthly_mean': ['count', 'min', 'max', 'mean', 'std']}
{
)
# Over years, over programmes
#df_year_uni_monthly_mean.head()
# Use a pivot table
= pd.pivot_table(df,
df_pivot =['year'],
index='university',
columns='gross_monthly_mean',
values='mean')
aggfunc
'fivethirtyeight')
plt.style.use(= df_pivot.plot(kind='bar', figsize=(10, 4), rot=0)
ax = ax.set_title('Mean Gross Monthly Salaries Across Years and Universities')
_ = ax.set_ylabel('Mean Salary')
_ = ax.legend(loc=(1.025, .35))
_
plt.tight_layout() plt.show()