4.6 Exercise 2: Graduate Data

Once you know how to use Pandas you can have lots of fun analysing various sets of data. Download the data ‘Graduate Employment Survey - NTU, NUS, SIT, SMU & SUTD’ from the Data.gov.sg website.

Use your knowledge of Python to:

  • Determine the gross monthly salary (averaged for each university) for graduates from each of the institutions.
  • Do this for the years 2013 to 2017.
  • Display this data as a bar chart.
  • Remember to clean your data appropriately.
#            gross_monthly_mean                                         
#                         count     min     max         mean         std
# university                                                            
# NTU                       167  2577.0  5225.0  3354.838323  334.284748
# NUS                       174  2500.0  5247.0  3544.293103  517.651813
# SIT                       103  2344.0  4131.0  3209.912621  378.086413
# SMU                        60  3150.0  5617.0  4053.083333  636.466312
# SUTD                        9  3565.0  4232.0  3866.777778  231.946534

4.6.1 A Solution

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

df = pd.read_csv('graduate-employment-survey-ntu-nus-sit-smu-sutd.csv', encoding='latin-1')


new_names = {'Nanyang Technological University': 'NTU',
             'National University of Singapore': 'NUS',
             'Singapore Management University': 'SMU',
             'Singapore Institute of Technology': 'SIT',
             'Singapore University of Technology and Design': 'SUTD'}

df.replace(new_names, inplace=True)
df.replace('na', np.nan, inplace=True)

# I want convert all these columns to number types
number_columns = ['employment_rate_overall', 'employment_rate_ft_perm',
                  'basic_monthly_mean', 'basic_monthly_median',
                  'gross_monthly_mean', 'gross_monthly_median',
                  'gross_mthly_25_percentile', 'gross_mthly_75_percentile']

n = len(number_columns)
my_type_convert_dict = dict(zip(number_columns, ['float']*n))
df = df.astype(my_type_convert_dict)
# df.info()
# df.describe()

df_uni_monthly_mean = df.groupby(['university'], as_index=True).agg(
    {'gross_monthly_mean': ['count', 'min', 'max', 'mean', 'std']}
)

# Over years, over programmes
df_uni_monthly_mean.head()


df_year_uni_monthly_mean = df.groupby(['year', 'university'], as_index=True).agg(
    {'gross_monthly_mean': ['count', 'min', 'max', 'mean', 'std']}
)

# Over years, over programmes
#df_year_uni_monthly_mean.head()

# Use a pivot table
df_pivot = pd.pivot_table(df,
                          index=['year'],
                          columns='university',
                          values='gross_monthly_mean',
                          aggfunc='mean')

plt.style.use('fivethirtyeight')
ax = df_pivot.plot(kind='bar', figsize=(10, 4), rot=0)
_ = ax.set_title('Mean Gross Monthly Salaries Across Years and Universities')
_ = ax.set_ylabel('Mean Salary')
_ = ax.legend(loc=(1.025, .35))
plt.tight_layout()
plt.show()