Once you know how to use Pandas you can have lots of fun analysing various sets of data. Download the data ‘Graduate Employment Survey - NTU, NUS, SIT, SMU & SUTD’ from the Data.gov.sg website.

Use your knowledge of Python to:

• Determine the gross monthly salary (averaged for each university) for graduates from each of the institutions.
• Do this for the years 2013 to 2017.
• Display this data as a bar chart.
• Remember to clean your data appropriately.
#            gross_monthly_mean
#                         count     min     max         mean         std
# university
# NTU                       167  2577.0  5225.0  3354.838323  334.284748
# NUS                       174  2500.0  5247.0  3544.293103  517.651813
# SIT                       103  2344.0  4131.0  3209.912621  378.086413
# SMU                        60  3150.0  5617.0  4053.083333  636.466312
# SUTD                        9  3565.0  4232.0  3866.777778  231.946534

4.6.1 A Solution

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt

new_names = {'Nanyang Technological University': 'NTU',
'National University of Singapore': 'NUS',
'Singapore Management University': 'SMU',
'Singapore Institute of Technology': 'SIT',
'Singapore University of Technology and Design': 'SUTD'}

df.replace(new_names, inplace=True)
df.replace('na', np.nan, inplace=True)

# I want convert all these columns to number types
number_columns = ['employment_rate_overall', 'employment_rate_ft_perm',
'basic_monthly_mean', 'basic_monthly_median',
'gross_monthly_mean', 'gross_monthly_median',
'gross_mthly_25_percentile', 'gross_mthly_75_percentile']

n = len(number_columns)
my_type_convert_dict = dict(zip(number_columns, ['float']*n))
df = df.astype(my_type_convert_dict)
# df.info()
# df.describe()

df_uni_monthly_mean = df.groupby(['university'], as_index=True).agg(
{'gross_monthly_mean': ['count', 'min', 'max', 'mean', 'std']}
)

# Over years, over programmes

df_year_uni_monthly_mean = df.groupby(['year', 'university'], as_index=True).agg(
{'gross_monthly_mean': ['count', 'min', 'max', 'mean', 'std']}
)

# Over years, over programmes

# Use a pivot table
df_pivot = pd.pivot_table(df,
index=['year'],
columns='university',
values='gross_monthly_mean',
aggfunc='mean')

plt.style.use('fivethirtyeight')
ax = df_pivot.plot(kind='bar', figsize=(10, 4), rot=0)
_ = ax.set_title('Mean Gross Monthly Salaries Across Years and Universities')
_ = ax.set_ylabel('Mean Salary')
_ = ax.legend(loc=(1.025, .35))
plt.tight_layout()
plt.show()