4.4 The Convenience of Dataframes
4.4.1 (Mini Exercise) Method 1: Using a Spreadsheet
Imagine that you have administered two tests to your class of ten (10) students. You now need to combine the results to calculate the total score. Given below are the details of the class and tests.
Description | Max Score | Weight | No. absent | Filename | Download link |
---|---|---|---|---|---|
Test 01 | 25 | 35% | 1 | test01.xlsx |
Link |
Test 02 | 35 | 65% | 3 | test02.xlsx |
Link |
Students details | - | - | - | class.xlsx |
Link |
If you like you can download a single zip file containing all three files.
What you have to do
- Download the three Excel files and place them in a folder on the Desktop.
- Please name the folder ‘workshop’ so that everyone has the same name.
- Use your favourite tool (it can be any tool) to combine the data for
Test 01
andTest 02
to get the final score for each student.
Please bear in mind:
- There are absent students.
- Max Score indicates the highest score possible for that test.
- The tests have different weights.
\[\text{final score} = \text{score for test 1}\times\left(\dfrac{35}{25} \right) + \text{score for test 2}\times\left(\dfrac{65}{35}\right) \]
4.4.2 Method 2: Using a Dataframes
Lets solve the above problem using a dataframe! Because dataframes uses an index, we can easily add different dataframes without having to worry about alignment or missing rows!
import pandas as pd
= pd.read_excel('class.xlsx')
df_class_data = pd.read_excel('test01.xlsx')
df_test_1_data = pd.read_excel('test02.xlsx')
df_test_2_data
for df in [df_class_data, df_test_1_data, df_test_2_data]:
='Student No',inplace=True)
df.set_index(keys
# Consolidate scores. Adjust for the maximum score and weights
# Absent students get zero
'Test 01'] = df_test_1_data['Score']/25*35 # New column
df_class_data['Test 02'] = df_test_2_data['Score']/35*65 # New column
df_class_data[0, inplace=True)
df_class_data.fillna(
'Total'] = df_class_data['Test 01'] + df_class_data['Test 02']
df_class_data[= df_class_data.round(decimals=2)
df_class_data 'class_w_scores.xlsx', index=False)
df_class_data.to_excel(
df_class_data.describe()# Test 01 Test 02 Total
# count 10.000000 10.000000 10.00000
# mean 20.311000 24.963000 45.27500
# std 7.272026 17.449038 22.23347
# min 0.000000 0.000000 0.00000
# 25% 21.295000 7.297500 30.04250
# 50% 22.170000 35.575000 56.50500
# 75% 23.587500 35.905000 59.87750
# max 24.820000 40.780000 63.83000