# | include: false
import numpy as np
import pandas as pd
np.random.seed(256852)
Storing Data (Good)
What to expect in this chapter
You should now know how to store data using lists, arrays and dictionaries. I will now show you more details on accessing and modifying these structures. This is important because most of what you do with programming is related to accessing and changing data. You will also gain a better understanding of the differences and similarities between lists, NumPy arrays and dictionaries.
1 Subsetting: Indexing and Slicing
You will often need to select a subset of the data in a list (or array); this is called subsetting. One form of subsetting is indexing, which picks a single element (you already know how to do this from the previous chapter). Another is slicing, which selects a range of elements.
So, in summary:
- Subsetting means to ‘select’.
- Indexing refers to selecting one element.
- Slicing refers to selecting a range of elements.
1.1 Lists & Arrays in 1D | Subsetting & Indexing
Since slicing gives us a range of elements, we must specify two indices to indicate where to start and end. The various syntaxes for these are shown in the table below.
The following applies to both lists and arrays.
py_list=["a1", "b2", "c3", "d4", "e5",
         "f6", "g7", "h8", "i9", "j10"]
np_array=np.array(py_list)

# Pick one
x = py_list  # OR
x = np_array
| Syntax | Result | Example output | Note |
|---|---|---|---|
| x[0] | First element | 'a1' | |
| x[-1] | Last element | 'j10' | |
| x[0:3] | Index 0 to 2 | ['a1','b2','c3'] | Gives \(3-0=3\) elements |
| x[1:6] | Index 1 to 5 | ['b2','c3','d4','e5','f6'] | Gives \(6-1=5\) elements |
| x[1:6:2] | Index 1 to 5 in steps of 2 | ['b2','d4','f6'] | Gives every other of \(6-1=5\) elements |
| x[5:] | Index 5 to the end | ['f6','g7','h8','i9','j10'] | Gives len(x) \(-5=5\) elements |
| x[:5] | Index 0 to 4 | ['a1','b2','c3','d4','e5'] | Gives \(5-0=5\) elements |
| x[5:2:-1] | Index 5 to 3 (i.e., in reverse) | ['f6','e5','d4'] | Gives \(5-2=3\) elements |
| x[::-1] | Reverses the list | ['j10','i9','h8',...,'b2','a1'] | |
Remember: slicing in Python can be a bit tricky. If you slice with [i:j], the slice will start at i and end at j-1, giving you a total of j-i elements.
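The j-i rule can be checked directly. The following is a small sketch; the names letters, chunk and stepped are mine and are not used elsewhere in this chapter.

```python
# A stand-in list with 10 items (same contents as py_list above).
letters = ["a1", "b2", "c3", "d4", "e5", "f6", "g7", "h8", "i9", "j10"]

i, j = 1, 6
chunk = letters[i:j]          # starts at index i, stops just before index j

print(chunk)                  # ['b2', 'c3', 'd4', 'e5', 'f6']
print(len(chunk) == j - i)    # True

# With a step, the slice still stops before j but skips elements.
stepped = letters[i:j:2]
print(stepped)                # ['b2', 'd4', 'f6']
```

The same slices work unchanged on a NumPy array built from this list.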
1.2 Arrays only | Subsetting by masking
One of the most powerful things you can do with NumPy arrays is subsetting by masking. To make sense of this, consider the following.
np_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
my_mask = np_array > 3
my_mask
array([False, False, False, True, True, True, True, True, True,
True])
The expression np_array > 3 asks a question of every element: 'Are you greater than 3?' The answer comes back in a 'Yes'/'No', or True/False, format. I can use this True/False format to ask NumPy to show me only those elements that are True by
np_array[my_mask]
array([ 4, 5, 6, 7, 8, 9, 10])
This is why I used the term 'masking'. The True/False answer acts like a mask allowing only the True subset to be seen.
Remember that subsetting by masking only works with NumPy arrays.
Instead of creating another variable, I can also do all of this succinctly as:
np_array[np_array > 3]
array([ 4, 5, 6, 7, 8, 9, 10])
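To see why I say masking only works with NumPy arrays, here is a small sketch of what happens when you try the same thing on a plain list. The variable names (values, mask, filtered_list, filtered_array) are mine, and the list comprehension is just one possible list-only alternative.

```python
import numpy as np

values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # a plain Python list
mask = [v > 3 for v in values]             # a list of True/False values

try:
    values[mask]                           # lists do not accept a mask as an index
except TypeError as err:
    print("Lists cannot be masked:", err)

# One list-only alternative: a list comprehension.
filtered_list = [v for v in values if v > 3]

# The NumPy route: convert to an array first, then mask.
filtered_array = np.array(values)[np.array(values) > 3]

print(filtered_list)     # [4, 5, 6, 7, 8, 9, 10]
print(filtered_array)    # [ 4  5  6  7  8  9 10]
```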
Let me show you a few more quick examples.

- We can invert our mask by using the ~. ~ is called the Bitwise NOT operator.

  np_array[~(np_array > 3)]                    # '~' means 'NOT'

  array([1, 2, 3])

- We can combine one mask AND another mask. (AND will show something only if both masks are true.)

  np_array[(np_array > 3) & (np_array < 8)]    # '&' means 'AND'

  array([4, 5, 6, 7])

- We can combine one mask OR another mask. (OR will show something if either mask is true.)

  np_array[(np_array < 3) | (np_array > 8)]    # '|' means 'OR'

  array([ 1, 2, 9, 10])
Remember:
- Always use the Bitwise NOT (~), Bitwise OR (|) and Bitwise AND (&) when combining masks with NumPy.
- Always use brackets to clarify what you are asking the mask to do.
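Both reminders can be seen in action with a short sketch. The names arr and errors are mine; the error handling is there only so that the example runs to the end.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
errors = []

# Correct: each comparison is wrapped in brackets before combining.
print(arr[(arr > 3) & (arr < 8)])        # [4 5 6 7]

# Without brackets, & is evaluated before > and <, so Python ends up
# needing a single True/False from a whole array and raises an error.
try:
    arr[arr > 3 & arr < 8]
except ValueError as err:
    errors.append("missing brackets")
    print("Missing brackets:", err)

# The keyword 'and' also wants a single True/False, so it fails too.
try:
    arr[(arr > 3) and (arr < 8)]
except ValueError as err:
    errors.append("keyword and")
    print("Using 'and':", err)
```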
1.3 Lists & Arrays in 2D | Indexing & Slicing
The differences between lists and arrays become even more apparent in higher dimensions, especially when you try indexing and slicing.
Let’s consider the following 2D list.
py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]

np_array_2d = np.array(py_list_2d)
- What is at position 4 (index 3)?

  py_list_2d[3]
  np_array_2d[3]

  [4, 'D']
  array(['4', 'D'], dtype='<U21')

- What is the FIRST element at position 4 (index 3)?

  py_list_2d[3][0]
  np_array_2d[3, 0]

  4
  '4'

  Notice how the syntax for arrays uses just a single pair of square brackets ([ ]).

- What are the first three elements?

  py_list_2d[:3]
  np_array_2d[:3]

  [[1, 'A'], [2, 'B'], [3, 'C']]
  array([['1', 'A'], ['2', 'B'], ['3', 'C']], dtype='<U21')
- Hmmm…

  py_list_2d[:3][0]
  np_array_2d[:3, 0]

  [1, 'A']
  array(['1', '2', '3'], dtype='<U21')

  You might think that the list version will yield the first elements (i.e., [1, 2, 3]) of all the sub-lists up to index 2. No! Instead, it gives the first element of the list you get from py_list_2d[:3]. Notice how differently NumPy arrays work.

- If you want 'everything' you just use :.

  py_list_2d[3:6][0]
  np_array_2d[3:6, 0]
  np_array_2d[:, 0]

  [4, 'D']
  array(['4', '5', '6'], dtype='<U21')
  array(['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'], dtype='<U21')
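If you do want the first elements of the sub-lists from a 2D list, one option is a list comprehension. The sketch below uses a shortened version of the 2D list above; the names first_column_list and first_column_array are mine. It also shows a side effect visible in the outputs above: the array stores the numbers as text because the data mixes numbers and letters.

```python
import numpy as np

py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"], [5, "E"]]
np_array_2d = np.array(py_list_2d)

# List: visit each sub-list and pick its first element.
first_column_list = [row[0] for row in py_list_2d[:3]]
print(first_column_list)             # [1, 2, 3]

# Array: rows and columns go inside one pair of brackets.
first_column_array = np_array_2d[:3, 0]
print(first_column_array)            # ['1' '2' '3']

# The list kept the integers; the array stored everything as text.
print(type(first_column_list[0]))    # <class 'int'>
print(np_array_2d.dtype.kind)        # U  (Unicode text)
```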
1.4 Growing lists
NumPy arrays are invaluable, and their slicing syntax (e.g. [:3,0]) is more intuitive than that of lists. So, why do we even bother with lists? One advantage of lists is their ease and efficiency in growing. NumPy arrays are fantastic for fast math operations, provided you do not change their size (see footnote 1). So, I will not discuss how to change the size of a NumPy array. Instead, let me show you how to grow a list. This will be useful later; for instance, when you try to solve differential equations numerically.
- Creating a larger list from a smaller one.

  x=[1, 2]*5
  x

  [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
- Three ways to grow a list by appending one element at a time.

  x=[1]
  x= x + [2]
  x= x + [3]
  x= x + [4]
  x

  [1, 2, 3, 4]

  x=[1]
  x+= [2]
  x+= [3]
  x+= [4]
  x

  [1, 2, 3, 4]

  x=[1]
  x.append(2)
  x.append(3)
  x.append(4)
  x

  [1, 2, 3, 4]

  If you are wondering, there are differences between these three versions. Their execution speeds are different; the version with append() runs about 1.5 times faster than the rest!

- Here are three ways of incorporating multiple elements. Notice the difference between the effects of extend() and append().

  x = [1, 2, 3]
  x += [4, 5, 6]
  x

  [1, 2, 3, 4, 5, 6]

  x=[1, 2, 3]
  x.extend([4, 5, 6])
  x

  [1, 2, 3, 4, 5, 6]

  x=[1, 2, 3]
  x.append([4, 5, 6])
  x

  [1, 2, 3, [4, 5, 6]]
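To give you a flavour of why growing a list is handy, here is a rough sketch of the differential-equation use I mentioned at the start of this section. Everything specific in it is my assumption for illustration: the toy equation dy/dt = -y, the starting value, the step size dt and the simple Euler update. The pattern to notice is 'grow a list with append(), then convert to an array once at the end'.

```python
import numpy as np

# Assumed toy problem: dy/dt = -y with y(0) = 1.
dt = 0.1                  # step size (an arbitrary choice)
n_steps = 10
t_values = [0.0]
y_values = [1.0]

for step in range(n_steps):
    t, y = t_values[-1], y_values[-1]
    y_values.append(y + dt * (-y))    # Euler update: y_new = y + dt * dy/dt
    t_values.append(t + dt)

# Convert to arrays once the size is final, ready for fast math.
t_array = np.array(t_values)
y_array = np.array(y_values)

print(len(y_values))                  # 11
print(round(y_values[-1], 4))         # 0.3487
```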
Some loose ends
1.5 Tuples
Before we end this section, I must introduce you to another data storage structure called a tuple. Tuples are similar to lists, except they use ( ) and cannot be changed after creation (i.e., they are immutable).
Let me first create a simple tuple.
a=(1, 2, 3)      # Define tuple
We can access its data…
print(a[0]) # Access data
1
But, we cannot change the data.
# The following will NOT work
a[0]=-1
a[0]+= [10]
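If you try to change a tuple, Python raises a TypeError. The sketch below catches that error so you can see it without stopping your code; it also shows one way (my suggestion, using a new name b) to get a modified version by building a new tuple.

```python
a = (1, 2, 3)

try:
    a[0] = -1                     # tuples do not support item assignment
except TypeError as err:
    print("Cannot modify a tuple:", err)

# To get a 'changed' tuple, build a new one from pieces of the old one.
b = (-1,) + a[1:]

print(a)                          # (1, 2, 3)
print(b)                          # (-1, 2, 3)
```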
1.6 Be VERY careful when copying
Variables in Python have subtle features that might make your life miserable if you are not careful. You should be particularly mindful when making copies of lists and arrays.
For example, if you want to copy a list, you might be tempted to do the following; PLEASE DON’T!
x=[1, 2, 3]
y=x           # DON'T do this!
z=x           # DON'T do this!
The correct way to do this is as follows:
x=[1, 2, 3]
y=x.copy()
z=x.copy()
Note: At this stage, you only have to know that you must use copy() to be safe; you do not have to understand why. However, if you want to, please refer to the discussion on mutable and immutable objects.
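If you are curious, the following sketch shows what goes wrong without copy(). The names arr, alias and safe are mine.

```python
import numpy as np

x = [1, 2, 3]
y = x              # y is just another name for the same list
z = x.copy()       # z is a separate list with the same contents

y.append(4)        # changing y ...

print(x)           # [1, 2, 3, 4]   ... also changes x
print(z)           # [1, 2, 3]      the copy is unaffected
print(y is x)      # True
print(z is x)      # False

# The same applies to NumPy arrays.
arr = np.array([1, 2, 3])
alias = arr
safe = arr.copy()

alias[0] = 99
print(arr)         # [99  2  3]
print(safe)        # [1 2 3]
```

One caveat: copy() on a list is a shallow copy, so for a list of lists (like py_list_2d) the inner lists are still shared.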
Footnotes
1. The gains in speed are due to NumPy doing things to all the elements in the array in one go. For this, the data needs to be stored in a specific order in memory. Adding or removing elements hinders this optimization. When you change the size of a NumPy array, NumPy destroys the existing array and creates a new one, making it extremely inefficient.