Storing Data (Good)

What to expect in this chapter

You should now know how to store data using lists, arrays and dictionaries. I will now show you more details on accessing and modifying these structures. This is important because most of what you do with programming is related to accessing and changing data. You will also gain a better understanding of the differences and similarities between lists, NumPy arrays and dictionaries.

1 Subsetting: Indexing and Slicing

You will often need to select a subset of the data in a list (or array); this is called subsetting. One form of subsetting is picking a single element, which is called indexing (you already know how to do this from the previous chapter). Another is selecting a range of elements, which is called slicing.

So, in summary, what we mean when we say…

  • Subsetting means to ‘select’.
  • Indexing refers to selecting one element.
  • Slicing refers to selecting a range of elements.

1.1 Lists & Arrays in 1D | Subsetting & Indexing

Since slicing gives us a range of elements, we must specify two indices to indicate where to start and end. The various syntaxes for these are shown in the table below.

The following applies to both lists and arrays.

import numpy as np

py_list=["a1", "b2", "c3", "d4", "e5",
         "f6", "g7", "h8", "i9", "j10"]
np_array=np.array(py_list)

# Pick one
x = py_list  # OR
x = np_array
Syntax     Selects                          Result                           Note
x[0]       First element                    'a1'
x[-1]      Last element                     'j10'
x[0:3]     Index 0 to 2                     ['a1','b2','c3']                 Gives \(3−0=3\) elements
x[1:6]     Index 1 to 5                     ['b2','c3','d4','e5','f6']       Gives \(6−1=5\) elements
x[1:6:2]   Index 1 to 5 in steps of 2       ['b2','d4','f6']                 Gives every other one of the \(6−1=5\) elements
x[5:]      Index 5 to the end               ['f6','g7','h8','i9','j10']      Gives len(x)\(−5=5\) elements
x[:5]      Index 0 to 4                     ['a1','b2','c3','d4','e5']       Gives \(5−0=5\) elements
x[5:2:-1]  Index 5 to 3 (i.e., in reverse)  ['f6','e5','d4']                 Gives \(5−2=3\) elements
x[::-1]    Reverses the list                ['j10','i9','h8',...,'b2','a1']

Remember

Remember slicing in Python can be a bit tricky.
If you slice with [i:j], the slice will start at i and end at j-1, giving you a total of j-i elements.
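
The rule above can be checked directly. The following is a minimal sketch that reuses the list from the table; the names i, j, piece, every_other and backwards are only for illustration.

```python
py_list = ["a1", "b2", "c3", "d4", "e5",
           "f6", "g7", "h8", "i9", "j10"]

i, j = 1, 6
piece = py_list[i:j]          # starts at index i, stops just before index j

print(piece)                  # ['b2', 'c3', 'd4', 'e5', 'f6']
print(len(piece) == j - i)    # True

# A step of 2 keeps every other element of the same range
every_other = py_list[i:j:2]
print(every_other)            # ['b2', 'd4', 'f6']

# A negative step walks backwards; the stop index is still excluded
backwards = py_list[5:2:-1]
print(backwards)              # ['f6', 'e5', 'd4']
```

The same slices should give the same selections on np_array, except that the results come back as arrays.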

1.2 Arrays only | Subsetting by masking

One of the most powerful things you can do with NumPy arrays is subsetting by masking. To make sense of this, consider the following.

np_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
my_mask = np_array > 3
my_mask
array([False, False, False,  True,  True,  True,  True,  True,  True,
        True])

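With np_array > 3, I am asking NumPy which elements are greater than 3. The answer comes back in a ‘Yes’/‘No’ (True/False) format, with one value per element. I can use this True/False array to ask NumPy to show me only those elements for which the answer is True: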

np_array[my_mask]
array([ 4,  5,  6,  7,  8,  9, 10])

This is why I used the term ‘masking’. The True/False answer acts like a mask allowing only the True subset to be seen.

Remember

Remember that subsetting by masking only works with NumPy arrays; it does not work with Python lists.
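
To see the difference, the sketch below tries the same comparison on a plain list. The list comprehension at the end is one common alternative for lists and is not something introduced in the text above; the names py_numbers, list_comparison_failed and kept are illustrative.

```python
import numpy as np

py_numbers = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
np_array = np.array(py_numbers)

# With an array, the comparison is applied to every element
masked = np_array[np_array > 3]
print(masked.tolist())               # [4, 5, 6, 7, 8, 9, 10]

# With a list, the comparison itself raises a TypeError
try:
    py_numbers > 3
    list_comparison_failed = False
except TypeError:
    list_comparison_failed = True
print(list_comparison_failed)        # True

# For a list, a list comprehension gives the same selection
kept = [n for n in py_numbers if n > 3]
print(kept)                          # [4, 5, 6, 7, 8, 9, 10]
```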

Instead of creating another variable, I can also do all of this succinctly as:

np_array[np_array > 3]
array([ 4,  5,  6,  7,  8,  9, 10])

Let me show you a few more quick examples.

  1. np_array[~(np_array > 3)]                 # '~' means 'NOT'

    We can invert our mask by using the ~.
    ~ is called the Bitwise Not operator.

    array([1, 2, 3])
  2. np_array[(np_array > 3) & (np_array < 8)] # '&' means 'AND'

    We can combine one mask AND another mask.
    (AND will show something only if both masks are true.)

    array([4, 5, 6, 7])
  3. np_array[(np_array < 3) | (np_array > 8)] # '|' means 'OR'

    We can combine one mask OR another mask.
    (OR will show something if either mask is true.)

    array([ 1,  2,  9, 10])
Remember

Remember:

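  • Always use the bitwise NOT (~), bitwise OR (|) and bitwise AND (&) when combining masks with NumPy. Python's not, or and and keywords do not work element by element on arrays.
  • Always use brackets around each comparison to clarify what you are asking the mask to do; & and | are evaluated before comparisons such as > and <.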
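
Both points can be seen in a short experiment. This is a sketch only; the flag names and_failed and no_brackets_failed are made up for the illustration.

```python
import numpy as np

np_array = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# Correct: each comparison in its own brackets, joined with &
both = np_array[(np_array > 3) & (np_array < 8)]
print(both.tolist())                 # [4, 5, 6, 7]

# Python's 'and' wants a single True/False, which a whole array cannot give
try:
    (np_array > 3) and (np_array < 8)
    and_failed = False
except ValueError:
    and_failed = True
print(and_failed)                    # True

# Without brackets, 3 & np_array is evaluated first, so this fails too
try:
    np_array > 3 & np_array < 8
    no_brackets_failed = False
except ValueError:
    no_brackets_failed = True
print(no_brackets_failed)            # True
```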

1.3 Lists & Arrays in 2D | Indexing & Slicing

The differences between lists and arrays become even more apparent in higher dimensions, especially when you try indexing and slicing.

Let’s consider the following 2D list.

py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]

np_array_2d = np.array(py_list_2d)
  1. What is at position 4 (index 3)?

    py_list_2d[3]
    np_array_2d[3]
    [4, 'D']
    array(['4', 'D'], dtype='<U21')
  2. What is the FIRST element at position 4 (index 3)?

    py_list_2d[3][0]
    np_array_2d[3, 0]
    4
    '4'

    Notice how the syntax for arrays uses just a single pair of square brackets ([ ]).

  3. What are the first three elements?

    py_list_2d[:3]
    np_array_2d[:3]
    [[1, 'A'], [2, 'B'], [3, 'C']]
    array([['1', 'A'],
           ['2', 'B'],
           ['3', 'C']], dtype='<U21')
  4. Hmmm…

    py_list_2d[:3][0]
    [1, 'A']

    You might think that this will yield the first elements (i.e., [1, 2, 3]) of all the sub-lists up to index 2.
    No! Instead, it gives the first of the list you get from py_list_2d[:3].

    np_array_2d[:3, 0]
    array(['1', '2', '3'], dtype='<U21')

    Notice how differently NumPy arrays work.

  5. py_list_2d[3:6][0]
    [4, 'D']

    np_array_2d[3:6, 0]
    array(['4', '5', '6'], dtype='<U21')

    If you want ‘everything’ you just use :.

    np_array_2d[:, 0]
    array(['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'], dtype='<U21')
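
If you do need the first element of each sub-list from a plain list, one option (an addition here, not something covered above) is a list comprehension. The sketch below puts it next to the array syntax; first_of_each, column and chained are illustrative names.

```python
import numpy as np

py_list_2d = [[1, "A"], [2, "B"], [3, "C"], [4, "D"],
              [5, "E"], [6, "F"], [7, "G"], [8, "H"],
              [9, "I"], [10, "J"]]
np_array_2d = np.array(py_list_2d)

# List: slice the rows first, then pick element 0 from each row
first_of_each = [row[0] for row in py_list_2d[:3]]
print(first_of_each)            # [1, 2, 3]

# Array: rows and columns are given together in one pair of brackets
column = np_array_2d[:3, 0]
print(column.tolist())          # ['1', '2', '3']

# Array: chaining brackets behaves like the list version
chained = np_array_2d[:3][0]
print(chained.tolist())         # ['1', 'A']
```

Note that the array holds strings ('1', not 1) because NumPy converted the mixed numbers and letters to a single type.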

1.4 Growing lists

NumPy arrays are invaluable, and their slicing syntax (e.g. [:3,0]) is more intuitive than that of lists. So, why do we even bother with lists? One advantage of lists is their ease and efficiency in growing. NumPy arrays are fantastic for fast math operations, provided you do not change their size (see footnote 1). So, I will not discuss how to change the size of a NumPy array. Instead, let me show you how to grow a list. This will be useful later; for instance, when you try to solve differential equations numerically.
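
One way to combine the strengths of both, sketched here under the assumption that you only need the array once all the values are known, is to grow a list and convert it at the end. The names values and values_array and the step size 0.5 are invented for the example.

```python
import numpy as np

values = []                          # start with an empty list
for step in range(5):
    values.append(step * 0.5)        # grow the list one element at a time

values_array = np.array(values)      # convert once, when the size is final
doubled = values_array * 2           # fast math on the whole array

print(values)                        # [0.0, 0.5, 1.0, 1.5, 2.0]
print(doubled.tolist())              # [0.0, 1.0, 2.0, 3.0, 4.0]
```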

  1. Creating a larger list from a smaller one.

    x=[1, 2]*5
    x
    [1, 2, 1, 2, 1, 2, 1, 2, 1, 2]
  2. Three ways to grow a list by appending one element at a time.

    x=[1]
    x= x + [2]
    x= x + [3]
    x= x + [4]
    x
    [1, 2, 3, 4]

    x=[1]
    x+= [2]
    x+= [3]
    x+= [4]
    x
    [1, 2, 3, 4]

    x=[1]
    x.append(2)
    x.append(3)
    x.append(4)
    x
    [1, 2, 3, 4]

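    If you are wondering, there are differences between these three versions. x = x + [2] builds a new list each time, whereas += and append() change the existing list in place. Their execution speeds are also different; the version with append() runs about 1.5 times faster than the rest, although the exact ratio will depend on your machine and Python version!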

  3. Here are three ways of incorporating multiple elements.
    Notice the difference between the effects of extend() and append().

    x = [1, 2, 3]
    x += [4, 5, 6]
    x
    [1, 2, 3, 4, 5, 6]

    x=[1, 2, 3]
    x.extend([4, 5, 6])
    x
    [1, 2, 3, 4, 5, 6]

    x=[1, 2, 3]
    x.append([4, 5, 6])
    x
    [1, 2, 3, [4, 5, 6]]
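
The three one-element versions in item 2 give the same result but do not work the same way internally. The sketch below is one way to see this: it keeps a second name for the starting list (deliberately, just for this demonstration) and checks with is whether x still refers to that same list afterwards. All names ending in _start or _same are illustrative.

```python
# Version 1: x = x + [...] builds a new list and rebinds x to it
plus_start = [1]
x = plus_start
x = x + [2]
plus_same = x is plus_start
print(plus_same, plus_start)           # False [1]

# Version 2: x += [...] changes the existing list in place
iadd_start = [1]
x = iadd_start
x += [2]
iadd_same = x is iadd_start
print(iadd_same, iadd_start)           # True [1, 2]

# Version 3: append() also changes the existing list in place
append_start = [1]
x = append_start
x.append(2)
append_same = x is append_start
print(append_same, append_start)       # True [1, 2]
```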

Some loose ends

1.5 Tuples

Before we end this section, I must introduce you to another data storage structure called a tuple. Tuples are similar to lists, except they use ( ) and cannot be changed after creation (i.e., they are immutable).

Let me first create a simple tuple.

a=(1, 2, 3)     # Define tuple

We can access its data…

print(a[0])    # Access data
1

But, we cannot change the data.

# The following will NOT work
a[0]=-1      # TypeError: a tuple does not support item assignment
a[0]+= 10    # TypeError, for the same reason
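
If you want to try this without stopping your program, you can catch the error. This is a minimal sketch; assignment_failed and b are names chosen for the illustration, and the last part shows that joining tuples creates a new tuple rather than changing the old one.

```python
a = (1, 2, 3)

try:
    a[0] = -1                # item assignment is not allowed on a tuple
    assignment_failed = False
except TypeError:
    assignment_failed = True

print(assignment_failed)     # True
print(a)                     # (1, 2, 3)

# 'Changing' a tuple really means building a new one
b = a + (4,)
print(b)                     # (1, 2, 3, 4)
print(a)                     # (1, 2, 3)
```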

1.6 Be VERY careful when copying

Variables in Python have subtle features that might make your life miserable if you are not careful. You should be particularly mindful when making copies of lists and arrays.

For example, if you want to copy a list, you might be tempted to do the following; PLEASE DON’T!

x=[1, 2, 3]
y=x           # DON'T do this!
z=x           # DON'T do this!

The correct way to do this is as follows:

x=[1, 2, 3]
y=x.copy()
z=x.copy()

Note: At this stage, you only have to know that you must use copy() to be safe; you do not have to understand why. However, if you want to, please refer to the discussion on mutable and immutable objects.
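
If you would like to see what goes wrong, the short sketch below changes a list through a second name. It assumes nothing beyond the lists shown above.

```python
x = [1, 2, 3]
y = x              # y is just another name for the same list
z = x.copy()       # z is a separate list with the same contents

y.append(4)        # change the list through y

print(x)           # [1, 2, 3, 4]
print(z)           # [1, 2, 3]
print(y is x)      # True
print(z is x)      # False
```

Appending through y also changed x, because both names point to one list; the copy z is unaffected.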

Footnotes

  1. The gains in speed are due to NumPy doing things to all the elements in the array in one go. For this, the data needs to be stored in a specific order in memory. Adding or removing elements hinders this optimization. When you change the size of a NumPy array, NumPy destroys the existing array and creates a new one, making it extremely inefficient.