Files, Folders & OS (Need)
What to expect in this chapter
There is no choice; you must always interact with your operating system (OS) to get anything done. This particularly applies to programming, as you must communicate with the OS to create, modify, move, copy, and delete files and directories (folders). This section is devoted to getting you up-to-speed with some Python modules (e.g., os, glob, shutil) that will allow you to execute these necessary actions. I will also show you how to write code that will seamlessly run on both macOS and Windows.
1 Important concepts
In this section, I will touch on some concepts you need to navigate the OS efficiently.
In what follows, please note that I use the terms folder and directory interchangeably; they refer to the same thing.
1.1 Path
When dealing with computers, you will often encounter the term ‘path’. The path is simply a way to specify a location on your computer. It is like an address, and if you follow the path, it will take you to your file or folder.
Like specifying location, you can specify your path absolutely or relatively. So, for example, I can specify that SPS is located on level 3 of block S16. However, if I am already on Level 5 of S16, I can say, go two floors down. The former is an absolute path, and the latter is relative. I have always found it easier to use relative paths, especially if I later want to move my folders about.
Remember that the path tells us how to find a file or folder and that you can specify it absolutely or relatively.
For example, here is an absolute path to a file on the Desktop on a Windows machine.
C:\Users\Chammika\Desktop\data-01.txt
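If you want to see the difference between the two kinds of path from within Python, here is a minimal sketch. It assumes a relative path built from the names data-files and data-01.txt (the file does not need to exist), and it uses os.path.join(), which is explained later in this chapter.

```python
import os

# A relative path: it is interpreted from wherever the code is currently running.
relative_path = os.path.join('data-files', 'data-01.txt')

# os.path.abspath() puts the current working directory in front of it.
absolute_path = os.path.abspath(relative_path)

print(os.getcwd())                   # the folder Python is currently working in
print(os.path.isabs(relative_path))  # False
print(os.path.isabs(absolute_path))  # True
print(absolute_path)
```

The last line will differ from machine to machine, because an absolute path depends on where your working folder is.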
1.2 More about relative paths
When dealing with relative paths, you will find it helpful to know the . and .. notation.
Notation | Meaning |
---|---|
. | ‘this folder’ |
.. | ‘one folder above’ |
So,
- .\data-files\data-01.txt means the file data-01.txt in the folder data-files in the current folder.
- ..\data-files\data-01.txt means the file data-01.txt in the folder data-files located in the folder above.
Remember . means current folder, and .. means one folder up.
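To see how . and .. behave, here is a small sketch using os.path.normpath(), which tidies a path by resolving these notations. The folder name raw is hypothetical and used only for illustration. Note that normpath() works purely on the text of the path; it does not check that anything exists on disk.

```python
import os

# Go into 'raw' and then one folder back up: we end up in 'data-files' again.
path = os.path.join('data-files', 'raw', '..', 'data-01.txt')
tidy_path = os.path.normpath(path)

print(tidy_path == os.path.join('data-files', 'data-01.txt'))  # True

# A leading '.' (this folder) is simply dropped.
print(os.path.normpath(os.path.join('.', 'data-files')))       # data-files
```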
macOS or Linux
macOS and Linux allow you to use ~ to refer to your home directory. So, for example, you can access the Desktop in these systems ‘relatively’ with ~/Desktop. So, I can look for a file on my Desktop using:
~/Desktop/data-01.txt
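One thing to keep in mind is that Python's open() does not expand ~ for you. A minimal sketch of one way around this is to use os.path.expanduser(); the Desktop folder and the file name data-01.txt are just the examples used above and need not exist.

```python
import os

# Replace '~' with the actual home directory of the current user.
home = os.path.expanduser('~')

# Build the path to a file on the Desktop from the home directory.
desktop_file = os.path.join(home, 'Desktop', 'data-01.txt')

print(home)
print(desktop_file)
```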
1.3 Path separator
Today’s major OSs (Windows, macOS, Linux) offer similar graphical environments. However, one of the most striking differences between Windows and macOS (or Linux) is the path separator.
Windows uses \ as the path separator while macOS (or Linux) uses /. So, the absolute path to a file on the Desktop on each of these systems will look like this:
OS | Example path |
---|---|
Windows | C:\Users\chammika\Desktop\data-01.txt |
macOS (or Linux) | /Users/chammika/Desktop/data-01.txt |
If you want to share your code and want it to work on both systems, you must not hardcode either path separator. Later, I will show you how to use the Python os package to fix this problem.
1.4 Text files vs. Binary files
You can think of all files on your computer as being either text files or binary files. Text files are simple and can be opened, and their contents examined, by almost any software (e.g., Notepad, TextEdit, Jupyter, …). Examples of text file formats are .txt, .md or .csv.
Binary files, in contrast, require some processing to make sense of what they contain. For example, if you look at the raw data in a .png file, you will see gibberish. In addition, some binary files will only run on specific OSs. For example, the Excel.app on a Mac will not run on Windows, nor will the Excel.exe file run on macOS (or Linux). Some reasons for having binary files are speed and size; text files, though simple, can get bulky.
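Python mirrors this distinction when opening files: the mode 'r' gives you text, while 'rb' gives you the raw bytes. Here is a minimal sketch using a throwaway file (the name demo.txt is hypothetical) created in a temporary folder so nothing is left behind.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo.txt')

    # Create a small text file to experiment with.
    with open(path, 'w', encoding='utf-8') as file:
        file.write('Hello')

    # Text mode: Python decodes the bytes into a string.
    with open(path, 'r', encoding='utf-8') as file:
        as_text = file.read()

    # Binary mode: Python hands back the raw bytes, untouched.
    with open(path, 'rb') as file:
        as_bytes = file.read()

print(as_text)   # Hello
print(as_bytes)  # b'Hello'
```

For a genuinely binary file such as a .png, only the 'rb' route is meaningful; reading it as text will typically fail or produce gibberish.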
1.5 Extensions
Files are usually named to end with an extension separated from the name by a . like name.extension. This extension lets the OS know what software or app to use to extract the details in a file. For example, .xlsx means use Excel and .pptx means use PowerPoint. Be careful about changing the extension of a file, as it will make your OS cough and throw a fit. If you don’t believe me, try changing a .xlsx to .txt and double-click.
2 Opening and closing files
Now, let’s look at how we can open a file for reading and writing. I will show you a slightly advanced but better way of doing this by using the with statement (called a context manager). First, please download the file spectrum-01.txt into the current folder in your Learning Portfolio.
2.1 Reading data
Here is what you would typically do to read a text file.
with open('spectrum-01.txt', 'r') as file:
    file_content = file.read()

print(file_content)
The open() function ‘opens’ your file. The 'r' specifies that I only want to read from the file. Using with frees you from worrying about closing the file after you are done.
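If a file is large, you may prefer to read it one line at a time rather than all at once. The following is a minimal sketch of that idea; it assumes a small made-up file (demo-lines.txt) created in a temporary folder, so it runs without spectrum-01.txt.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo-lines.txt')   # hypothetical file

    # Create a small file with three lines to read back.
    with open(path, 'w') as file:
        file.write('first line\nsecond line\nthird line\n')

    lines = []
    with open(path, 'r') as file:
        for line in file:                    # the file hands over one line at a time
            lines.append(line.rstrip('\n'))  # drop the newline at the end of each line

print(lines)  # ['first line', 'second line', 'third line']
```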
2.2 Writing data
Now, let’s write the following into a file.
text = 'Far out in the uncharted backwaters of the unfashionable end of the western spiral arm of the Galaxy lies a small unregarded yellow sun.\nOrbiting this at a distance of roughly ninety-two million miles is an utterly insignificant little blue green planet whose ape-descended life forms are so amazingly primitive that they still think digital watches are a pretty neat idea.'
I will use two writing methods just so that I can show you how they work.
Writing to a file in one go
First, let’s write everything in one go.
with open('my-text-once.txt', 'w') as file:
    file.write(text)
You should now have a file my-text-once.txt in your directory. You should open it to take a look. By the way, the 'w' indicates that I am opening the file for writing.
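It is worth knowing that 'w' starts the file afresh every time, wiping whatever was there before. If you want to add to the end of an existing file instead, open() also accepts the mode 'a' (for appending). Here is a small sketch of the difference, using a hypothetical file demo-modes.txt in a temporary folder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'demo-modes.txt')   # hypothetical file

    with open(path, 'w') as file:
        file.write('first\n')

    with open(path, 'w') as file:    # 'w' again: the earlier content is wiped
        file.write('second\n')

    with open(path, 'a') as file:    # 'a': the new content goes at the end
        file.write('third\n')

    with open(path, 'r') as file:
        content = file.read()

print(repr(content))  # 'second\nthird\n'
```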
Writing to a file, line by line
Let me show you how to write a line at a time. This is useful when dealing with data generated on the fly. Since I don’t have such data now, I will split the lines of the previous text. (The contents in both files will be slightly different. However, this is not a time to worry about that.)
with open('my-text-lines.txt', 'w') as file:
    for line in text.splitlines():
        # splitlines() drops the newline characters, so add one back
        file.write(line + '\n')
I must add that writing to a file is slow compared with working on data in memory. So, many small writes inside a loop can noticeably slow your code down.
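One common way to keep file writes out of a loop is to collect the lines in a list first and write them in a single call. This is only a sketch with made-up data (the readings and the file name my-text-batch.txt are hypothetical); Python also buffers writes behind the scenes, so it is worth measuring before assuming a loop is the bottleneck.

```python
import os
import tempfile

# Pretend these lines were generated on the fly.
lines = [f'reading {i}' for i in range(5)]

with tempfile.TemporaryDirectory() as folder:
    path = os.path.join(folder, 'my-text-batch.txt')   # hypothetical file

    # Join everything into one string and write it with a single call.
    with open(path, 'w') as file:
        file.write('\n'.join(lines) + '\n')

    # Read it back to check what ended up in the file.
    with open(path, 'r') as file:
        content = file.read()

print(content)
```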
3 Some useful packages
Let me show you how to programmatically create, copy, and delete files and folders and navigate the OS. I will use the following three packages for these tasks.
Package | Primarily used for |
---|---|
os | To ‘talk’ to the OS to create, modify, delete folders and write OS-agnostic code. |
glob | To search for files. |
shutil | To copy files. |
I am using os and shutil because shutil offers some functions (e.g., shutil.copy()) that os does not. There are also subtle differences in what these functions do, but let’s not worry about that now.
These packages are already part of the standard Python library. So you do not have to install them. Let’s import the packages first.
import os
import glob
import shutil
4 OS safe paths
Consider a file data-01.txt in the sub-directory sg-data of the directory all-data.
all-data \(\longrightarrow\) sg-data \(\longrightarrow\) data-01.txt
If I want to access data-01.txt all I have to do is:
path = os.path.join('.', 'all-data', 'sg-data', 'data-01.txt')
print(path)
If you are on Windows, you will see:
.\all-data\sg-data\data-01.txt
Otherwise, it will be:
./all-data/sg-data/data-01.txt
So, os.path.join() will build your path with either / or \ as necessary. This means your code will run seamlessly on all of these OSs.
5 Folders
5.1 Creating folders
You can create a folder programmatically using os.mkdir(). This is very useful because you can write a tiny bit of code to quickly organise your data. For example, let’s say we need to store information about the people ‘John’, ‘Paul’ and ‘Ringo’. I can quickly create some folders for this by:
os.mkdir('people')

for person in ['John', 'Paul', 'Ringo']:
    path = os.path.join('people', person)
    print(f'Creating {path}')
    os.mkdir(path)
Creating people/John
Creating people/Paul
Creating people/Ringo
You don’t need the print() statement. I have included it so I have some feedback on what is (or is not) happening.
5.2 Checking for existence
Python will complain if you try to run this code twice, saying that the file (yes, Python refers to folders as files) already exists. So, when you create resources, it is a good idea to check if they already exist. There are two ways to do this: use try-except with the FileExistsError or use os.path.exists().
Using try-except
for person in ['John', 'Paul', 'Ringo']:
    path = os.path.join('people', person)
    try:
        os.mkdir(path)
        print(f'Creating {path}')
    except FileExistsError:
        print(f'{path} already exists; skipping creation.')
Using os.path.exists()
for person in ['John', 'Paul', 'Ringo']:
    path = os.path.join('people', person)
    if os.path.exists(path):
        print(f'{path} already exists; skipping creation.')
    else:
        os.mkdir(path)
        print(f'Creating {path}')
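Beyond the two approaches above, the os package also offers os.makedirs(), which creates any missing parent folders and, with exist_ok=True, stays silent if the folder is already there. The following is a minimal sketch of this alternative; it works inside a temporary folder so that it does not touch your own people folder.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    for person in ['John', 'Paul', 'Ringo']:
        path = os.path.join(base, 'people', person)

        # Creates 'people' as well as the person's folder in one step.
        os.makedirs(path, exist_ok=True)

        # Running it again does not raise FileExistsError.
        os.makedirs(path, exist_ok=True)

    created = sorted(os.listdir(os.path.join(base, 'people')))

print(created)  # ['John', 'Paul', 'Ringo']
```

The trade-off is that you get no feedback about whether the folder was newly created or already existed, which the try-except version gives you.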
5.3 Copying files
Let me show you how to copy files programmatically.
First, there should be a copy of the 73 logo (sp2273_logo.png) in the current folder. Then, I will copy this into the folders I created for ‘John’, ‘Paul’, and ‘Ringo’.
for person in ['John', 'Paul', 'Ringo']:
    path_to_destination = os.path.join('people', person)
    shutil.copy('sp2273_logo.png', path_to_destination)
    print(f'Copied file to {path_to_destination}')
'people/John/sp2273_logo.png'
Copied file to people/John
'people/Paul/sp2273_logo.png'
Copied file to people/Paul
'people/Ringo/sp2273_logo.png'
Copied file to people/Ringo
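The quoted paths in the output above are what shutil.copy() returns: the path of the newly created copy. Here is a self-contained sketch of that behaviour. It assumes a stand-in text file (logo.txt, a hypothetical name) instead of the real logo, and works in a temporary folder.

```python
import os
import shutil
import tempfile

with tempfile.TemporaryDirectory() as base:
    # A stand-in for the logo file.
    source = os.path.join(base, 'logo.txt')
    with open(source, 'w') as file:
        file.write('pretend this is an image')

    destination_folder = os.path.join(base, 'John')
    os.mkdir(destination_folder)

    # When the destination is a folder, the copy keeps the original filename.
    new_path = shutil.copy(source, destination_folder)

    copied_name = os.path.basename(new_path)
    both_exist = os.path.exists(source) and os.path.exists(new_path)

print(copied_name)  # logo.txt
print(both_exist)   # True
```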
Let’s say I want all the images in a sub-folder called imgs in each person’s directory. I can do this by first creating the folders imgs and then moving the logo file into that folder.
for person in ['John', 'Paul', 'Ringo']:
    # Create folder 'imgs'
    path_to_imgs = os.path.join('people', person, 'imgs')
    if not os.path.exists(path_to_imgs):
        os.mkdir(path_to_imgs)

    # Move logo file
    current_path_of_logo = os.path.join('people', person, 'sp2273_logo.png')
    new_path_of_logo = os.path.join('people', person, 'imgs', 'sp2273_logo.png')

    shutil.move(current_path_of_logo, new_path_of_logo)
    print(f'Moved logo to {new_path_of_logo}')
'people/John/imgs/sp2273_logo.png'
Moved logo to people/John/imgs/sp2273_logo.png
'people/Paul/imgs/sp2273_logo.png'
Moved logo to people/Paul/imgs/sp2273_logo.png
'people/Ringo/imgs/sp2273_logo.png'
Moved logo to people/Ringo/imgs/sp2273_logo.png
You can also do all of this very quickly using only the terminal and its loop structures. I mention it in case you want to explore on your own.
6 Listing and looking for files
If I want to know what files are in a folder, then glob makes easy work of this. Let me show you how to use it.
- I use this if I want all the files in the current directory. The * is called a wildcard and is read as ‘anything’. So, I am asking glob to give me anything in the folder.
  glob.glob('*')
- I want to refine my search and ask glob to give only those files that match the pattern ‘peo’ followed by ‘anything’.
  glob.glob('peo*')
  ['people']
- I now want to know what is inside the folders that start with peo.
  glob.glob('peo*/*')
  ['people/Ringo', 'people/Paul', 'people/John']
- Now, I want to see the whole, detailed structure of the folder people. For this, I need to tell glob to search recursively (i.e. dig through all sub-directories) by putting recursive=True. I must also use two wildcards ** to say all ‘sub-directories’.
  glob.glob('people/**', recursive=True)
  ['people/',
   'people/Ringo', 'people/Ringo/imgs', 'people/Ringo/imgs/sp2273_logo.png',
   'people/Paul', 'people/Paul/imgs', 'people/Paul/imgs/sp2273_logo.png',
   'people/John', 'people/John/imgs', 'people/John/imgs/sp2273_logo.png']
- I want only the .png files. So, I just need to modify my pattern. I am asking glob to go through the whole structure of people and show me those files with the pattern ‘anything’.png!
  glob.glob('people/**/*.png', recursive=True)
  ['people/Ringo/imgs/sp2273_logo.png', 'people/Paul/imgs/sp2273_logo.png', 'people/John/imgs/sp2273_logo.png']
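If you would like to experiment with these patterns without touching your real folders, here is a self-contained sketch. It builds a small, made-up folder tree (the files logo.png and notes.txt are placeholders) in a temporary location and searches it. I sort the results because glob does not promise any particular order.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Build a tiny tree: people/<person>/imgs/ with two placeholder files each.
    for person in ['John', 'Paul']:
        imgs = os.path.join(base, 'people', person, 'imgs')
        os.makedirs(imgs)
        for name in ['logo.png', 'notes.txt']:
            with open(os.path.join(imgs, name), 'w') as file:
                file.write('placeholder')

    # Search every sub-directory of 'people' for .png files.
    pattern = os.path.join(base, 'people', '**', '*.png')
    matches = sorted(glob.glob(pattern, recursive=True))

    # Show the results relative to the temporary folder for readability.
    relative_matches = [os.path.relpath(match, base) for match in matches]

print(relative_matches)
```

Only the two logo.png files should be listed; the notes.txt files do not match the pattern.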
7 Extracting file info
When dealing with files and folders, you often have to extract the filename, folder or extension. You can do this by simple string manipulation; for example if I want the filename and extension:
path = 'people/Ringo/imgs/sp2273_logo.png'
filename = path.split(os.path.sep)[-1]
extension = filename.split('.')[-1]
print(filename, extension)
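os.path.sep is the path separator (i.e. \ or /) for the OS. I split the path where the separator occurred and picked the last element in the list. I use a similar strategy for the file extension. Note that this only works when the path is written with the separator of the OS you are running on; the path above uses /, so on Windows, where os.path.sep is \, the first split would return the whole path unchanged.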
However, if you like, os provides some simple functions for these tasks.
path = 'people/Ringo/imgs/sp2273_logo.png'
# Split filename from the rest
os.path.split(path)
('people/Ringo/imgs', 'sp2273_logo.png')
# Split extension
os.path.splitext(path)
('people/Ringo/imgs/sp2273_logo', '.png')
# Show the directory
os.path.dirname(path)
'people/Ringo/imgs'
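These functions can be combined to get the filename, its name without the extension, and the extension in one go. Here is a minimal sketch; the name archive.tar.gz is a made-up example to show that only the last extension is split off.

```python
import os

path = os.path.join('people', 'Ringo', 'imgs', 'sp2273_logo.png')

# basename() gives the last part of the path, i.e. the filename.
filename = os.path.basename(path)

# splitext() separates the name from its extension (the dot stays with the extension).
stem, extension = os.path.splitext(filename)

print(filename)   # sp2273_logo.png
print(stem)       # sp2273_logo
print(extension)  # .png

# Only the final extension is treated as the extension.
print(os.path.splitext('archive.tar.gz'))  # ('archive.tar', '.gz')
```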
8 Deleting stuff
Lastly, let me show you how to delete stuff.
If you want to remove a file:
os.remove('people/Ringo/imgs/sp2273_logo.png')
This won’t work with directories. For an empty directory, use:
os.rmdir('people/Ringo/imgs')
For a directory with files, use shutil:
shutil.rmtree('people/Ringo')
It goes without saying that you should be careful when using these functions. Unfortunately, I have had some miserable experiences by accidentally deleting files because I was more enthusiastic than sensible. With great power comes great responsibility, so use with extreme caution!
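One way to practise safely is to try these functions on a throwaway folder first, and to check what a path actually is before deleting it. The following is only a sketch: remove_path() is a hypothetical helper of my own, not part of os or shutil, and everything happens inside a temporary folder.

```python
import os
import shutil
import tempfile


def remove_path(path):
    """Hypothetical helper: delete a file or a folder and report what was done."""
    if os.path.isfile(path):
        os.remove(path)
        return 'file removed'
    if os.path.isdir(path):
        shutil.rmtree(path)      # removes the folder and everything inside it
        return 'folder removed'
    return 'nothing to remove'


with tempfile.TemporaryDirectory() as base:
    # Build a small tree to delete: people/Ringo/imgs/logo.txt
    folder = os.path.join(base, 'people', 'Ringo', 'imgs')
    os.makedirs(folder)
    file_path = os.path.join(folder, 'logo.txt')
    with open(file_path, 'w') as file:
        file.write('placeholder')

    results = [
        remove_path(file_path),                     # the file exists
        remove_path(os.path.join(base, 'people')),  # a folder with sub-folders
        remove_path(file_path),                     # already gone
    ]

print(results)  # ['file removed', 'folder removed', 'nothing to remove']
```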