Reading Files
Imagine you have a file that contains the following text:
This is an example file.
It has several lines of text.
Nothing terribly interesting.
You can read and print every line in this file using the following code:
file = open('example.txt')
for line in file:
    print(line)
file.close()
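When you call open('example.txt'), Python opens the file named example.txt. Because this is a relative path, Python looks for the file in the current working directory, which is the script's directory when you run the script from that directory. The open() function returns a file object, which we store in the file variable.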
After you open the file, you can iterate over all of its lines using:
for line in file
You then have to remember to close the file with file.close().
A better way to open files
A better way to open files uses the with keyword:
with open('example.txt') as file:
    for line in file:
        print(line)
This code does the same thing! It opens example.txt, sets the file variable to a file object that can read this file, and then uses for line in file to iterate over the lines in the file.
An important feature of this syntax is that the file is automatically closed when you finish the with block, even if an error occurs inside the block. We will use this syntax in class because you never have to remember to call file.close().
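To see the automatic closing in action, here is a minimal sketch that checks the file object's closed attribute inside and after a with block. The temporary directory and the variable names inside and after are assumptions made only so the example runs on its own.

```python
import os
import tempfile

# Create a small throwaway file so this sketch runs on its own.
# (The temporary location is an assumption for illustration.)
path = os.path.join(tempfile.mkdtemp(), 'example.txt')
with open(path, 'w') as out:
    out.write('This is an example file.\n')

with open(path) as file:
    inside = file.closed   # the file is still open inside the with block
    for line in file:
        print(line)

after = file.closed        # the file was closed when the with block ended

print(inside)  # False
print(after)   # True
```

Notice that the code never calls file.close(); leaving the with block is what closes the file.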
Iteration
We previously saw iteration with lists. Remember:
numbers = [2, 23, 17, 75]
for number in numbers:
    print(number * 2)
Iteration over the lines of a file uses the same for ... in statement:
with open('example.txt') as file:
    for line in file:
        print(line)
Getting all the lines in a file
This is a useful function that we will reuse throughout the course:
def get_lines(filename):
    with open(filename) as file:
        lines = []
        for line in file:
            lines.append(line)
    return lines
This function collects all the lines in the file into a list and then returns that list. If you call it:
print(get_lines('example.txt'))
You will see this output:
['This is an example file.\n', 'It has several lines of text.\n', 'Nothing terribly interesting.\n']
Notice that each line has a newline character, \n, at the end. A file is just a long string of characters! The only thing that indicates a new line is the newline character.
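Because each line keeps its trailing \n, printing a line with print() adds a second newline, so the output looks double-spaced. One possible way to deal with this, shown here as a sketch rather than something the course requires, is the string method rstrip('\n'), which returns a copy of the string without trailing newline characters. The stripped list is a name introduced only for this example.

```python
lines = ['This is an example file.\n', 'It has several lines of text.\n']

stripped = []
for line in lines:
    # rstrip('\n') returns a new string with the trailing newline removed;
    # the original string in `lines` is not changed
    stripped.append(line.rstrip('\n'))

print(stripped)
# ['This is an example file.', 'It has several lines of text.']
```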
We can write a shorter version of get_lines() as follows:
def get_lines(filename):
    with open(filename) as file:
        return list(file)
Because a file is iterable (we can use for ... in to loop over the lines), the list() function will, in one step, collect all of the lines into a list.
Split
We can take any string and split it into a list of words. When called with no arguments, split() separates the string wherever there is whitespace. For example:
long_string = "this is a string containing several words"
result = long_string.split()
print(result)
This will print:
['this', 'is', 'a', 'string', 'containing', 'several', 'words']
You can do this when reading a file, because each line in a file is just a string. For example, imagine we have a file called input.txt that contains the following:
5 6 3
2 7 10
5 2 1
We can split and then print these lines like this:
def split_lines(input_file):
    with open(input_file) as infile:
        for line in infile:
            words = line.split()
            print(words)

split_lines('input.txt')
This will print:
['5', '6', '3']
['2', '7', '10']
['5', '2', '1']
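The output above contains no \n characters, even though every line read from a file ends with one. That is because split() with no arguments treats any run of whitespace, including the newline, as a separator. A small sketch, using a string that stands in for one line of input.txt:

```python
line = '2 7 10\n'      # what one line of input.txt looks like when read

words = line.split()   # splits on spaces and drops the trailing newline
print(words)           # ['2', '7', '10']
```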
Converting to integers
When we read numbers from a file, they are read as strings. The example above shows that each number, such as 5, is a string.
If we want to calculate something with these numbers, we need to convert them to integers first. We can do this with the int() function. For example:
a = '5'
num = int(a)
print(num + 5)
This will print:
10
We cannot do this without first converting a to an integer using int(). If we tried to add 5 to a, this would result in an error because we can’t add an integer to a string.
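To see the error without crashing the program, the sketch below wraps the failing addition in try/except. The try/except statement and the names failed and result are not part of this guide; they are used here only so the example can run to the end.

```python
a = '5'
failed = False

try:
    print(a + 5)           # adding a string and an integer is not allowed
except TypeError as error:
    failed = True
    print('Error:', error)

result = int(a) + 5        # converting first makes the addition work
print(result)              # 10
```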
We can do this with an entire file of numbers as well. Using the same input.txt from above, we can add 2 to all of the numbers in the file:
def get_lines(filename):
    with open(filename) as file:
        return list(file)

def add_2_and_print(line):
    words = line.split()
    for word in words:
        print(int(word) + 2)

def increment_by_2(input_file):
    lines = get_lines(input_file)
    for line in lines:
        add_2_and_print(line)

increment_by_2('input.txt')
This will print:
7
8
5
4
9
12
7
4
3
Converting to floats
We can likewise convert a string to a float:
a = '5.3'
num = float(a)
print(num + 5)
This will print:
10.3
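The same split-then-convert pattern works for a whole line of decimal numbers. The line below is made-up sample data, not a file from this guide. Note that float() also accepts a string without a decimal point, such as '3'.

```python
line = '1.5 2.25 3\n'   # hypothetical line of decimal numbers

numbers = []
for token in line.split():
    numbers.append(float(token))

print(numbers)  # [1.5, 2.25, 3.0]
```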
Example
Here is an example that covers many of these concepts. We want to compute the mean of the numbers on each line of a file. We have a data file called data.txt that contains:
1 2 3 4 5
4 5 6 7 8
9 10 11 12 13
We need to read each line, compute the mean, and then print it out. Here is one solution:
def get_lines(filename):
    """Returns a list of the lines in the file."""
    with open(filename) as file:
        return list(file)

def get_numbers(line):
    """Parses out the integers in `line` into a list."""
    tokens = line.split()
    numbers = []
    for token in tokens:
        numbers.append(int(token))
    return numbers

def average(numbers):
    """Computes the average of a list of numbers."""
    total = 0
    for number in numbers:
        total = total + number
    return total / len(numbers)

def print_means(filename):
    """Prints the average value for each line in the file."""
    for line in get_lines(filename):
        numbers = get_numbers(line)
        ave = average(numbers)
        print(ave)

filename = 'data.txt'
print_means(filename)
If you look at just the print_means() function, you can see the entire logic of the program:
- loop through all of the lines in a file
- for each line
  - convert the line into a list of numbers
  - compute the average of that list of numbers
  - print the average
As you decompose problems, start with the big picture, like this. You should already have a get_lines() function. Then write and test the first function, get_numbers(). Once that is working, write and test the average() function.
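One way to test each function on its own, before the whole program is written, is to call it with a small input whose answer you can work out by hand. The sketch below uses assert statements for this. That is an assumption about testing style, not something this guide prescribes; printing the results and checking them by eye works too.

```python
def get_numbers(line):
    """Parses out the integers in `line` into a list."""
    numbers = []
    for token in line.split():
        numbers.append(int(token))
    return numbers


def average(numbers):
    """Computes the average of a list of numbers."""
    total = 0
    for number in numbers:
        total = total + number
    return total / len(numbers)


# Test get_numbers() first, using a line copied from data.txt
assert get_numbers('1 2 3 4 5\n') == [1, 2, 3, 4, 5]

# Once that passes, test average() on lists that are easy to check by hand
assert average([1, 2, 3, 4, 5]) == 3.0
assert average([9, 10, 11, 12, 13]) == 11.0

print('all tests passed')
```

If an assert fails, Python stops with an AssertionError on that line, which points to the function that needs fixing.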