Reading Files

Imagine you have a file that contains the following text:

This is an example file.
It has several lines of text.
Nothing terribly interesting.

You can read and print every line in this file using the following code:

file = open('example.txt')

for line in file:
    print(line)

file.close()

When you call open('example.txt'), Python looks for a file named example.txt in the current working directory, which is usually the directory you run your script from. The open() function returns a file object, which we store in the file variable.

After you open the file, you can iterate over all of its lines using:

for line in file:

When you are done, you have to remember to close the file with file.close().

A better way to open files

You can also open a file using the with keyword:

with open('example.txt') as file:
    for line in file:
        print(line)

This code does the same thing! It opens example.txt, sets the file variable to a file object that can read this file, and then uses for line in file to iterate over the lines in the file.

An important feature of this syntax is that the file is automatically closed when you finish the with block, even if an error occurs inside the block. We will use this syntax in class because you can never forget to close the file.
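To see the automatic closing for yourself, here is a minimal sketch. It assumes nothing about your own files: it first writes a small sample file into a temporary folder (the tempfile and os modules and the 'w' mode of open() are not covered above and are used here only to make the sketch runnable anywhere). The variable names open_inside and closed_after are made up for this illustration.

```python
import os
import tempfile

# Write a small sample file to a temporary folder so this sketch runs anywhere.
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'example.txt')
with open(path, 'w') as out:
    out.write('This is an example file.\n')

with open(path) as file:
    for line in file:
        print(line, end='')
    open_inside = not file.closed   # still open while inside the with block

closed_after = file.closed          # closed as soon as the block ends
print(open_inside, closed_after)
```

Every file object has a closed attribute, so you can check at any time whether a file is still open.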

Iteration

We previously saw iteration with lists. Remember:

numbers = [2, 23, 17, 75]
for number in numbers:
    print(number*2)

Iteration over the lines of a file uses the same for ... in statement:

with open('example.txt') as file:
    for line in file:
        print(line)

Getting all the lines in a file

This is a useful function that we will reuse throughout the course:

def get_lines(filename):
    with open(filename) as file:
        lines = []
        for line in file:
            lines.append(line)
        return lines

This function collects all the lines in the file into a list and then returns that list. If you call it:

print(get_lines('example.txt'))

You will see this output:

['This is an example file.\n', 'It has several lines of text.\n', 'Nothing terribly interesting.\n']

Notice that each line has a newline character at the end, \n. A file is just a long string of characters! The only thing that indicates a new line is the newline character.
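Because each line already ends with \n and print() adds a newline of its own, printing the lines directly shows a blank line between them. One way to handle this, sketched below, is to remove the trailing newline with the string method rstrip(). The list of lines is typed in by hand here, so no file is needed; the name stripped is only for illustration.

```python
lines = ['This is an example file.\n', 'It has several lines of text.\n']

stripped = []
for line in lines:
    # rstrip('\n') returns a copy of the line without the newline at the end
    stripped.append(line.rstrip('\n'))

print(stripped)
```

Another option is to keep the newline in the string and call print(line, end='') so that print() does not add a second one.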

We can write a shorter version of get_lines() as follows:

def get_lines(filename):
    with open(filename) as file:
        return list(file)

Because a file is iterable (we can use for ... in to loop over the lines), the list() function will, in one step, collect all of the lines into a list.
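If you want to convince yourself that the two versions behave the same, the sketch below runs both on the same file and compares the results. The names get_lines_loop and get_lines_short are made up so that both versions can live side by side, and the sample file is written to a temporary folder (an assumption made only so the sketch is self-contained).

```python
import os
import tempfile


def get_lines_loop(filename):
    """Collects the lines one at a time with a for loop."""
    with open(filename) as file:
        lines = []
        for line in file:
            lines.append(line)
        return lines


def get_lines_short(filename):
    """Collects the lines in one step with list()."""
    with open(filename) as file:
        return list(file)


# Write a two-line sample file so there is something to read.
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'example.txt')
with open(path, 'w') as out:
    out.write('This is an example file.\nIt has several lines of text.\n')

same = get_lines_loop(path) == get_lines_short(path)
print(same)
```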

Split

We can take any string and split it into a list. For example:

long_string = "this is a string containing several words"
result = long_string.split()
print(result)

This will print:

['this', 'is', 'a', 'string', 'containing', 'several', 'words']

You can do this when reading a file, because each line in a file is just a string. For example, imagine we have a file called input.txt that contains the following:

5 6 3
2 7 10
5 2 1

We can split and then print these lines like this:

def split_lines(input_file):
    with open(input_file) as infile:
        for line in infile:
            words = line.split()
            print(words)


split_lines('input.txt')

This will print:

['5', '6', '3']
['2', '7', '10']
['5', '2', '1']
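Notice that the newline characters are gone from this output. Called with no argument, split() separates the string on any whitespace, and a newline counts as whitespace. The sketch below shows this, and also shows how you might split on a different separator. The comma-separated line is a hypothetical example, not one of the files used above.

```python
# split() with no argument splits on any whitespace, so the newline disappears
words = '5 6 3\n'.split()
print(words)

# With an explicit separator, the newline is not removed automatically,
# so we call strip() first to drop it.
line = '5,6,3\n'
fields = line.strip().split(',')
print(fields)
```

Both print statements show the same list of three strings.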

Converting to integers

When we read numbers from a file, they are read as strings. In the example above, each number, such as '5', is printed in quotes because it is a string.

If we want to calculate something with these numbers, we need to convert them to integers first. We can do this with the int() function. For example:

a = '5'
num = int(a)
print(num + 5)

This will print:

10

We cannot do this without first converting a to an integer using int(). If we tried to add 5 to a, this would result in an error because we can’t add an integer to a string.
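Here is a small sketch of what that error looks like. It uses try and except, which are not covered above, purely so the program keeps running after the error and can show the message. The name result is made up for this illustration.

```python
a = '5'

try:
    result = a + 5          # a string plus an integer is not allowed
except TypeError as error:
    result = None
    print('Cannot add:', error)

# Converting first makes the addition work
print(int(a) + 5)
```

Python raises a TypeError for the first addition. Be careful with a + '5' as well: that runs without an error, but it joins the two strings into '55' instead of adding numbers.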

We can do this with an entire file of numbers as well. Using our same input.txt from above, we can add 2 to all of the numbers in the file:

def get_lines(filename):
    with open(filename) as file:
        return list(file)


def add_2_and_print(line):
    words = line.split()
    for word in words:
        print(int(word) + 2)


def increment_by_2(input_file):
    lines = get_lines(input_file)
    for line in lines:
        add_2_and_print(line)


increment_by_2('input.txt')

This will print:

7
8
5
4
9
12
7
4
3

Converting to floats

We can likewise convert a string to a float:

a = '5.3'
num = float(a)
print(num + 5)

This will print:

10.3
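One thing to watch for: int() only accepts strings that look like whole numbers, while float() accepts both kinds. The sketch below is one way to check this. It again uses try and except (not covered above) so the failed conversion does not stop the program, and the names values and int_worked are only for illustration.

```python
values = []
for token in ['5', '5.3']:
    # float() handles strings with or without a decimal point
    values.append(float(token))
print(values)

try:
    int('5.3')              # int() rejects a string with a decimal point
    int_worked = True
except ValueError:
    int_worked = False
print(int_worked)
```

If a file might contain decimal numbers, converting with float() is the safer choice.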

Example

Here is an example that covers many of these concepts. We want to compute the mean of every line in a file. We have a data file called data.txt that contains:

1 2 3 4 5
4 5 6 7 8
9 10 11 12 13

We need to read each line, compute the mean, and then print it out. Here is one solution:

def get_lines(filename):
    """Returns a list of the lines in the file."""
    with open(filename) as file:
        return list(file)

def get_numbers(line):
    """Parses out the integers in `line` into a list.
    """
    tokens = line.split()
    numbers = []
    for token in tokens:
        numbers.append(int(token))
    return numbers

def average(numbers):
    """Computes the average of a list of numbers
    """
    total = 0
    for number in numbers:
        total = total + number
    return total / len(numbers)

def print_means(filename):
    """Prints the average value for each line in the file."""
    for line in get_lines(filename):
        numbers = get_numbers(line)
        ave = average(numbers)
        print(ave)

filename = 'data.txt'
print_means(filename)

If you look at just the print_means() function, you can see the entire logic of the program:

loop through all of the lines in a file
for each line
- convert the line into a list of numbers
- compute the average of that list of numbers
- print the average

As you decompose problems, start with the big picture, like this. You should already have a get_lines() function. Then write and test the first function, get_numbers(). Once that is working, write and test the average() function. Finally, put the pieces together in print_means().
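One possible way to test these functions one at a time is to call each with a small input whose answer you already know. The sketch below repeats get_numbers() and average() from above so that it runs on its own, and uses assert, which stops the program with an error if the condition is false. The lines are typed in by hand to match data.txt, so no file is needed; the name means is only for illustration.

```python
def get_numbers(line):
    """Parses out the integers in `line` into a list."""
    numbers = []
    for token in line.split():
        numbers.append(int(token))
    return numbers


def average(numbers):
    """Computes the average of a list of numbers."""
    total = 0
    for number in numbers:
        total = total + number
    return total / len(numbers)


# Test each function separately with inputs we can check by hand.
assert get_numbers('1 2 3 4 5\n') == [1, 2, 3, 4, 5]
assert average([1, 2, 3, 4, 5]) == 3.0

# Then combine them, using the same lines that data.txt contains.
means = []
for line in ['1 2 3 4 5\n', '4 5 6 7 8\n', '9 10 11 12 13\n']:
    means.append(average(get_numbers(line)))
print(means)
```

If every assert passes, the program prints the list of means and nothing else. Note that average() divides by len(numbers), so it would fail on an empty list, which is what get_numbers() returns for a blank line.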