Split and join

In many tasks it is helpful to find all the words in a string. The split() function does this by converting a string to a list of words. The join() function goes in the opposite direction, converting a list of words into a string.

Split

The default use of split() is to convert a string into words that are separated by spaces:

question = 'Do you know the muffin man?'
print(question.split())

This will print:

['Do', 'you', 'know', 'the', 'muffin', 'man?']
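One detail worth seeing in action: calling split() with no argument treats any run of whitespace as a single separator, while passing ' ' explicitly splits at every individual space. The short sketch below uses a made-up string with two spaces in a row to show the difference; the variable names are just for illustration.

```python
# A string with two spaces between the words (illustrative example)
spaced = 'muffin  man?'

# No argument: any run of whitespace counts as one separator
by_whitespace = spaced.split()
print(by_whitespace)      # ['muffin', 'man?']

# Explicit ' ': every single space is a separator, so an empty string appears
by_single_space = spaced.split(' ')
print(by_single_space)    # ['muffin', '', 'man?']
```

For ordinary text, the no-argument form is usually the one you want.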

By providing an argument to split(), called the delimiter, you can split a string on any character or sequence of characters:

statement = "It is possible that I might, under other circumstances, have something to say, but right now I don't"
print(statement.split(','))

This will print:

['It is possible that I might', ' under other circumstances', ' have something to say', " but right now I don't"]
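Notice that every piece after the first starts with a space, because split(',') removes only the comma. Since the delimiter can be more than one character, one way to avoid this is to split on a comma followed by a space. This is a small sketch that assumes every comma in the text is followed by exactly one space.

```python
statement = "It is possible that I might, under other circumstances, have something to say, but right now I don't"

# Use the two-character delimiter ', ' so the space is removed along with the comma
parts = statement.split(', ')
print(parts)
```

If the spacing after the commas is inconsistent, another option is to split on ',' and then call strip() on each piece.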

Join

Join takes a list of strings and joins them into a single string, using whichever delimiter you specify. Taking our first example, let’s put those words back together into a string:

words = ['Do', 'you', 'know', 'the', 'muffin', 'man?']
sentence = ' '.join(words)
print(sentence)

This looks a little strange, but what we are saying is that we want to use a space as the delimiter. The delimiter is written as a string containing a single space: ' '. We then call join() on this string, which takes all the strings in the words variable and turns them into one long string, with each word separated by a space.

This will print:

Do you know the muffin man?

We could instead use '-' as a delimiter:

words = ['Do', 'you', 'know', 'the', 'muffin', 'man?']
sentence = '-'.join(words)
print(sentence)

and this will print:

Do-you-know-the-muffin-man?

Similarly:

result = ' and '.join(['apples','oranges','bananas','pears'])
print(result)

will print:

apples and oranges and bananas and pears
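Keep in mind that join() only works on a list of strings. If the list holds numbers, join() raises a TypeError, so each item needs to be converted with str() first. The sketch below uses a hypothetical list of integer scores to show one way to do this.

```python
# A list of integers (illustrative data)
scores = [90, 95, 88]

# Convert each number to a string, then join with commas
line = ','.join(str(score) for score in scores)
print(line)    # 90,95,88
```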

Strip

When we read lines from a file, they will be terminated by a newline character. If we are splitting based on whitespace, this doesn’t matter:

question = 'Do you know the muffin man?\n'
print(question.split())

This will print:

['Do', 'you', 'know', 'the', 'muffin', 'man?']

However, if we are splitting by something else, this matters a lot! For example, imagine we are reading a file that has a bunch of comma-separated scores:

line = '90,95,88\n'
scores = line.split(',')
print(scores)

Then this will print:

['90', '95', '88\n']

In these cases, you want to use strip()! This function removes all of the whitespace at the beginning and end of a string, including the trailing newline:

line = '90,95,88\n'
line_without_newline = line.strip()
scores = line_without_newline.split(',')
print(scores)

This will print:

['90', '95', '88']
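It can help to know that strip() works on both ends of the string, not just the end. Python also provides rstrip() and lstrip(), which remove whitespace from only the right or only the left side. The sketch below uses a made-up line with leading spaces and prints each result with repr() so the whitespace is visible.

```python
# A line with two leading spaces and a trailing newline (illustrative data)
line = '  90,95,88\n'

both_ends = line.strip()     # removes whitespace on both sides
right_only = line.rstrip()   # removes only the trailing newline
left_only = line.lstrip()    # removes only the leading spaces

print(repr(both_ends))    # '90,95,88'
print(repr(right_only))   # '  90,95,88'
print(repr(left_only))    # '90,95,88\n'
```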

Normally we would write this code in a shorter form:

line = '90,95,88\n'
scores = line.strip().split(',')
print(scores)

What happens when we call two functions back-to-back like this? First, strip() takes the line and removes the leading and trailing whitespace. Then split() takes the result and splits it into pieces at each comma. The final result is stored in the scores variable.
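To see how this pattern might be used on several lines, here is a minimal sketch. It uses a hypothetical list of strings in place of lines read from a real file, and it assumes every value is a whole number so that int() can convert it.

```python
# Stand-in for lines read from a file (illustrative data)
lines = ['90,95,88\n', '72,85,91\n']

all_scores = []
for line in lines:
    # Remove the newline, split on commas, then convert each piece to an integer
    scores = [int(piece) for piece in line.strip().split(',')]
    all_scores.append(scores)

print(all_scores)    # [[90, 95, 88], [72, 85, 91]]
```

Converting to integers matters because split() always returns strings, even when the pieces look like numbers.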