Split and join
In many tasks it is helpful to find all the words in a string. The split()
function does this by converting a string to a list of words. The join()
function goes in the opposite direction, converting a list of words into a
string.
Split
The default use of split() is to convert a string into words that are
separated by whitespace, such as spaces:
question = 'Do you know the muffin man?'
print(question.split())
This will print:
['Do', 'you', 'know', 'the', 'muffin', 'man?']
By providing an argument to split(), called the delimiter, you can split a
string based on any character:
statement = "It is possible that I might, under other circumstances, have something to say, but right now I don't"
print(statement.split(','))
This will print:
['It is possible that I might', ' under other circumstances', ' have something to say', " but right now I don't"]
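Notice that each piece after the first still starts with a space, because only the comma was removed. The delimiter does not have to be a single character. As a minimal sketch (the variable name pieces is chosen only for illustration), splitting on a comma followed by a space removes both:

```python
statement = "It is possible that I might, under other circumstances, have something to say, but right now I don't"

# Split on the two-character delimiter ', ' (a comma followed by a space)
pieces = statement.split(', ')
print(pieces)
# ['It is possible that I might', 'under other circumstances', 'have something to say', "but right now I don't"]
```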
Join
Join takes a list of strings and joins them into a single string, using whichever delimiter you specify. Taking our first example, let’s put those words back together into a string:
words = ['Do', 'you', 'know', 'the', 'muffin', 'man?']
sentence = ' '.join(words)
print(sentence)
This looks a little strange, but what we are saying is that we want to use a
space as the delimiter. The delimiter is written as a pair of quotes with a
space between them: ' '. We then call the join() function on this string,
which takes all the strings in the words variable and turns them into one
long string, with each word separated by a space.
This will print:
Do you know the muffin man?
We could instead use '-' as a delimiter:
words = ['Do', 'you', 'know', 'the', 'muffin', 'man?']
sentence = '-'.join(words)
print(sentence)
and this will print:
Do-you-know-the-muffin-man?
Similarly:
result = ' and '.join(['apples','oranges','bananas','pears'])
print(result)
will print:
apples and oranges and bananas and pears
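One thing to watch for: join() expects every item in the list to be a string, and raises a TypeError if it finds a number. A minimal sketch, assuming a hypothetical list of numeric scores, converts each item with str() first:

```python
scores = [90, 95, 88]  # hypothetical list of numbers, not strings

# join() only accepts strings, so convert each number with str() first
line = ','.join(str(score) for score in scores)
print(line)  # prints 90,95,88
```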
Strip
When we read lines from a file, they will be terminated by a newline character. If we are splitting based on whitespace, this doesn’t matter:
question = 'Do you know the muffin man?\n'
print(question.split())
This will print:
['Do', 'you', 'know', 'the', 'muffin', 'man?']
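The newline disappears because split() with no argument treats any run of whitespace (spaces, tabs, newlines) as a single separator and ignores whitespace at either end of the string. A small sketch with a made-up, messier string shows the same behavior:

```python
# Hypothetical input with leading spaces, a tab, repeated spaces, and a newline
messy = '  Do you\tknow   the muffin man?\n'

# With no argument, split() separates on any run of whitespace
words = messy.split()
print(words)
# ['Do', 'you', 'know', 'the', 'muffin', 'man?']
```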
However, if we are splitting by something else, this matters a lot! For example, imagine we are reading a file that has a bunch of comma-separated scores:
line = '90,95,88\n'
scores = line.split(',')
print(scores)
Then this will print:
['90', '95', '88\n']
In these cases, you want to use strip()! This function removes all of the
leading and trailing whitespace, including newlines, from a string:
line = '90,95,88\n'
line_without_newline = line.strip()
scores = line_without_newline.split(',')
print(scores)
This will print:
['90', '95', '88']
Normally we would write this code in a shorter form:
line = '90,95,88\n'
scores = line.strip().split(',')
print(scores)
What happens when we call two functions back-to-back like this? First strip()
takes the line and removes the leading and trailing whitespace. When this is
done, split() takes the result and splits it into pieces based on the comma.
The result of that is stored in the scores variable.
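The same pattern works inside a loop. As a rough sketch, assuming a hypothetical list named lines that stands in for lines read from a file, each line is stripped and split in one step:

```python
# Hypothetical lines, standing in for lines read from a file of scores
lines = ['90,95,88\n', '72,85,91\n']

all_scores = []
for line in lines:
    # strip() runs first, then split() runs on the string that strip() returns
    scores = line.strip().split(',')
    all_scores.append(scores)

print(all_scores)
# [['90', '95', '88'], ['72', '85', '91']]
```

Each item is still a string; if numbers are needed, each one can be converted with int().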