Counting and Grouping
Dictionaries make it easy to count and group items. Before we show this, let’s review some dictionary basics.
Adding items to a dictionary
We can start with an empty dictionary by using curly braces:
meals = {}
Then we can add items one at a time:
meals['breakfast'] = 'pancakes'
meals['lunch'] = 'ramen'
If we print the dictionary:
print(meals)
we get:
{'breakfast': 'pancakes', 'lunch': 'ramen'}
We could have made this easier by just initializing the dictionary with what we wanted:
meals = {'breakfast': 'pancakes', 'lunch': 'ramen'}
But at times we are going to want to add items to a dictionary one at a time.
Remember that if you reuse a key:
meals['breakfast'] = 'cereal'
then the new value overwrites the old value stored under that key; the rest of the dictionary is unchanged. If we printed it again, we would get:
{'breakfast': 'cereal', 'lunch': 'ramen'}
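As a small sketch of the difference between overwriting and adding, the snippet below reuses the meals dictionary. The 'dinner' and 'snack' keys and the variable names are made up for illustration only.

```python
meals = {'breakfast': 'pancakes', 'lunch': 'ramen'}

# Assigning to an existing key replaces its value; no new entry is created.
meals['breakfast'] = 'cereal'
size_after_overwrite = len(meals)   # still 2 entries

# Assigning to a key that is not present (hypothetical 'dinner') adds an entry.
meals['dinner'] = 'tacos'
size_after_insert = len(meals)      # now 3 entries

# The `in` operator tests whether a key is present.
has_lunch = 'lunch' in meals
has_snack = 'snack' in meals

print(meals)
print(size_after_overwrite, size_after_insert, has_lunch, has_snack)
```

The `in` test shown here is the same check that the counting and grouping code relies on.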
Counting
Let’s write a function called count_letters() to count how many times each letter appears in a string. So if we have this text:
I ate a banana on my way to work.
we would like to know how many a’s, b’s, and so forth are in the string. To do this, we want to keep a dictionary where the keys are letters and the values are counts. For example:
counts = {'o': 2, 't': 1}
Here is some code to do this:
def count_letters(text):
    # create an empty dictionary
    counts = {}
    for letter in text:
        # if letter is not alphabetical, skip it
        if not letter.isalpha():
            continue
        # convert to lowercase
        letter = letter.lower()
        # if we have not found this letter yet, then create a key/value pair with value = 0
        if letter not in counts:
            counts[letter] = 0
        # increment the count by 1
        counts[letter] += 1
    # return the dictionary
    return counts
This code uses a common pattern: if we don’t yet have a key in the dictionary, we create the key and initialize its value to zero.
If we run this code:
counts = count_letters('I ate a banana on my way to work.')
print(counts)
we get:
{'i': 1, 'a': 6, 't': 2, 'e': 1, 'b': 1, 'n': 3, 'o': 3, 'm': 1, 'y': 2, 'w': 2, 'r': 1, 'k': 1}
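The "check, initialize, then increment" steps can also be folded into one line with the built-in dict.get() method, which returns a default when the key is missing. This is only an alternative sketch, not a replacement for the version above; the name count_letters_get is made up for this example.

```python
def count_letters_get(text):
    # hypothetical variant of count_letters() that uses dict.get()
    counts = {}
    for letter in text:
        # skip anything that is not a letter
        if not letter.isalpha():
            continue
        letter = letter.lower()
        # get() returns 0 if the letter has not been seen yet,
        # so no separate "if letter not in counts" check is needed
        counts[letter] = counts.get(letter, 0) + 1
    return counts


counts = count_letters_get('I ate a banana on my way to work.')
print(counts)
```

Both versions build the same dictionary; the explicit check is easier to read when you are first learning the pattern.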
Grouping
Another common task with a dictionary is to group items. For example, if you are given a string, you may want to group all of the words by their first letter. So if we have:
apple banana anchovy caramel berry candy
We would like to group all of the words starting with ‘a’, all the words starting with ‘b’, and so forth. To do this, we need a dictionary that maps each letter to a list of words:
{'a': ['apple', 'anchovy'], 'b': ['banana', 'berry'], 'c': ['caramel', 'candy']}
Here is some code to do this:
def group_by_first_letter(words):
    # initialize an empty dictionary
    groups = {}
    # loop through the words
    for word in words:
        # get the first letter of the word
        key = word[0]
        # if we haven't seen this letter before, initialize
        # the dictionary with an empty list
        if key not in groups:
            groups[key] = []
        # append this word to the list for this key
        groups[key].append(word)
    # return the dictionary
    return groups
Notice that this isn’t very different from counting letters in a string. Instead of having a dictionary of (letter, count) entries, we have a dictionary that contains (letter, list of words) entries.
If we run this:
print(group_by_first_letter('apple banana anchovy caramel berry candy'.split()))
then we get:
{'a': ['apple', 'anchovy'], 'b': ['banana', 'berry'], 'c': ['caramel', 'candy']}
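The standard library's collections.defaultdict can take care of the "create an empty list if the key is missing" step automatically. The sketch below is one possible alternative, with the function name group_by_first_letter_dd invented for this example. Like the original, it assumes every word is non-empty, which is always true for the output of split().

```python
from collections import defaultdict


def group_by_first_letter_dd(words):
    # hypothetical variant of group_by_first_letter() using defaultdict
    # defaultdict(list) creates an empty list the first time a key is used
    groups = defaultdict(list)
    for word in words:
        # word[0] assumes the word is not an empty string
        groups[word[0]].append(word)
    # convert back to a plain dictionary before returning
    return dict(groups)


words = 'apple banana anchovy caramel berry candy'.split()
print(group_by_first_letter_dd(words))
```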
Grouping by size
Try this as an exercise.
Can you write a function group_by_length(words) that groups all of the words in a list based on their size? You should return a dictionary that has (length, list of words) entries.
Here is one solution for this function:
def group_by_length(words):
    # initialize a dictionary
    groups = {}
    # loop through the words
    for word in words:
        # the key is the length of the word
        key = len(word)
        # if we haven't seen this length yet, initialize an empty list
        if key not in groups:
            groups[key] = []
        # append the word to the list
        groups[key].append(word)
    return groups
If we run this code:
text = 'Utah Idaho Oregon California Washington Arizona Iowa Ohio Mississippi Florida Kansas Maine'
states = text.split()
print(group_by_length(states))
we get:
{4: ['Utah', 'Iowa', 'Ohio'], 5: ['Idaho', 'Maine'], 6: ['Oregon', 'Kansas'], 10: ['California', 'Washington'], 7: ['Arizona', 'Florida'], 11: ['Mississippi']}
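Notice that the keys appear in the order they were first added (10 comes before 7), because dictionaries keep insertion order in Python 3.7 and later. If you would rather display the groups from shortest to longest, one option is to loop over the sorted keys. The sketch below starts from the dictionary printed above, and the variable name ordered is made up for this example.

```python
# the result of group_by_length(states), copied from the output above
groups = {
    4: ['Utah', 'Iowa', 'Ohio'],
    5: ['Idaho', 'Maine'],
    6: ['Oregon', 'Kansas'],
    10: ['California', 'Washington'],
    7: ['Arizona', 'Florida'],
    11: ['Mississippi'],
}

# sorted(groups) returns the keys in increasing order
for length in sorted(groups):
    print(length, groups[length])

# build a new dictionary whose keys are in sorted order
ordered = dict(sorted(groups.items()))
print(ordered)
```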
A pattern to use when writing code that groups items
When we group items together, we can often follow this pattern:
- loop through a list of items
- compute the key you will use to group items
  - this must be something they have in common, like starting with the same first letter
- check if the key is present, and if not, add a new key to the dictionary
  - initialize the entry with zero or an empty list
- add the item to its group
  - increment the counter or append to the list
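The steps above can be sketched as a single general-purpose function. This is only an illustration: the name group_by and its key_func parameter are made up here, and key_func stands for any function that computes the grouping key for an item.

```python
def group_by(items, key_func):
    # hypothetical helper that follows the grouping pattern step by step
    groups = {}
    # loop through the items
    for item in items:
        # compute the key used to group this item
        key = key_func(item)
        # if the key is not present, start a new empty group
        if key not in groups:
            groups[key] = []
        # add the item to its group
        groups[key].append(item)
    return groups


words = 'apple banana anchovy caramel berry candy'.split()

# group by first letter
by_letter = group_by(words, lambda word: word[0])
print(by_letter)

# group by length, using the built-in len() as the key function
by_length = group_by(words, len)
print(by_length)
```

Only the way the key is computed changes between the two calls; the rest of the loop stays the same.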