Counting and Grouping

Dictionaries make it easy to count and group items. Before we show this, let’s review some dictionary basics.

Adding items to a dictionary

We can start with an empty dictionary by using curly braces:

meals = {}

Then we can add items one at a time:

meals['breakfast'] = 'pancakes'
meals['lunch'] = 'ramen'

If we print the dictionary:

print(meals)

we get:

{'breakfast': 'pancakes', 'lunch': 'ramen'}

We could have made this easier by just initializing the dictionary with what we wanted:

meals = {'breakfast': 'pancakes', 'lunch': 'ramen'}

But at times we are going to want to add items to a dictionary one at a time.
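As a quick check, here is a small sketch showing that adding the two meals one at a time produces the same dictionary as initializing it all at once. It also shows a typical reason to add items one at a time: filling a dictionary inside a loop. The names `one_at_a_time`, `all_at_once`, `days`, `dishes`, and `plan` are made up for illustration.

```python
# Build the dictionary one item at a time
one_at_a_time = {}
one_at_a_time['breakfast'] = 'pancakes'
one_at_a_time['lunch'] = 'ramen'

# Build the same dictionary in a single step
all_at_once = {'breakfast': 'pancakes', 'lunch': 'ramen'}
print(one_at_a_time == all_at_once)  # True

# Inside a loop, adding one item at a time is the natural approach
days = ['monday', 'tuesday']
dishes = ['soup', 'tacos']
plan = {}
for day, dish in zip(days, dishes):
    plan[day] = dish
print(plan)  # {'monday': 'soup', 'tuesday': 'tacos'}
```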

Remember that if you reuse a key:

meals['breakfast'] = 'cereal'

then the new value replaces the old value stored under that key. If we printed the dictionary again, we would get:

{'breakfast': 'cereal', 'lunch': 'ramen'}
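Reassigning an existing key replaces only its value. No second `'breakfast'` key is created, so the dictionary still has two entries and the key keeps its original position. A short sketch to confirm this:

```python
meals = {'breakfast': 'pancakes', 'lunch': 'ramen'}
meals['breakfast'] = 'cereal'  # same key, so the old value is replaced

print(meals)       # {'breakfast': 'cereal', 'lunch': 'ramen'}
print(len(meals))  # 2, because no new key was added
```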

Counting

Let’s write a function called count_letters() to count how many times each letter appears in a string. So if we have this text:

I ate a banana on my way to work.

we would like to know how many a’s, b’s, and so forth are in the string. To do this, we want to keep a dictionary where the keys are letters and the values are counts. For example, a dictionary recording two o’s and one t would look like this:

counts = {'o': 2, 't': 1}
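The counting code below needs an extra step, and updating such a dictionary by hand shows why. Incrementing a letter that is already a key works. Incrementing a letter that is not yet a key raises a `KeyError`, because there is no old value to add 1 to. A small sketch:

```python
counts = {'o': 2, 't': 1}

# 'o' is already a key, so we can add 1 to its count
counts['o'] += 1
print(counts)  # {'o': 3, 't': 1}

# 'z' is not a key yet, so reading its old value fails
try:
    counts['z'] += 1
except KeyError as error:
    print('KeyError:', error)  # KeyError: 'z'
```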

Here is some code to do this:

def count_letters(text):
    # create an empty dictionary
    counts = {}
    for letter in text:
        # if letter is not alphabetical, skip it
        if not letter.isalpha():
            continue
        # convert to lowercase
        letter = letter.lower()
        # if we have not found this letter yet, create a key/value pair with value = 0
        if letter not in counts:
            counts[letter] = 0
        # increment the count by 1
        counts[letter] += 1
    # return the dictionary
    return counts

This code uses a common pattern: if we don’t yet have a key in the dictionary, create the key and initialize its value to zero.
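A common shorthand for this pattern is the dictionary method `get()`, which returns a default value when the key is missing. This is a swapped-in alternative, not the approach used above. The sketch is named `count_letters_with_get` for illustration only, and it should produce the same counts as `count_letters()`:

```python
def count_letters_with_get(text):
    counts = {}
    for letter in text:
        # skip spaces, punctuation, and digits
        if not letter.isalpha():
            continue
        letter = letter.lower()
        # get() returns 0 if the letter is not a key yet,
        # so this one line both creates and increments the entry
        counts[letter] = counts.get(letter, 0) + 1
    return counts

result = count_letters_with_get('I ate a banana on my way to work.')
print(result)
```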

If we run this code:

counts = count_letters('I ate a banana on my way to work.')
print(counts)

we get:

{'i': 1, 'a': 6, 't': 2, 'e': 1, 'b': 1, 'n': 3, 'o': 3, 'm': 1, 'y': 2, 'w': 2, 'r': 1, 'k': 1}
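Python’s standard library also provides `collections.Counter`, a dictionary subclass built for this kind of tallying. This is an alternative, not the approach this section teaches. The sketch below produces the same letter counts. `most_common()` then lists the most frequent letters first, and letters with equal counts appear in the order they were first seen:

```python
from collections import Counter

text = 'I ate a banana on my way to work.'
# keep only alphabetical characters, lowercased, as count_letters() does
letters = [letter.lower() for letter in text if letter.isalpha()]
counts = Counter(letters)

print(counts['a'])            # 6
print(counts.most_common(2))  # [('a', 6), ('n', 3)]
```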

Grouping

Another common task with a dictionary is to group items. For example, if you are given a string, you may want to group all of the words by their first letter. So if we have:

apple banana anchovy caramel berry candy

We would like to group all of the words starting with ‘a’, all the words starting with ‘b’, and so forth. To do this, we need a dictionary that maps each letter to a list of words:

{'a': ['apple', 'anchovy'], 'b': ['banana', 'berry'], 'c': ['caramel', 'candy']}

Here is some code to do this:

def group_by_first_letter(words):
    # initialize an empty dictionary
    groups = {}
    # loop through the words
    for word in words:
        # get the first letter of the word
        key = word[0]
        # if we haven't seen this letter before, initialize
        # the dictionary with an empty list
        if key not in groups:
            groups[key] = []
        # append this word to the list for this key
        groups[key].append(word)

    # return the dictionary
    return groups

Notice that this isn’t very different from counting letters in a string. Instead of a dictionary of (letter, count) pairs, we have a dictionary of (letter, list of words) pairs.
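The similarity is easier to see in code. This sketch uses the illustrative name `count_by_first_letter` and counts how many words start with each letter. Its loop is the same as in `group_by_first_letter()` except for two lines: the starting value and the update step.

```python
def count_by_first_letter(words):
    counts = {}
    for word in words:
        key = word[0]
        if key not in counts:
            counts[key] = 0  # grouping starts with [] instead
        counts[key] += 1     # grouping calls append(word) instead
    return counts

words = 'apple banana anchovy caramel berry candy'.split()
print(count_by_first_letter(words))  # {'a': 2, 'b': 2, 'c': 2}
```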

If we run this:

print(group_by_first_letter('apple banana anchovy caramel berry candy'.split()))

then we get:

{'a': ['apple', 'anchovy'], 'b': ['banana', 'berry'], 'c': ['caramel', 'candy']}

Grouping by size

Try this as an exercise.

Can you write a function group_by_length(words) that groups all of the words in a list based on their size? You should return a dictionary that has (length, list of words) entries.

Work on this with a friend.

Here is one solution for this function:

def group_by_length(words):
    # initialize a dictionary
    groups = {}
    # loop through the words
    for word in words:
        # the key is the length of the word
        key = len(word)
        # if we haven't seen this length yet, initialize an empty list
        if key not in groups:
            groups[key] = []
        # append the word to the list
        groups[key].append(word)
    return groups

If we run this code:

text = 'Utah Idaho Oregon California Washington Arizona Iowa Ohio Mississippi Florida Kansas Maine'
states = text.split()
print(group_by_length(states))

we get:

{4: ['Utah', 'Iowa', 'Ohio'], 5: ['Idaho', 'Maine'], 6: ['Oregon', 'Kansas'], 10: ['California', 'Washington'], 7: ['Arizona', 'Florida'], 11: ['Mississippi']}
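The keys appear in the order each length was first seen, not in numeric order, because Python dictionaries keep keys in insertion order. To display the groups from shortest to longest, one option is to loop over the sorted keys. A minimal sketch, which copies the dictionary printed above so that it runs on its own:

```python
groups = {4: ['Utah', 'Iowa', 'Ohio'], 5: ['Idaho', 'Maine'],
          6: ['Oregon', 'Kansas'], 10: ['California', 'Washington'],
          7: ['Arizona', 'Florida'], 11: ['Mississippi']}

# sorted() on a dictionary returns a list of its keys in ascending order
for length in sorted(groups):
    print(length, groups[length])
```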

A pattern to use when writing code that groups items

When we group items together, we can often follow this pattern:

1. Loop through a list of items.
2. Compute the key you will use to group items.
   - This must be something they have in common, such as starting with the same first letter.
3. Check whether the key is present, and if not, add a new key to the dictionary.
   - Initialize the entry with zero or an empty list.
4. Add the item to its group.
   - Increment the counter or append to the list.
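This pattern can be written once as a general-purpose helper. In this sketch, `group_by` and `first_letter` are hypothetical names. The caller passes in a function that computes the key, so the same loop can group by first letter or by length.

```python
def group_by(items, key_function):
    groups = {}
    # loop through the items
    for item in items:
        # compute the key the items have in common
        key = key_function(item)
        # if the key is missing, add it with an empty list
        if key not in groups:
            groups[key] = []
        # add the item to its group
        groups[key].append(item)
    return groups

def first_letter(word):
    return word[0]

words = 'apple banana anchovy caramel berry candy'.split()
print(group_by(words, first_letter))
# {'a': ['apple', 'anchovy'], 'b': ['banana', 'berry'], 'c': ['caramel', 'candy']}
print(group_by(words, len))
# {5: ['apple', 'berry', 'candy'], 6: ['banana'], 7: ['anchovy', 'caramel']}
```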