Fun with dictionaries

We are going to explore dictionaries some more…

Grouping by first and last letter

Here is some code that groups words by their first letter:

groups = {}

response = input('Word: ')
while response:

    key = response[0]

    if key not in groups:
        groups[key] = []

    groups[key].append(response)

    response = input('Word: ')

print(groups)

Put this into PyCharm and run it.

come back when you are done

Here is some code that groups words by their last letter:

groups = {}

response = input('Word: ')
while response:

    key = response[-1]

    if key not in groups:
        groups[key] = []

    groups[key].append(response)

    response = input('Word: ')

print(groups)

Put this into PyCharm and run it.

come back when you are done

Now … how would you change the code so that it groups words that share both the same first letter and the same last letter? In other words:

apple
ape

should be grouped together, because they have the same first letter and the same last letter. Likewise,

misty
monkey

should be grouped together, because they also have the same first and last letter.

work on this with a friend

You should have come up with this:

groups = {}

response = input('Word: ')
while response:

    key = (response[0], response[-1])

    if key not in groups:
        groups[key] = []

    groups[key].append(response)

    response = input('Word: ')

print(groups)

Notice how we are using a key that is a tuple! We saw earlier that keys can be strings and integers. You can also use tuples as keys.
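A tuple of strings works as a key because it is immutable, which makes it hashable. A list is mutable, so Python refuses to use it as a key. Here is a small sketch of the difference; the variable names are made up for this illustration:

```python
groups = {}

# A tuple of two letters is hashable, so it can be a dictionary key.
groups[('a', 'e')] = ['apple', 'ape']

# A list holding the same two letters is mutable, so Python rejects it as a key.
try:
    groups[['a', 'e']] = ['apple', 'ape']
    list_key_worked = True
except TypeError:
    list_key_worked = False

print(groups)
print(list_key_worked)
```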

When creating a dictionary, always think about what your items have in common. That should be the key for the dictionary.

If you run this code and enter a set of words:

word
weird
apple
ape
apex
ax
ant
attic
intense
ice
incense

Then you should get:

{('w', 'd'): ['word', 'weird'], ('a', 'e'): ['apple', 'ape'], ('a', 'x'): ['apex', 'ax'], ('a', 't'): ['ant'], ('a', 'c'): ['attic'], ('i', 'e'): ['intense', 'ice', 'incense']}
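If you want to check this result without typing every word at the prompt, one option is to wrap the same grouping logic in a function that takes a list of words. This is only a sketch; the function name group_by_first_and_last is made up for this example.

```python
def group_by_first_and_last(words):
    """Group words by the tuple (first letter, last letter)."""
    groups = {}
    for word in words:
        # the key is what the words in a group have in common
        key = (word[0], word[-1])
        if key not in groups:
            groups[key] = []
        groups[key].append(word)
    return groups


words = ['word', 'weird', 'apple', 'ape', 'apex', 'ax',
         'ant', 'attic', 'intense', 'ice', 'incense']
groups = group_by_first_and_last(words)
print(groups[('a', 'e')])
print(groups[('i', 'e')])
```

Because the function takes a list instead of reading input, it is easy to call it again with a different set of words.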

1 Nephi

Given the text of 1 Nephi, group words by the word that precedes them. Then use the result to randomly generate text.

Grouping

Let’s work on the first piece of this: grouping a set of words by the previous word.

Questions:

How do we split a long string of text into words?
If we have a word, how do we get rid of any punctuation that might be at the end?
How do we keep track of the previous word?
What is the key for the dictionary?

work on this with a friend

OK, here is some code for this problem:

def group_by_previous(text):
    # we are given a long string
    # split it into words
    words = text.split()
    # keep track of the previous word
    prev = None
    groups = {}
    for word in words:
        # strip off any punctuation
        word = word.strip('.,;!?')

        # if prev is None this means we are at the first word
        if prev is None:
            # this becomes the previous word and then we have to continue the
            # loop because we can't do anything with the first word
            prev = word
            continue

        # convert the word into the key for the dictionary
        key = prev.lower()
        # if this key is not in the dictionary, initialize an empty list for this key
        if key not in groups:
            groups[key] = []

        # append the word to the list for this key
        groups[key].append(word)

        # be sure to set previous to this word so the NEXT time through the loop, we have
        # the right previous word
        prev = word
    return groups

Notice that word.strip('.,;!?') removes any of those punctuation characters from the start and end of the word.
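Keep in mind that strip only touches the beginning and end of a string, and it only removes characters that appear in its argument. A quick sketch, using sample strings made up for this illustration:

```python
punctuation = '.,;!?'

# listed characters are removed from the end ...
end_only = 'father;'.strip(punctuation)

# ... and from the beginning, as many as there are
both_ends = '...Lord!?'.strip(punctuation)

# characters in the middle of the string are left alone
middle_kept = 'mid.dle.'.strip(punctuation)

# a colon is not in our list, so it stays
not_listed = 'saying:'.strip(punctuation)

print(end_only, both_ends, middle_kept, not_listed)
```

If your text has other punctuation, such as colons or quotation marks, you would need to add those characters to the string passed to strip.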

The other important concept is keeping track of the previous word each time through the loop. Pay attention to how the above code uses prev.
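Another way to pair each word with the word before it is to zip the list of words with a copy of itself shifted by one. This is an alternative sketch, not the approach used in the code above; the function name group_by_previous_zip is made up for this example.

```python
def group_by_previous_zip(text):
    # strip punctuation from every word up front
    words = [word.strip('.,;!?') for word in text.split()]
    groups = {}
    # zip(words, words[1:]) yields (previous word, current word) pairs
    for prev, word in zip(words, words[1:]):
        key = prev.lower()
        if key not in groups:
            groups[key] = []
        groups[key].append(word)
    return groups


result = group_by_previous_zip('I will go and do. I will return!')
print(result)
```

The version with prev is worth understanding first, because the same pattern works even when you read words one at a time and cannot slice a list.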

If we run this code:

passage = """
And it came to pass that I Nephi said unto my father
I will go and do the things which the Lord has commanded
for I know that the Lord giveth no commandments unto the
children of men save He shall prepare a way that they may
accomplish the things that He commandeth them
"""

groups = group_by_previous(passage)
print(groups)

then we get:

{'and': ['it', 'do'], 'it': ['came'], 'came': ['to'], 'to': ['pass'], 'pass': ['that'],
'that': ['I', 'the', 'they', 'He'], 'i': ['Nephi', 'will', 'know'], 'nephi': ['said'],
'said': ['unto'], 'unto': ['my', 'the'], 'my': ['father'], 'father': ['I'], 'will': ['go'],
'go': ['and'], 'do': ['the'], 'the': ['things', 'Lord', 'Lord', 'children', 'things'],
'things': ['which', 'that'], 'which': ['the'], 'lord': ['has', 'giveth'], 'has': ['commanded'],
'commanded': ['for'], 'for': ['I'], 'know': ['that'], 'giveth': ['no'], 'no': ['commandments'],
'commandments': ['unto'], 'children': ['of'], 'of': ['men'], 'men': ['save'], 'save': ['He'],
'he': ['shall', 'commandeth'], 'shall': ['prepare'], 'prepare': ['a'], 'a': ['way'],
'way': ['that'], 'they': ['may'], 'may': ['accomplish'], 'accomplish': ['the'],
'commandeth': ['them']}

Randomly generating text

Now we can use this to randomly generate some text:

import random

word = 'and'
words = [word]

for _ in range(20):
    word = random.choice(groups[word.lower()])
    words.append(word)

print(" ".join(words))

Put this into PyCharm and run it.

come back when you are done

You should see something like:

and it came to pass that they may accomplish the Lord giveth no commandments unto the things that the Lord has

Sometimes you will get an error:

    word = random.choice(groups[word.lower()])
KeyError: 'them'

Why do you think this is happening?

Our passage above ends with the word 'them', so the dictionary contains commandeth -> them. If we pick 'commandeth' as our random word, then we will always choose 'them' as the next word. The next time through the loop, we use 'them' as our key, but there is no entry for 'them' in the dictionary, because no word ever came after it in the passage.
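One possible way to avoid the error is to stop generating as soon as the current word has no entry in the dictionary. This is a sketch under that assumption; the function name generate_text and its parameters are made up for this example, and you could instead choose to restart from a new random word.

```python
import random


def generate_text(groups, start, count):
    word = start
    words = [word]
    for _ in range(count):
        key = word.lower()
        # stop early if no word ever followed this one
        if key not in groups:
            break
        word = random.choice(groups[key])
        words.append(word)
    return ' '.join(words)


# a tiny dictionary in the same shape as the one built above
groups = {'he': ['commandeth'], 'commandeth': ['them']}
text = generate_text(groups, 'He', 20)
print(text)
```

With this tiny dictionary every list has only one choice, so the result is always the same three words, and the loop ends quietly when it reaches 'them'.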