Fun with dictionaries
We are going to explore dictionaries some more…
Grouping by first and last letter
Here is some code that groups words by their first letter:
groups = {}
response = input('Word: ')
while response:
    key = response[0]
    if key not in groups:
        groups[key] = []
    groups[key].append(response)
    response = input('Word: ')
print(groups)
Put this into PyCharm and run it.
Here is some code that groups words by their last letter:
groups = {}
response = input('Word: ')
while response:
    key = response[-1]
    if key not in groups:
        groups[key] = []
    groups[key].append(response)
    response = input('Word: ')
print(groups)
Put this into PyCharm and run it.
Now … how would you change the code so that it groups words that have the same first letter and the same last letter? In other words:
- apple
- ape
should be grouped together, because they have the same first and last letter. Likewise,
- misty
- monkey
should be grouped together, because they also have the same first and last letter.
You should have come up with this:
groups = {}
response = input('Word: ')
while response:
    key = (response[0], response[-1])
    if key not in groups:
        groups[key] = []
    groups[key].append(response)
    response = input('Word: ')
print(groups)
Notice how we are using a key that is a tuple! We saw earlier that keys can be strings and integers. You can also use tuples as keys.
When creating a dictionary, always think about what your items have in common. That should be the key for the dictionary.
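Tuples work as keys because they are immutable, so Python can hash them. A list is mutable and cannot be used as a key. Here is a short sketch of the difference; the variable names are only for illustration.

```python
# A tuple can be a dictionary key because it is immutable (hashable).
groups = {}
groups[('a', 'e')] = ['apple', 'ape']
print(groups[('a', 'e')])

# A list is mutable, so Python refuses to use it as a key.
list_key_error = None
try:
    groups[['a', 'e']] = ['apple', 'ape']
except TypeError as error:
    list_key_error = error
    print('TypeError:', error)
```

If you ever build a key as a list, convert it with tuple() before using it in the dictionary.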
If you run the code that groups by first and last letter and enter this set of words:
- word
- weird
- apple
- ape
- apex
- ax
- ant
- attic
- intense
- ice
- incense
then you should get:
{('w', 'd'): ['word', 'weird'], ('a', 'e'): ['apple', 'ape'], ('a', 'x'): ['apex', 'ax'], ('a', 't'): ['ant'], ('a', 'c'): ['attic'], ('i', 'e'): ['intense', 'ice', 'incense']}
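To check this result without typing each word, the same grouping can be run on a list. This is only a sketch: group_words and key_function are hypothetical names, and the words come from a list instead of input().

```python
def group_words(words, key_function):
    """Group words by whatever key_function returns for each word."""
    groups = {}
    for word in words:
        key = key_function(word)
        if key not in groups:
            groups[key] = []
        groups[key].append(word)
    return groups


def first_and_last(word):
    # the key is a tuple: (first letter, last letter)
    return (word[0], word[-1])


words = ['word', 'weird', 'apple', 'ape', 'apex', 'ax', 'ant', 'attic',
         'intense', 'ice', 'incense']
result = group_words(words, first_and_last)
print(result)
```

Only the key changes between the three programs. Passing a function that returns word[0] or word[-1] gives the first-letter and last-letter groupings instead.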
1 Nephi
Given the text of 1 Nephi, group words by preceding word. Then use the result to randomly generate text.
Grouping
Let’s work on the first piece of this: grouping a set of words by the previous word.
Questions:
- How do we split a long string of text into words?
- If we have a word, how do we get rid of any punctuation that might be at the end?
- How do we keep track of the previous word?
- What is the key for the dictionary?
OK, here is some code for this problem:
def group_by_previous(text):
    # we are given a long string
    # split it into words
    words = text.split()
    # keep track of the previous word
    prev = None
    groups = {}
    for word in words:
        # strip off any punctuation
        word = word.strip('.,;!?')
        # if prev is None this means we are at the first word
        if prev is None:
            # this becomes the previous word and then we have to continue the
            # loop because we can't do anything with the first word
            prev = word
            continue
        # convert the word into the key for the dictionary
        key = prev.lower()
        # if this key is not in the dictionary, initialize an empty list for this key
        if key not in groups:
            groups[key] = []
        # append the word to the list for this key
        groups[key].append(word)
        # be sure to set previous to this word so the NEXT time through the loop, we have
        # the right previous word
        prev = word
    return groups
Notice that we can strip punctuation from both ends of a word with word.strip('.,;!?').
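Keep in mind that strip() only removes the listed characters from the beginning and end of a string, never from the middle. Here is a quick sketch with made-up sample words:

```python
punctuation = '.,;!?'

# punctuation at the end of a word is removed
end_stripped = 'father,'.strip(punctuation)
print(end_stripped)      # father

# several punctuation characters in a row are all removed
multi_stripped = 'pass!?'.strip(punctuation)
print(multi_stripped)    # pass

# punctuation in the middle is left alone; only the final '.' goes away
middle_kept = 'e.g.'.strip(punctuation)
print(middle_kept)       # e.g

# a "word" that is nothing but punctuation becomes an empty string
only_punctuation = '...'.strip(punctuation)
print(repr(only_punctuation))    # ''
```

The last case is worth remembering: text containing a stand-alone "..." would add an empty string to the groups.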
The other important concept is keeping track of the previous word each time through the loop. Pay attention to how the above code uses prev.
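Another way to picture what prev is doing: each word is paired with the word before it. The sketch below builds those pairs with zip() on a short made-up list of words. It is an alternative illustration, not the approach used in group_by_previous.

```python
words = ['I', 'will', 'go', 'and', 'do']

# zip pairs each word with the one that follows it:
#   words     -> I,    will, go,  and
#   words[1:] -> will, go,   and, do
pairs = list(zip(words, words[1:]))
print(pairs)
# [('I', 'will'), ('will', 'go'), ('go', 'and'), ('and', 'do')]

groups = {}
for prev, word in pairs:
    # the previous word, lowercased, is the key
    key = prev.lower()
    if key not in groups:
        groups[key] = []
    groups[key].append(word)
print(groups)
# {'i': ['will'], 'will': ['go'], 'go': ['and'], 'and': ['do']}
```

Notice that the last word, do, never appears as a key, because nothing follows it.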
If we run this code:
passage = """
And it came to pass that I Nephi said unto my father
I will go and do the things which the Lord has commanded
for I know that the Lord giveth no commandments unto the
children of men save He shall prepare a way that they may
accomplish the things that He commandeth them
"""
groups = group_by_previous(passage)
print(groups)
then we get:
{'and': ['it', 'do'], 'it': ['came'], 'came': ['to'], 'to': ['pass'], 'pass': ['that'],
'that': ['I', 'the', 'they', 'He'], 'i': ['Nephi', 'will', 'know'], 'nephi': ['said'],
'said': ['unto'], 'unto': ['my', 'the'], 'my': ['father'], 'father': ['I'], 'will': ['go'],
'go': ['and'], 'do': ['the'], 'the': ['things', 'Lord', 'Lord', 'children', 'things'],
'things': ['which', 'that'], 'which': ['the'], 'lord': ['has', 'giveth'], 'has': ['commanded'],
'commanded': ['for'], 'for': ['I'], 'know': ['that'], 'giveth': ['no'], 'no': ['commandments'],
'commandments': ['unto'], 'children': ['of'], 'of': ['men'], 'men': ['save'], 'save': ['He'],
'he': ['shall', 'commandeth'], 'shall': ['prepare'], 'prepare': ['a'], 'a': ['way'],
'way': ['that'], 'they': ['may'], 'may': ['accomplish'], 'accomplish': ['the'],
'commandeth': ['them']}
Randomly generating text
Now we can use this to randomly generate some text:
import random
word = 'and'
words = [word]
for _ in range(20):
    word = random.choice(groups[word.lower()])
    words.append(word)
print(" ".join(words))
Put this into PyCharm and run it.
You should see something like:
and it came to pass that they may accomplish the Lord giveth no commandments unto the things that the Lord has
Sometimes you will get an error:
word = random.choice(groups[word.lower()])
KeyError: 'them'
Why do you think this is happening?
Our passage, above, ends with 'them'. So the dictionary contains commandeth -> them. If we pick 'commandeth' as our random word, then we will always choose 'them' as the next word. The next time through the loop, we use 'them' as our key, but there is no entry for 'them' in the dictionary because we never saw any word that comes after 'them'.
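One possible way to avoid the error is to stop generating as soon as the current word has no entry in the dictionary. The sketch below assumes a small hand-written groups dictionary, and generate_text is a hypothetical helper name, not part of the code above.

```python
import random


def generate_text(groups, start, count):
    """Generate up to count more words, stopping early at a dead end."""
    word = start
    words = [word]
    for _ in range(count):
        key = word.lower()
        # a word with no followers (like the last word of the passage)
        # has no entry, so stop instead of raising KeyError
        if key not in groups:
            break
        word = random.choice(groups[key])
        words.append(word)
    return " ".join(words)


# small hand-written example; 'them' never appears as a key
groups = {'he': ['commandeth'], 'commandeth': ['them']}
text = generate_text(groups, 'He', 20)
print(text)
# He commandeth them
```

Each list here has only one choice, so the output is always the same. With the full dictionary from the passage, the text simply ends early whenever 'them' is chosen.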