Structured data and JSON
Imagine you are writing an application that keeps track of student data. You might store the information about each student in a dictionary, with every student's dictionary structured identically. Each one contains:
- first name
- surname
- age
- major
This is structured data.
This means we could think of a generic “student” dictionary that has these keys. Then we could construct a list of students, each one represented by a dictionary:
students = [
{'firstname': 'Juan', 'surname': 'Lopez', 'age': 18, 'major': 'Linguistics'},
{'firstname': 'Ulysses', 'surname': 'Bennion', 'age': 25, 'major': 'Mechanical Engineering'},
{'firstname': 'Sarah', 'surname': 'Grover', 'age': 19, 'major': 'Mathematics'},
{'firstname': 'Mary', 'surname': 'Han', 'age': 20, 'major': 'Nursing'},
{'firstname': 'Jacob', 'surname': 'Smith', 'age': 18, 'major': 'Open Major'}
]
Now that our list of students is a list of dictionaries, we can compute things on this data.
We can find the oldest student:
def get_oldest(students):
    oldest = None
    for student in students:
        if oldest is None or student['age'] > oldest['age']:
            oldest = student
    return oldest
oldest = get_oldest(students)
print(oldest)
This prints:
{'firstname': 'Ulysses', 'surname': 'Bennion', 'age': 25, 'major': 'Mechanical Engineering'}
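For comparison, Python's built-in max() can do the same search when you give it a key function that says what to compare. This is a sketch of an alternative, not a replacement for the loop above; it assumes the same students list, repeated here so the snippet runs on its own. One difference: on an empty list, get_oldest returns None, while max() raises ValueError unless you pass default=None.

```python
students = [
    {'firstname': 'Juan', 'surname': 'Lopez', 'age': 18, 'major': 'Linguistics'},
    {'firstname': 'Ulysses', 'surname': 'Bennion', 'age': 25, 'major': 'Mechanical Engineering'},
    {'firstname': 'Sarah', 'surname': 'Grover', 'age': 19, 'major': 'Mathematics'},
    {'firstname': 'Mary', 'surname': 'Han', 'age': 20, 'major': 'Nursing'},
    {'firstname': 'Jacob', 'surname': 'Smith', 'age': 18, 'major': 'Open Major'},
]

# max() compares the students by the value the key function returns (their age)
oldest = max(students, key=lambda student: student['age'])
print(oldest['firstname'])  # Ulysses

# default=None makes max() return None for an empty list instead of raising
print(max([], key=lambda student: student['age'], default=None))  # None
```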
We can check whether we have any math majors:
# Do we have any math majors?
def has_math_major(students):
    for student in students:
        if student['major'] == 'Mathematics':
            return True
    return False
print(has_math_major(students))
This prints:
True
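The same check can be sketched more compactly with Python's built-in any(), which returns True as soon as one item it is given is true, and False for an empty sequence. The two-student list here is a trimmed copy of the one above, used only to keep the example short.

```python
students = [
    {'firstname': 'Juan', 'surname': 'Lopez', 'age': 18, 'major': 'Linguistics'},
    {'firstname': 'Sarah', 'surname': 'Grover', 'age': 19, 'major': 'Mathematics'},
]

def has_math_major(students):
    # any() stops at the first student whose major is Mathematics
    return any(student['major'] == 'Mathematics' for student in students)

print(has_math_major(students))      # True
print(has_math_major(students[:1]))  # False (only Juan, a Linguistics major)
print(has_math_major([]))            # False
```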
Data abstraction
What we are discovering here is the concept of data abstraction. A “student” is a collection of information about that student. We could imagine putting lots more data into a student record. For example:
- department
- class standing
- home town
- current residence
- student ID
- gender
- nationality
- languages spoken
But a student record only needs the properties that our program actually uses.
When creating a data abstraction, the point is not to define all the possible properties that might apply in real life, but to define the set of properties needed by your program.
We can call the definition of what goes into a student a type definition, schema, or shape.
A quiz
What properties does a student need to have for this code to work?
def print_eligible_students(students):
    # Students must be part of the Physics major and be at least 21 years old
    for student in students:
        if student['major'] == 'Physics' and student['age'] >= 21:
            print(f"{student['last']}, {student['first']} ({student['standing']})")
By examining this code, you should be able to identify that the student dictionary needs the following keys:
- major (string)
- age (numeric)
- last
- first
- standing
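It could have other keys, but this is the set of keys the code above needs in order to work. If one of these keys is missing when the code tries to read it, Python raises a KeyError. Because Python's and operator stops as soon as its first condition is false, age is only read for Physics majors, and last, first, and standing are only read for eligible students.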
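To see the KeyError in action, here is a small sketch with a made-up student record that has no 'standing' key. The REQUIRED_KEYS set and the missing_keys helper are hypothetical additions, not part of the quiz code; they show one way to check a dictionary against the keys a function needs before calling it.

```python
def print_eligible_students(students):
    # Students must be part of the Physics major and be at least 21 years old
    for student in students:
        if student['major'] == 'Physics' and student['age'] >= 21:
            print(f"{student['last']}, {student['first']} ({student['standing']})")

# Hypothetical: the keys print_eligible_students reads from each student
REQUIRED_KEYS = {'major', 'age', 'last', 'first', 'standing'}

def missing_keys(record, required=REQUIRED_KEYS):
    # Set difference: the required keys that this record does not have
    return required - set(record)

# A made-up eligible student with no 'standing' key
student = {'major': 'Physics', 'age': 22, 'last': 'Nguyen', 'first': 'Ana'}
print(missing_keys(student))  # {'standing'}

try:
    print_eligible_students([student])
except KeyError as err:
    print('KeyError:', err)  # KeyError: 'standing'
```

A non-Physics student with only a 'major' key would not raise anything, because the other keys are never read.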
JSON
Imagine you want to take all of the data from a Python dictionary and send it to your friend. Or maybe you want to export it to a file, so that you can read it into a different program (maybe one that tracks alumni).
JSON (JavaScript Object Notation) is one of the most commonly used formats for transferring data between programs.
Writing a JSON file
Here is how we can convert a list of Python dictionaries into a JSON file:
import json
students = [
{'firstname': 'Juan', 'surname': 'Lopez', 'age': 18, 'major': 'Linguistics'},
{'firstname': 'Ulysses', 'surname': 'Bennion', 'age': 25, 'major': 'Mechanical Engineering'},
{'firstname': 'Sarah', 'surname': 'Grover', 'age': 19, 'major': 'Mathematics'},
{'firstname': 'Mary', 'surname': 'Han', 'age': 20, 'major': 'Nursing'},
{'firstname': 'Jacob', 'surname': 'Smith', 'age': 18, 'major': 'Open Major'}
]
with open('students.json', 'w') as file:
    json.dump(students, file)
Notice that we need:
- import json: imports the json library
- json.dump: dumps (writes) Python data, here a list of dictionaries, to a file
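The json module also has json.dumps (with an s, for "string"), which returns the JSON text as a Python string instead of writing it to a file. A small sketch comparing the two, using io.StringIO as an in-memory stand-in for a real file:

```python
import io
import json

student = {'firstname': 'Juan', 'surname': 'Lopez', 'age': 18, 'major': 'Linguistics'}

# json.dumps returns the JSON text as a str
text = json.dumps(student)
print(text)
# {"firstname": "Juan", "surname": "Lopez", "age": 18, "major": "Linguistics"}

# json.dump writes the same text to a file-like object
buffer = io.StringIO()
json.dump(student, buffer)
print(buffer.getvalue() == text)  # True
```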
If you run this code, it creates a file called students.json that contains:
[{"firstname": "Juan", "surname": "Lopez", "age": 18, "major": "Linguistics"},
{"firstname": "Ulysses", "surname": "Bennion", "age": 25, "major": "Mechanical Engineering"},
{"firstname": "Sarah", "surname": "Grover", "age": 19, "major": "Mathematics"},
{"firstname": "Mary", "surname": "Han", "age": 20, "major": "Nursing"},
{"firstname": "Jacob", "surname": "Smith", "age": 18, "major": "Open Major"}]
We have added line breaks here so you can read it more easily, but in the file it is all on one long line.
If you would like a file that is easier to read, you can pass the keyword argument indent=2:
import json
students = [
{'firstname': 'Juan', 'surname': 'Lopez', 'age': 18, 'major': 'Linguistics'},
{'firstname': 'Ulysses', 'surname': 'Bennion', 'age': 25, 'major': 'Mechanical Engineering'},
{'firstname': 'Sarah', 'surname': 'Grover', 'age': 19, 'major': 'Mathematics'},
{'firstname': 'Mary', 'surname': 'Han', 'age': 20, 'major': 'Nursing'},
{'firstname': 'Jacob', 'surname': 'Smith', 'age': 18, 'major': 'Open Major'}
]
with open('students.json', 'w') as file:
    json.dump(students, file, indent=2)
This will now create students.json as shown:
[
  {
    "firstname": "Juan",
    "surname": "Lopez",
    "age": 18,
    "major": "Linguistics"
  },
  {
    "firstname": "Ulysses",
    "surname": "Bennion",
    "age": 25,
    "major": "Mechanical Engineering"
  },
  {
    "firstname": "Sarah",
    "surname": "Grover",
    "age": 19,
    "major": "Mathematics"
  },
  {
    "firstname": "Mary",
    "surname": "Han",
    "age": 20,
    "major": "Nursing"
  },
  {
    "firstname": "Jacob",
    "surname": "Smith",
    "age": 18,
    "major": "Open Major"
  }
]
Notice how this looks almost exactly like Python lists and dictionaries. :-) One visible difference is that JSON strings always use double quotes.
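The resemblance is not perfect: the json module translates a few Python values into different JSON spellings. True and False become true and false, None becomes null, and both lists and tuples become JSON arrays. A quick check with json.dumps, using a made-up record (the enrolled, advisor, and courses keys are hypothetical and not part of the student data above):

```python
import json

# A made-up record using Python values that JSON spells differently
record = {'name': 'Juan', 'enrolled': True, 'advisor': None, 'courses': ('CS 110', 'MATH 112')}

text = json.dumps(record)
print(text)
# {"name": "Juan", "enrolled": true, "advisor": null, "courses": ["CS 110", "MATH 112"]}

# JSON has no tuple type, so the tuple comes back as a list
print(json.loads(text)['courses'])  # ['CS 110', 'MATH 112']
```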
Reading a JSON file
You can likewise load a JSON file in Python:
import json
with open('students.json') as file:
    student_info = json.load(file)
print(student_info)
This will print:
[{'firstname': 'Juan', 'surname': 'Lopez', 'age': 18, 'major': 'Linguistics'},
{'firstname': 'Ulysses', 'surname': 'Bennion', 'age': 25, 'major': 'Mechanical Engineering'},
{'firstname': 'Sarah', 'surname': 'Grover', 'age': 19, 'major': 'Mathematics'},
{'firstname': 'Mary', 'surname': 'Han', 'age': 20, 'major': 'Nursing'},
{'firstname': 'Jacob', 'surname': 'Smith', 'age': 18, 'major': 'Open Major'}]
You could print the first student with:
print(student_info[0])
This will print:
{'firstname': 'Juan', 'surname': 'Lopez', 'age': 18, 'major': 'Linguistics'}
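Just as json.dumps is the string version of json.dump, json.loads (again with an s) is the string version of json.load: it parses JSON text that is already in a Python string. A short sketch, using a trimmed two-student list, showing that dumping and then loading gives back equal data that you can index into as before:

```python
import json

students = [
    {'firstname': 'Juan', 'surname': 'Lopez', 'age': 18, 'major': 'Linguistics'},
    {'firstname': 'Ulysses', 'surname': 'Bennion', 'age': 25, 'major': 'Mechanical Engineering'},
]

text = json.dumps(students)      # Python list -> JSON string
student_info = json.loads(text)  # JSON string -> Python list

print(student_info == students)      # True
print(student_info[0]['firstname'])  # Juan
```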
Example: Pokemon
You are given a large file, pokedex.json, which contains information on many Pokemon. The file holds a list of Pokemon, and each one has the following schema:
- id (integer)
- name (dictionary)
  - english
  - japanese
  - chinese
  - french
- type (list of strings)
- base (dictionary)
  - HP (integer)
  - Attack (integer)
  - Defense (integer)
  - Sp. Attack (integer)
  - Sp. Defense (integer)
  - Speed (integer)
Here is an example:
{
  "id": 242,
  "name": {
    "english": "Blissey",
    "japanese": "ハピナス",
    "chinese": "幸福蛋",
    "french": "Leuphorie"
  },
  "type": ["Normal"],
  "base": {
    "HP": 255,
    "Attack": 10,
    "Defense": 10,
    "Sp. Attack": 75,
    "Sp. Defense": 135,
    "Speed": 55
  }
}
Largest HP
Let’s write a function to find the Pokemon with the largest HP.
- If pokemon is a variable holding the dictionary for a single Pokemon, what is the expression to find the HP of that Pokemon?
You can use pokemon['base']['HP'] to get the HP of a Pokemon.
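Each pair of square brackets steps one level deeper into the nested structure. A small sketch using the Blissey record from the example above, written as a Python dictionary:

```python
# The Blissey example from above, as a Python dictionary
pokemon = {
    "id": 242,
    "name": {"english": "Blissey", "japanese": "ハピナス", "chinese": "幸福蛋", "french": "Leuphorie"},
    "type": ["Normal"],
    "base": {"HP": 255, "Attack": 10, "Defense": 10,
             "Sp. Attack": 75, "Sp. Defense": 135, "Speed": 55},
}

print(pokemon['base']['HP'])           # 255: dictionary, then dictionary
print(pokemon['name']['english'])      # Blissey
print(pokemon['type'][0])              # Normal: dictionary, then list index
print(pokemon['base']['Sp. Defense'])  # 135: keys may contain spaces and dots
```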
Here is the code for this function:
import json
def find_largest_hp(pokedex):
    # keep track of the largest
    largest_hp = None
    # loop through all of the pokemon
    for pokemon in pokedex:
        # check if this is the largest
        if largest_hp is None or pokemon['base']['HP'] > largest_hp['base']['HP']:
            # store the largest we have seen so far
            largest_hp = pokemon
    return largest_hp
# open a JSON file with all the pokemon (UTF-8, since names include Japanese and Chinese text)
with open('pokedex.json', encoding='utf-8') as file:
    # load the file into a list of dictionaries
    pokedex = json.load(file)
largest_hp = find_largest_hp(pokedex)
print(largest_hp)
Notice that we can use for ... in to loop through all the Pokemon in the pokedex. Because the pokedex is a list, this goes through them in the order they appear in the file.
Be sure to download pokedex.json before you run this code.
Your code should print:
{'id': 242, 'name': {'english': 'Blissey', 'japanese': 'ハピナス', 'chinese': '幸福蛋', 'french': 'Leuphorie'}, 'type': ['Normal'], 'base': {'HP': 255, 'Attack': 10, 'Defense': 10, 'Sp. Attack': 75, 'Sp. Defense': 135, 'Speed': 55}}
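The same loop works for any stat in base, not just HP. Here is a sketch of a hypothetical generalization, find_largest_stat, which takes the stat name as a parameter. The tiny sample list is made up and contains only the fields the function reads; with the real file, find_largest_stat(pokedex, 'HP') should return the same Pokemon as find_largest_hp(pokedex).

```python
def find_largest_stat(pokedex, stat):
    # Hypothetical generalization of find_largest_hp: stat is any key in 'base'
    largest = None
    for pokemon in pokedex:
        if largest is None or pokemon['base'][stat] > largest['base'][stat]:
            largest = pokemon
    return largest

# Made-up sample data with only the fields this function uses
sample = [
    {'name': {'english': 'Sample A'}, 'base': {'HP': 50, 'Speed': 90}},
    {'name': {'english': 'Sample B'}, 'base': {'HP': 80, 'Speed': 40}},
]

print(find_largest_stat(sample, 'HP')['name']['english'])     # Sample B
print(find_largest_stat(sample, 'Speed')['name']['english'])  # Sample A
```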
Fewest members
Which Pokemon type has the fewest members?
We need an algorithm that looks like this:
- load the Pokemon from a JSON file
- group all of the Pokemon by type
- loop through all of the types, finding the one with the fewest Pokemon in it
- print out the smallest group
Here is code that does this:
import json
def find_rarest_type(pokedex):
groups = group_by_type(pokedex)
return find_smallest_group(groups)
with open('pokedex.json', encoding='utf-8') as file:
    pokedex = json.load(file)
name, group = find_rarest_type(pokedex)
print(name, len(group))
Notice that we have two functions we have not written yet: group_by_type() and find_smallest_group(). This code will not run until we write them, but thinking this way helps us write out the structure of the algorithm first, and then we can fill in the details of these two functions.
Now, this has two major pieces left:
- How do we group Pokemon by types?
- How do we find the smallest group?
Grouping by types
To group Pokemon by types, we need a dictionary:
{
type : [list of Pokemon of that type]
}
Here is a function that does that:
def group_by_type(pokedex):
    # create an empty dictionary
    groups = {}
    # go through all the Pokemon
    for pokemon in pokedex:
        # go through all the types that this Pokemon belongs in
        for tp in pokemon['type']:
            # if this type is not in the dictionary, add it
            if tp not in groups:
                groups[tp] = []
            # append this Pokemon to the list of Pokemon for that type
            groups[tp].append(pokemon)
    return groups
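The "add the key if it is missing, then append" step can also be written with the dictionary method setdefault, which returns the existing list for a key or inserts the given default first. A sketch of that variant, run on made-up sample data; note that a Pokemon with two types lands in both groups, just as in the loop above.

```python
def group_by_type(pokedex):
    groups = {}
    for pokemon in pokedex:
        for tp in pokemon['type']:
            # setdefault returns groups[tp], inserting [] first if tp is new
            groups.setdefault(tp, []).append(pokemon)
    return groups

# Made-up sample data with only the fields this function uses
sample = [
    {'name': {'english': 'Sample A'}, 'type': ['Grass', 'Poison']},
    {'name': {'english': 'Sample B'}, 'type': ['Fire']},
    {'name': {'english': 'Sample C'}, 'type': ['Grass']},
]

groups = group_by_type(sample)
print({tp: len(group) for tp, group in groups.items()})
# {'Grass': 2, 'Poison': 1, 'Fire': 1}
```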
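Note that we are using tp instead of type for the variable because type is the name of a built-in Python function. It is not a reserved keyword, but assigning to it would hide (shadow) the built-in type() inside the function.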
Finding the smallest group
Here is a function that finds the smallest types, using the groups
dictionary
from above:
def find_smallest_group(groups):
    # smallest and smallest type are None to start with
    smallest = None
    smallest_type = None
    # go through all of the Pokemon types and their Pokemon
    for tp, group in groups.items():
        # if this is the smallest so far, keep track of it
        if smallest is None or len(group) < len(smallest):
            smallest = group
            smallest_type = tp
    # return the smallest type and the group of Pokemon that are in this type
    return smallest_type, smallest
Note that we could have kept track of the count of Pokemon instead of a list of the actual Pokemon for each type. We use a list because maybe someday we want to add functionality that prints out the list of Pokemon of this type.
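If you only need the counts, here is a sketch of that alternative using collections.Counter and min() with a key function. The names count_by_type and find_rarest_type_count are hypothetical, and the sample data is made up. Like find_smallest_group, min() keeps the first type it sees when counts tie.

```python
from collections import Counter

def count_by_type(pokedex):
    # Count how many Pokemon list each type; dual-type Pokemon count toward both
    return Counter(tp for pokemon in pokedex for tp in pokemon['type'])

def find_rarest_type_count(pokedex):
    counts = count_by_type(pokedex)
    # min() picks the (type, count) pair with the smallest count
    return min(counts.items(), key=lambda item: item[1])

# Made-up sample data with only the fields these functions use
sample = [
    {'name': {'english': 'Sample A'}, 'type': ['Grass', 'Poison']},
    {'name': {'english': 'Sample B'}, 'type': ['Fire']},
    {'name': {'english': 'Sample C'}, 'type': ['Grass']},
]

print(find_rarest_type_count(sample))  # ('Poison', 1)
```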
Running the code
If you add these two functions to the code above (defining them before the lines that open pokedex.json) and run it, you should get:
Ice 34