Structured Data¶

🖌 Data Objects¶

In [ ]:
student1 = {
    'firstname': 'Marissa',
    'surname': 'Taggart',
    'age': 22,
    'major': 'Economics'
}

student2 = {
    'firstname': 'Martin',
    'surname': 'Talmage',
    'age': 23,
    'major': 'Spanish Literature'
}

NOTES

  • What abstraction do we see here?
  • What are the properties of a "student"?
In [ ]:
students = [
    {'firstname': 'Juan', 'surname': 'Lopez', 'age': 18, 'major': 'Linguistics'},
    {'firstname': 'Ulysses', 'surname': 'Bennion', 'age': 25, 'major': 'Mechanical Engineering'},
    {'firstname': 'Sarah', 'surname': 'Grover', 'age': 19, 'major': 'Mathematics'},
    {'firstname': 'Mary', 'surname': 'Han', 'age': 20, 'major': 'Nursing'},
    {'firstname': 'Jacob', 'surname': 'Smith', 'age': 18, 'major': 'Open Major'}
]
In [ ]:
# who is the oldest student in the class?
def get_oldest(students):
    oldest = None
    for student in students:
        if oldest is None or student['age'] > oldest['age']:
            oldest = student
    return oldest

oldest = get_oldest(students)
print(oldest)
In [ ]:
# Do we have any math majors?
def has_math_major(students):
    for student in students:
        if student['major'] == 'Mathematics':
            return True
    return False

print(has_math_major(students))

Functions give us the ability to create abstractions for actions.

  • What does it mean to "find the oldest"?

Data objects give us the ability to create abstractions for information.

  • What does it mean to have information about a "student"?

NOTES

  • Just like the definition of a function depends on the specific application (for example, there are many ways to interpret "find the oldest"), the definition of an object also depends on the application
    • some definitions of "student" might include age; others might not
    • major, department, class standing, hometown, current residence, student ID, sex, nationality, languages spoken, etc.
  • In programming, the point is not to define all the possible properties of an abstraction, but to define the properties that are needed by your program.
When creating a data abstraction, the point is not to define all the possible properties that might apply in real life, but to define the set of properties needed by your program.

In statically typed languages (like C++, C#, or Java) you typically define the properties of a data abstraction in the code.

In dynamically typed languages (like Python or JavaScript) you typically do not define the properties of a data abstraction in the code.

NOTES

  • So, in Python, use comments to clarify the structure of your data
  • Also, infer the structure of the data from the code
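
One way to follow this advice is to describe the expected keys in a comment next to the code that reads them. A minimal sketch, assuming the four student fields used earlier; `format_student` is a hypothetical helper introduced only for illustration.

```python
# A student is a dictionary with the following keys:
#   firstname (str), surname (str), age (int), major (str)
def format_student(student):
    """Return 'surname, firstname (major)' for one student dictionary."""
    return f"{student['surname']}, {student['firstname']} ({student['major']})"


mary = {'firstname': 'Mary', 'surname': 'Han', 'age': 20, 'major': 'Nursing'}
print(format_student(mary))  # Han, Mary (Nursing)
```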

The definition of a data abstraction is known as the type definition, schema, or shape.
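
Python does not require a type definition, but one can optionally be written down. A minimal sketch using `typing.TypedDict`, assuming the four fields from the earlier examples; the class name `Student` is hypothetical, and the annotations are checked only by external type checkers, not when the program runs.

```python
from typing import TypedDict


class Student(TypedDict):
    # Hypothetical type definition: the fields used by the earlier examples
    firstname: str
    surname: str
    age: int
    major: str


# At run time this is still an ordinary dictionary
student: Student = {
    'firstname': 'Marissa',
    'surname': 'Taggart',
    'age': 22,
    'major': 'Economics',
}
print(type(student) is dict)  # True
print(student['major'])       # Economics
```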

In [ ]:
def print_eligible_students(students):
    # Students must be part of the Physics major and be at least 21 years old
    for student in students:
        if student['major'] == 'Physics' and student['age'] >= 21:
            print(f"{student['last']}, {student['first']} ({student['standing']})")

NOTES

  • What structure for the abstraction "student" can you infer from this code?
    • major (string), age (number), last, first, standing
    • draw out the structure of "student" on the board (see below)
  • as far as this function is concerned, can a "student" have other properties? (yes)
  • as far as this function is concerned, can a "student" have fewer properties? (no) Why not? (will give a KeyError)

Student

  • major (string)
  • age (numeric)
  • last
  • first
  • standing
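
The "more is fine, fewer is not" rule can be checked directly. A minimal sketch: `REQUIRED_KEYS` and `missing_keys` are hypothetical names, and the two sample students are made up for illustration.

```python
# Keys that print_eligible_students reads from each student
REQUIRED_KEYS = {'major', 'age', 'last', 'first', 'standing'}


def missing_keys(student):
    """Return the required keys that this student dictionary lacks."""
    return REQUIRED_KEYS - set(student)


# Extra properties are fine: the function never looks at 'hometown'
extra = {'major': 'Physics', 'age': 21, 'last': 'Doe', 'first': 'Pat',
         'standing': 'Senior', 'hometown': 'Anytown'}
# Fewer properties are not: 'standing' is missing
fewer = {'major': 'Physics', 'age': 21, 'last': 'Doe', 'first': 'Pat'}

print(missing_keys(extra))  # set()
print(missing_keys(fewer))  # {'standing'}

try:
    fewer['standing']
except KeyError as error:
    # Looking up a missing key is what raises the KeyError
    print('KeyError:', error)
```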

🎨 json¶

In [ ]:
students
In [ ]:
import json

with open('students.json', 'w') as file:
    json.dump(students, file)

students.json¶

In [ ]:
! cat students.json
In [ ]:
import json

with open('students.json', 'w') as file:
    json.dump(students, file, indent=2)
In [ ]:
! cat students.json

students.json¶

NOTES

  • Look for matching brackets and braces
  • Look at indentation
  • Here we have a list of objects (represented by dictionaries) with the fields firstname, surname, age, and major
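
The effect of `indent` can also be seen without writing a file. A minimal sketch using `json.dumps` and `json.loads`, the string counterparts of `json.dump` and `json.load`, on a shortened copy of the student list.

```python
import json

students = [
    {'firstname': 'Juan', 'surname': 'Lopez', 'age': 18, 'major': 'Linguistics'},
    {'firstname': 'Mary', 'surname': 'Han', 'age': 20, 'major': 'Nursing'},
]

compact = json.dumps(students)           # everything on one line
pretty = json.dumps(students, indent=2)  # one field per line, nested by indentation

print(compact)
print(pretty)

# Both strings decode back to the same Python data
print(json.loads(compact) == json.loads(pretty) == students)  # True
```

The indentation changes only how the text looks; the data it encodes is the same.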
In [ ]:
import json

with open('students.json') as file:
    student_info = json.load(file)
    
print(student_info)
In [ ]:
print(student_info[0])

🖌 Structured Data¶

pokedex.json¶

NOTES

  • Write out the various structures and relationships on the board
    • pokemon: id, name (multilingual name), type (list of string), base
    • multilingual name: english, japanese, chinese, french
    • base: HP, Attack, Defense, Sp. Attack, Sp. Defense, Speed (all ints)
  • This data uses nested objects and lists
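
A single entry with this shape might look like the sketch below. The name and all the numbers are placeholders, not real pokedex data; only the key names come from the structure described above.

```python
# Illustrative entry only: the values are placeholders
pokemon = {
    'id': 0,
    'name': {'english': 'Examplemon', 'japanese': '...', 'chinese': '...', 'french': '...'},
    'type': ['Grass', 'Poison'],
    'base': {'HP': 40, 'Attack': 50, 'Defense': 45,
             'Sp. Attack': 60, 'Sp. Defense': 55, 'Speed': 35},
}

print(pokemon['name']['english'])     # nested object: two lookups
print(pokemon['type'][0])             # list inside an object
print(pokemon['base']['Sp. Attack'])  # keys may contain spaces and periods
```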

👩🏻‍🎨 Which Pokemon has the largest HP?¶

largest_hp.py¶

In [ ]:
import json


def find_largest_hp(pokedex):
    largest_hp = None
    for pokemon in pokedex:
        if largest_hp is None or pokemon['base']['HP'] > largest_hp['base']['HP']:
            largest_hp = pokemon
    return largest_hp


with open('pokedex.json') as file:
    pokedex = json.load(file)
    
largest_hp = find_largest_hp(pokedex)
print(largest_hp)
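
The same search can also be sketched with the built-in `max` and a `key` function. This is an alternative formulation, not the version used above; the sample entries are placeholders that contain only the fields the function reads.

```python
def find_largest_hp(pokedex):
    # max compares entries by their nested HP value;
    # default=None mirrors the loop version's result for an empty list
    return max(pokedex, key=lambda pokemon: pokemon['base']['HP'], default=None)


sample = [
    {'name': {'english': 'A'}, 'base': {'HP': 45}},
    {'name': {'english': 'B'}, 'base': {'HP': 80}},
    {'name': {'english': 'C'}, 'base': {'HP': 60}},
]

print(find_largest_hp(sample)['name']['english'])  # B
print(find_largest_hp([]))                         # None
```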

🧑🏻‍🎨 Which Pokemon type has the fewest members?¶

fewest_members.py¶

In [ ]:
import json

def group_by_type(pokedex):
    groups = {}
    for pokemon in pokedex:
        for tp in pokemon['type']:
            if tp not in groups:
                groups[tp] = []
            groups[tp].append(pokemon)
    return groups


def find_smallest_group(groups):
    smallest = None
    smallest_type = None
    for tp, group in groups.items():
        if smallest is None or len(group) < len(smallest):
            smallest = group
            smallest_type = tp
    
    return smallest_type, smallest
        
    
def find_rarest_type(pokedex):
    groups = group_by_type(pokedex)
    return find_smallest_group(groups)


with open('pokedex.json') as file:
    pokedex = json.load(file)

name, group = find_rarest_type(pokedex)
print(name, len(group))
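
If only the counts are needed, not the members of each group, the grouping step can be sketched with `collections.Counter`. This is an alternative under that assumption; `count_by_type` is a hypothetical name and the sample entries are placeholders with only a `type` field.

```python
from collections import Counter


def count_by_type(pokedex):
    """Count how many pokemon list each type."""
    counts = Counter()
    for pokemon in pokedex:
        # A pokemon with two types is counted once for each
        counts.update(pokemon['type'])
    return counts


sample = [
    {'type': ['Grass', 'Poison']},
    {'type': ['Fire']},
    {'type': ['Grass']},
    {'type': ['Fire', 'Flying']},
]

counts = count_by_type(sample)
rarest = min(counts, key=counts.get)  # on a tie, the first type encountered wins
print(rarest, counts[rarest])         # Poison 1
```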

Key Ideas¶

  • Using dictionaries to represent data objects
  • json
  • Structured data
    • objects and lists as properties of objects