2023-03-30
Data Science Capstone Showcase
Our spring 2023 Introduction to Data Science students created the most outstanding showcase of capstone projects in our program's history! Check out some of their findings below...
Spotify - alt rock
Katelynn C analyzed a Spotify Alternative Rock dataset containing hundreds of songs and thousands of values. Here is some of the beautiful Pyret code she wrote to analyze the loudness, energy, and key of each song:
fun key-color(r):
  if (r["KEY"] == 1): circle(5, "solid", "yellow")
  else if (r["KEY"] == 2): circle(5, "solid", "blue")
  else if (r["KEY"] == 3): circle(5, "solid", "green")
  else if (r["KEY"] == 4): circle(5, "solid", "brown")
  else if (r["KEY"] == 5): circle(5, "solid", "orange")
  else: circle(5, "solid", "red")
  end
end
NYC Policing - SQF
Michael H analyzed an NYPD Stop, Question, and Frisk dataset. He took a close look at gender and race, including an attempt to code filters to find 'Karens':
fun is-black-female(r): is-black(r) and is-female(r) end
fun is-white-female(r): is-white(r) and is-female(r) end
fun is-black-male(r): is-black(r) and is-male(r) end
fun is-white-male(r): is-white(r) and is-male(r) end
examples:
  race-location(rowB) is text("🧑🏿‍🦱", 12, "white")
  race-location(rowA) is text("👨🏻‍🦳", 12, "white")
end
Mobile devices
Nancy B analyzed a GSMArena Mobile Devices dataset containing 10,000+ phones. Here is some of the Pyret code she wrote to identify discontinued models:
examples:
  is-discontinued(phone1) is phone1["STATUS"] == "Discontinued"
  is-discontinued(phone6) is phone6["STATUS"] == "Discontinued"
  is-discontinued(phone1) is "Discontinued" == "Discontinued"
  is-discontinued(phone6) is "continued" == "Discontinued"
  is-discontinued(phone1) is true
  is-discontinued(phone6) is false
end
# Consumes a row and tests whether that phone is discontinued
# is-discontinued :: (r :: Row) -> Boolean
fun is-discontinued(r): r["STATUS"] == "Discontinued" end
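A filter like this can also answer the question of what percentage of phones are discontinued. The following is only a minimal sketch, not Nancy's code: the table `sample-phones`, its rows, and the names `discontinued-only` and `pct-discontinued` are made up for illustration, and it assumes a standard Pyret environment where table literals and the `.filter` and `.length` table methods are available.

```pyret
# Hypothetical four-phone table standing in for the GSMArena dataset.
sample-phones = table: MODEL, STATUS
  row: "Phone A", "Discontinued"
  row: "Phone B", "Available"
  row: "Phone C", "Discontinued"
  row: "Phone D", "Available"
end

# Same definition as in the showcase, repeated so the sketch stands alone.
fun is-discontinued(r): r["STATUS"] == "Discontinued" end

# Keep only the discontinued rows, then compare row counts.
discontinued-only = sample-phones.filter(is-discontinued)
pct-discontinued = (discontinued-only.length() / sample-phones.length()) * 100

check:
  discontinued-only.length() is 2
  pct-discontinued is 50
end
```

Because the filter returns a new table, the same row-count comparison would work on the full dataset by swapping in the real table name.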
Movies - IMDB
James B analyzed a Top 1000 Movies on IMDB dataset. He kept only the movies with more than 500,000 votes and then analyzed their gross profits:
fun high-votes(r): r["NUMBER-VOTES"] > 500000 end
fun gross-emoji(r):
  if (r["GROSS-USD"] < 1000000): text("🙅", 12, "white")
  else if (r["GROSS-USD"] < 6977794): text("⚔️", 12, "white")
  else if (r["GROSS-USD"] < 47907861): text("🐞", 12, "white")
  else if (r["GROSS-USD"] < 152580071): text("❄️", 12, "white")
  else: text("⭐", 12, "white")
  end
end
high-votes-table = imdb-movies-table.filter(high-votes)
image-scatter-plot(high-votes-table, "IMDB-RATING", "NUMBER-VOTES", gross-emoji)
CT Crime - Hartford
Makenzie E analyzed a Connecticut UCR Crime Index dataset containing 39,879 incidents in the Hartford area during 2019. Here is some of the Pyret code she wrote to compare crime patterns by time of day:
fun is-early(r): (r["TIME-24HR"] >= 0) and (r["TIME-24HR"] <= 600) end
fun is-late(r): (r["TIME-24HR"] >= 1800) and (r["TIME-24HR"] <= 2300) end
early-morning-reports-only = ct-police-incidents-table.filter(is-early)
late-night-reports-only = ct-police-incidents-table.filter(is-late)
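Once the two filtered tables exist, a simple way to compare them is to count their rows. The sketch below is an illustration rather than Makenzie's code: the table `sample-incidents`, its `OFFENSE` column, and the names `early-count` and `late-count` are assumptions, and it relies on the `.filter` and `.length` table methods of a standard Pyret environment.

```pyret
# Hypothetical incidents; TIME-24HR is a number such as 130 (1:30 AM)
# or 2215 (10:15 PM).
sample-incidents = table: OFFENSE, TIME-24HR
  row: "Larceny", 130
  row: "Burglary", 545
  row: "Vandalism", 1200
  row: "Larceny", 2215
end

# Same definitions as in the showcase, repeated so the sketch stands alone.
fun is-early(r): (r["TIME-24HR"] >= 0) and (r["TIME-24HR"] <= 600) end
fun is-late(r): (r["TIME-24HR"] >= 1800) and (r["TIME-24HR"] <= 2300) end

# Count how many incidents fall into each time window.
early-count = sample-incidents.filter(is-early).length()
late-count = sample-incidents.filter(is-late).length()

check:
  early-count is 2
  late-count is 1
end
```

Note that `is-late` as written stops at 2300, so an incident logged between 2301 and 2359 would land in neither group.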