2023-03-30

Data Science Capstone Showcase

Our spring 2023 Introduction to Data Science students created the most outstanding showcase of capstone projects in our program's history! Check out some of their findings below...

Spotify - Alt Rock

Katelynn C analyzed a Spotify Alternative Rock dataset containing hundreds of songs and thousands of values. Here is some of the beautiful Pyret code she wrote to analyze the loudness, energy, and key of each song:

fun key-color(r):

  if (r["KEY"] == 1): circle(5, "solid", "yellow")

  else if (r["KEY"] == 2): circle(5, "solid", "blue")

  else if (r["KEY"] == 3): circle(5, "solid", "green")

  else if (r["KEY"] == 4): circle(5, "solid", "brown")

  else if (r["KEY"] == 5): circle(5, "solid", "orange")

  else: circle(5, "solid", "red")

  end

end

NYC Policing - SQF

Michael H analyzed an NYPD Stop, Question, and Frisk dataset. He took a close look at gender and race, including an attempt to code filters to find 'Karens':

fun is-black-female(r): is-black(r) and is-female(r) end

fun is-white-female(r): is-white(r) and is-female(r) end

fun is-black-male(r): is-black(r) and is-male(r) end

fun is-white-male(r): is-white(r) and is-male(r) end

examples:

  race-location(rowB) is text("🧑🏿‍🦱", 12, "white")

  race-location(rowA) is text("👨🏻‍🦳", 12, "white")

end

Mobile Devices

Nancy B analyzed a GSMArena Mobile Devices dataset containing 10,000+ phones. Here is some of the Pyret code she wrote to identify discontinued models:

examples: 

  is-discontinued(phone1) is phone1["STATUS"] == "Discontinued"

  is-discontinued(phone6) is phone6["STATUS"] == "Discontinued"

  is-discontinued(phone1) is "Discontinued" == "Discontinued"

  is-discontinued(phone6) is "continued" == "Discontinued"

  is-discontinued(phone1) is true

  is-discontinued(phone6) is false

end


# Consumes a row and checks whether that phone is discontinued

# is-discontinued :: (r :: Row) -> Boolean

fun is-discontinued(r): r["STATUS"] == "Discontinued" end

Movies - IMDB

James B analyzed a Top 1000 Movies on IMDB dataset. He kept only the movies with more than 500,000 votes and then analyzed their gross earnings:

fun high-votes(r): r["NUMBER-VOTES"] > 500000 end

fun gross-emoji(r):

  if (r["GROSS-USD"] < 1000000): text("🙅‍", 12, "white")

  else if (r["GROSS-USD"] < 6977794): text("⚔️", 12, "white")

  else if (r["GROSS-USD"] < 47907861): text("🐞", 12, "white")

  else if (r["GROSS-USD"] < 152580071): text("❄️", 12, "white")

  else: text("⭐", 12, "white")

  end

end


high-scores-table = imdb-movies-table.filter(high-votes)


image-scatter-plot(high-scores-table, "IMDB-RATING", "NUMBER-VOTES", gross-emoji)

CT Crime - Hartford

Makenzie E analyzed a Connecticut UCR Crime Index dataset containing 39,879 incidents in the Hartford area during 2019. Here is some of the Pyret code she wrote to compare crime patterns by time of day:

fun is-early(r): (r["TIME-24HR"] >= 0) and (r["TIME-24HR"] <= 600) end


fun is-late(r): (r["TIME-24HR"] >= 1800) and (r["TIME-24HR"] <= 2300) end


early-morning-reports-only = ct-police-incidents-table.filter(is-early)


late-night-reports-only = ct-police-incidents-table.filter(is-late)

Intro to Data Science

Learn more about the amazing Bootstrap Data Science curriculum we use, and celebrate our first-in-the-state ML & AI students, as seen on WFSB.


Class Photo: Our curious and creative Data Science students, sans Nancy B.