Creating A "Chatable" Study Database 💬

This is a data science in practice project. I have manually recorded my study time at UCSD since 2022, when I started as a freshman, and I use that data to model how I spend my time. I ran a number of numerical analyses to discover my study habits and to identify when I am likely to be busiest, which helps me plan for later quarters. In addition, I built chat functions on top of a language model so the database can serve as a personalized search engine.

The data format changes across quarters, becoming more developed and better suited to analysis over time, so some merging and cleaning is needed first. Each quarter's data consists of one data frame with all the study/work time data and one text-feature data frame describing the work conducted:

  • One data frame made up almost entirely of timestamp and numerical columns (year_quarter_study) that records all the study_time
  • One data frame made up almost entirely of timestamp and text columns (year_quarter_text) that records the precise study_subject

Data currently include:

  • 2022_fall_study.csv + 2022_fall_text.csv
  • 2022_winter_study.csv + 2022_winter_text.csv
  • 2022_spring_study.csv + 2022_spring_text.csv
  • 2022_summer_study.csv + 2022_summer_text.csv
  • 2023_fall_study.csv + 2023_fall_text.csv
  • 2024_winter_study.csv + 2024_winter_text.csv
  • 2024_spring_study.csv + 2024_spring_text.csv
  • 2024_summer_study.csv + 2024_summer_text.csv
  • 2024_fall_study.csv + 2024_fall_text.csv
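
Because the format differs between quarters, a first merging pass could simply stack the files and record which quarter each row came from. The snippet below is a minimal sketch of that idea, not the project's actual loader: it assumes pandas is available and that all CSV files sit in one directory, and the names load_quarters, source_year, and source_quarter are hypothetical. It makes no assumption about the column names inside the files.

```python
from pathlib import Path
import re

import pandas as pd

# File names follow the <year>_<quarter>_<kind>.csv pattern listed above.
FILE_PATTERN = re.compile(r"^(\d{4})_(fall|winter|spring|summer)_(study|text)\.csv$")


def load_quarters(data_dir, kind):
    """Stack every <year>_<quarter>_<kind>.csv in data_dir into one data frame.

    kind is "study" or "text". This helper is hypothetical and only
    illustrates the merging step.
    """
    frames = []
    for path in sorted(Path(data_dir).glob(f"*_{kind}.csv")):
        match = FILE_PATTERN.match(path.name)
        if match is None:
            continue  # skip files that do not follow the naming scheme
        year, quarter, _ = match.groups()
        frame = pd.read_csv(path)
        # Tag each row with its origin, since formats differ by quarter.
        frame = frame.assign(source_year=int(year), source_quarter=quarter)
        frames.append(frame)
    if not frames:
        return pd.DataFrame()
    # concat keeps the union of columns; a column that is missing
    # in some quarter is filled with NaN for that quarter's rows.
    return pd.concat(frames, ignore_index=True, sort=False)
```

Calling load_quarters("data", "study") and load_quarters("data", "text") would give one combined frame per kind, where "data" stands in for wherever the files are stored. Columns that were renamed between quarters would still need to be reconciled by hand in the cleaning step.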