Python Projects (Data 8)
Classifying Movies: K-Nearest Neighbors Classifier (April 2024)
     •  Created a k-nearest neighbors classifier to predict a movie’s genre by analyzing the frequency of keywords and associations in movie scripts
     •  Developed and tested training data sets, extending the classifier to use multiple features and computing Euclidean distances for accurate classification
     •  Investigated word associations in a movie script dataset, crafting a custom feature set to enhance the performance of the machine learning model
Climate Change: Temperatures and Precipitation (March 2024 - April 2024)
     •  Analyzed historical temperature and precipitation data from 210 U.S. cities to investigate long-term climate trends
     •  Utilized statistical techniques, such as hypothesis testing and confidence intervals, to measure and assess changes in temperatures and warming trends over time 
     •  Conducted an A/B test on annual precipitation data to evaluate the impact of drought periods as defined by the U.S. Environmental Protection Agency (EPA)
World Population and Poverty (February 2024)
     •  Examined global population growth and the factors influencing it, with a detailed case study on Poland’s demographic changes since 1900
     •  Used visualizations (line plots, scatterplots, and histograms) to examine life expectancy, fertility rates, child mortality, and their impact on population dynamics
     •  Evaluated global poverty trends, modeling data from 145 countries to understand the influence of colonialism, healthcare, economics, and social inequality on poverty rates


R Projects (Stat 20) 
Predicting Baseball Wins: Method of Least Squares (April 2024)
     •  Developed three regression models, including simple and multiple regressions, to predict MLB team wins based on runs and other performance metrics 
     •  Analyzed and plotted data from all MLB teams since 2000 to examine the relationship between runs scored, runs allowed, and team wins 
     •  Applied log transformations and additional variables to increase model accuracy, achieving higher R-squared values