Python Projects (Data 8)
Classifying Movies: K-Nearest Neighbors Classifier (April 2024)
• Created a k-nearest neighbors classifier designed to predict a movie’s genre by analyzing the frequency of keywords and associations in movie scripts
• Developed and tested training data sets, extended the classifier to use multiple features, and computed Euclidean distances for accurate classification
• Investigated word associations in a movie script dataset, crafting a custom feature set to enhance the performance of the machine learning model
Climate Change: Temperatures and Precipitation (March 2024 - April 2024)
• Analyzed historical temperature and precipitation data from 210 U.S. cities to investigate long-term climate trends
• Used statistical techniques, such as hypothesis testing and confidence intervals, to measure temperature changes and assess warming trends over time
• Conducted an A/B test on annual precipitation data to evaluate the impact of drought periods as defined by the U.S. Environmental Protection Agency (EPA)
World Population and Poverty (February 2024)
• Examined global population growth and the factors influencing it, with a detailed case study on Poland’s demographic changes since 1900
• Used visualizations (line plots, scatterplots, and histograms) to examine life expectancy, fertility rates, child mortality, and their impact on population dynamics
• Evaluated global poverty trends, modeling data from 145 countries to understand the influence of colonialism, healthcare, economics, and social inequality on poverty rates
Predicting Baseball Wins: Method of Least Squares (April 2024)
• Developed three regression models, including simple and multiple regressions, to predict MLB team wins based on runs and other performance metrics
• Analyzed and plotted data from all MLB teams since 2000 to examine the relationship between runs scored, runs allowed, and team wins
• Applied log transformations and additional variables to increase model accuracy, achieving higher r-squared values