Final Projects
Your Project Title Here (Your Name, Your Partner's Name)
Research Question.What is an interesting question to YOU that you can answer with data?
Data Source.What dataset could you use to answer your research question?
Methods.What statistical methods would you need for your question?
Main Finding.You may find something interesting!
Highlighted
Gender Wage Gap (Wan, Nurboev, F2025)
Research Question.Does the gender wage gap change when controlling for job title?
Data Source.Glassdoor Gender Pay Gap dataset
Methods.OLS regression with categorical controls
Main Finding.Raw 9.5% gap disappears when controlling for job title
Urban-Rural Polarization (Sobolewski, Teets, F2025)
Research Question.How has Democratic vote share changed across counties from 2000 to 2016, and do these trends point to widening urban-rural polarization?
Data Source.County presidential election results 2000-2024
Methods.OLS with interaction term
Main Finding.Positive interaction (0.019, p < 0.001) confirms widening urban-rural polarization
Which Maple Is Best? (Sensibar, Sharma, F2025)
Research Question.Which maple tree species provides the most benefits in Pittsburgh?
Data Source.Pittsburgh Trees dataset (45,709 trees)
Methods.OLS regression with interaction term
Main Finding.Norway Maples provide ~$10.58 more benefits and gain more per unit volume (interaction p < 0.001)
GDP, Dentist Density, and Healthcare (Ghobrial, Ghobrial, Merlos, F2025)
Research Question.Does the relationship between log GDP per capita and density of dentists differ between countries with universal healthcare and countries without?
Data Source.WHO dentist density and GDP per capita data (2020)
Methods.OLS regression with interaction term
Main Finding.Log GDP positively predicts dentist density (coef = 1.74, p < 0.001); interaction with UHC not significant
Income and Life Expectancy (Taivan, Suess, F2025)
Research Question.Is higher household income associated with a longer life expectancy?
Data Source.Health Inequality Project data (Chetty et al.)
Methods.OLS regression with log transformation and interaction term
Main Finding.Log income predicts life expectancy (coef = 1.79, p < 0.001); men show steeper gradient (interaction = 1.10, p < 0.001)
Tuition and Enrollment (Banawan, Cooper, Voss, F2025)
Research Question.How do changes in in-state tuition affect changes in undergraduate enrollment, and does this differ between public and private institutions?
Data Source.IPEDS tuition and enrollment data (2016 and 2023)
Methods.OLS regression with arcsinh transformation and interaction term
Main Finding.Tuition-enrollment relationship appears to differ by institution type; public institutions show a more negative effect, though the interaction is only marginally significant (p = 0.09)
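A short sketch of why an arcsinh (inverse hyperbolic sine) transformation can stand in for a log. Unlike the log, it is defined at zero and for negative values, and for large positive values it is close to log(2x). Whether the students chose it for that reason is an assumption; the values below are illustrative only.

```python
import math

# Illustrative values only: changes can be negative, zero, or large.
values = [-500.0, 0.0, 1.0, 500.0, 20000.0]

for v in values:
    # math.asinh accepts any real number; math.log would fail for v <= 0.
    print(f"asinh({v:>8}) = {math.asinh(v):.4f}")

# For large positive x, asinh(x) is approximately log(2x),
# so coefficients read much like log-model coefficients.
large = 20000.0
gap = abs(math.asinh(large) - math.log(2 * large))
print(f"difference from log(2x) at x = {large}: {gap:.2e}")
```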
Economic Changes and Decision to Have Children (Pawar, Tseytlin, F2024)
Research Question.Do economic changes correlate with decisions to have children in the short-term?
Data Source.Fertility rate data (UN); GDP data (Federal Reserve)
Methods.Fertility rates analysis for ages 20-30, GDP growth calculations, regression analysis
Main Finding.Significant relationship between GDP and fertility rates; economic recessions correlate with fertility declines
Voting Turnout and Single Parent Households (Plack, F2024)
Research Question.How does the percentage of single parent households relate to voter turnout?
Data Source.U.S. Census (2020), state voter registration data, MIT Election Data
Methods.County-level data compilation, correlation analysis, regression
Main Finding.Negative relationship: a 1% increase in single parent households is associated with a 0.741% decrease in turnout
Education Level and Vote Share in Pennsylvania (Batty, Thomas, F2024)
Research Question.Is there a relationship between educational level and vote share in PA (2024)?
Data Source.U.S. Census Bureau for education levels; Politico for voting results
Methods.Regression analysis by education level, slope coefficient calculations, visualizations
Main Finding.High school-only counties favored Trump (+2.71% per point); college degree counties favored Harris (+3.82%)
Major and Income (Iskandarani, Chau, F2025)
Research Question.Does undergraduate major impact income?
Data Source.IPUMS American Community Survey (2023) - 1 million person sample
Methods.OLS regression with categorical major variable and age/education controls
Main Finding.Major significantly predicts income; Engineering and Physical Sciences have the highest premiums (R² = 0.54)
▶ Health & Life Expectancy
Health Spending and Life Expectancy (Karpas, Heimel, F2025)
Research Question.Is the amount of health spending in a country associated with life expectancy?
Data Source.World Bank life expectancy and health expenditure per capita data (2022)
Methods.OLS regression with log transformation
Main Finding.Log health expenditure strongly predicts life expectancy (coef = 4.5 years per unit, p < 0.001, R² = 0.65)
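A worked reading of the coefficient, assuming the regressor is the natural log of health expenditure per capita. Under that assumption, multiplying spending by a factor k changes predicted life expectancy by coef × ln(k). The helper name `predicted_change` is hypothetical.

```python
import math

# Coefficient reported in the finding above: years per one-unit increase in log spending.
coef = 4.5


def predicted_change(multiplier, coef=coef):
    """Predicted change in life expectancy (years) when spending is multiplied by `multiplier`.

    Assumes a natural-log regressor, so the change is coef * ln(multiplier).
    """
    return coef * math.log(multiplier)


ten_pct = predicted_change(1.10)  # spending rises by 10%
doubling = predicted_change(2.0)  # spending doubles

print(f"10% more spending: {ten_pct:.2f} years")
print(f"double spending:   {doubling:.2f} years")
```

If the project used a base-10 log instead, the same coefficient would describe a tenfold increase in spending per unit, so the base matters for interpretation.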
Sleep and Social Media (Molz, Thompson, F2025)
Research Question.How are students' daily sleep affected by social media usage?
Data Source.Students Social Media Addiction Survey (705 students)
Methods.OLS regression: Sleep_Hours ~ Daily_Usage_Hours
Main Finding.Negative relationship between social media usage and sleep hours
GINI and Life Expectancy (Mooney, Bochkoris, Ketels, F2025)
Research Question.Does income inequality (GINI) predict life expectancy across countries?
Data Source.Our World in Data - GINI coefficient and life expectancy
Methods.OLS regression of life expectancy on GINI coefficient
Main Finding.Negative relationship (coef = -33.35, p < 0.001, R² = 0.138)
▶ Politics & Elections
Teen Birth Rates (Tessier, F2025)
Research Question.How does state economy and political leaning affect teen birth rates?
Data Source.U.S. state-level GDP, teen birth rates, and political data (50 states)
Methods.OLS regression with multiple predictors
Main Finding.Republican states have a teen birth rate 4.6 higher (p = 0.001); GDP is negatively associated with teen birth rate (p = 0.012)
Voter Turnout and Presidential Margins (Brennfleck, Jones, Kachalova, F2025)
Research Question.What is the relationship between voter turnout and the margin of U.S. presidential victories by county?
Data Source.County presidential election results (2000-2024) merged with population estimates
Methods.OLS regression with turnout as predictor
Main Finding.Higher turnout is associated with slightly more Republican-leaning results (coef = -0.13, p < 0.001)
GDP and Voter Turnout (Perkins, F2025)
Research Question.Does GDP have an effect on voter turnout?
Data Source.County-level GDP and unemployment data (presidential election years 2004-2020)
Methods.OLS regression with year fixed effects and percent changes
Main Finding.GDP percent change is NOT significantly associated with voter turnout change (p = 0.603)
Minimum Wage and Voter Turnout (Yee, F2025)
Research Question.Is there a relationship between minimum wage and voter turnout?
Data Source.State-level election turnout data (1984-2024) with minimum wage and median income
Methods.Multiple regression: Voter_Turnout ~ Minimum_Wage + Median_Household_Income
Main Finding.No significant relationship (coef = 0.15, p = 0.38, R² = 0.008)
Income and PA Voter Turnout 2024 (Sophia C, F2025)
Research Question.Did income influence voter turnout in Pennsylvania during the 2024 presidential election?
Data Source.Pennsylvania county-level income and 2024 voter turnout data (66 counties)
Methods.OLS regression: Voter_Turnout ~ Income
Main Finding.Significant positive relationship between income and voter turnout (p < 0.001, R² = 0.265)
▶ Labor, Wages & Education
Education and Income (Evans, Bandi, F2025)
Research Question.Is there a relationship between education level and expected income?
Data Source.IPUMS American Community Survey (2023) - 613,212 observations
Methods.OLS regression of total income on education level
Main Finding.Each additional unit of education is associated with $7,431 higher income (p < 0.001, R² = 0.174)
▶ Economy & Markets
Air Pollution and GDP (Cen, Habazin, Zheng, F2025)
Research Question.Is there a significant relationship between air pollution and GDP per capita?
Data Source.PM2.5 and GDP per capita data by country (1990-2020)
Methods.OLS regression with log transformation
Main Finding.Higher log GDP is associated with lower air pollution (coef = -5 to -6, p < 0.001)
GDP and Happiness (Arrlington, F2025)
Research Question.Is there a relationship between GDP per capita and happiness?
Data Source.World Bank GDP and World Happiness Report 2024
Methods.OLS regression with log transformation
Main Finding.Log GDP strongly predicts happiness (coef = 1.83, p < 0.001, R² = 0.65)
Corruption and Income Inequality (Getgen, Onyango, Koychev, F2025)
Research Question.Is there a relationship between corruption perception and income inequality across countries?
Data Source.Transparency International CPI and World Bank GINI (180 countries)
Methods.OLS regression of GINI on corruption perception
Main Finding.Positive relationship (coef = 0.0018, p < 0.001, R² = 0.155)
Gas Prices and Consumer Sentiment (Nathan, Schiller, F2025)
Research Question.Is an increase in gas prices associated with a subsequent decrease in consumer sentiment?
Data Source.FRED - Gas prices, Consumer Sentiment, Food CPI (monthly 2000-present)
Methods.OLS with a one-month lag: next month's sentiment change ~ this month's gas price change
Main Finding.Negative but not significant (coef = -2.53, p = 0.080)
▶ Housing & Real Estate
Bus Stops and Home Values (Weir, F2025)
Research Question.Is there a relationship between the number of bus stops and median home value in Pittsburgh neighborhoods?
Data Source.Pittsburgh transit stops and housing cost data by neighborhood
Methods.OLS regression of median home value on bus stop count
Main Finding.Each additional bus stop associated with ~$375 higher home value (p = 0.047)
Housing Prices and Birth Rates (Canavan, F2025)
Research Question.How are changes in housing prices related to birth rates across U.S. states?
Data Source.Zillow home values and state birth rate data (2023)
Methods.OLS regression of fertility rate on median home value
Main Finding.Negative relationship: higher home values associated with lower fertility (p = 0.006, R² = 0.15)
Room Sizes and Housing Prices (Arthur, F2025)
Research Question.What is the impact of room sizes on housing prices?
Data Source.Boston Housing dataset (506 observations)
Methods.OLS regression: MEDV ~ RM
Main Finding.Strong positive relationship between rooms and price (coef = 9.10, p < 0.001, R² = 0.48)
▶ Sports
MLB Attendance and Manager Change (Chirinos, Papa, Mostofa, F2025)
Research Question.Does winning affect MLB attendance?
Data Source.MLB team data (150 team-season observations)
Methods.OLS regression
Main Finding.Each additional win is associated with a 0.67 percentage point increase in attendance (p < 0.001)
MLB Salaries and Wins (Harrer, Reardon, Hu, F2025)
Research Question.Do higher team salaries lead to higher win percentages in the MLB?
Data Source.MLB team salary and win data (2024 season)
Methods.OLS regression of win percentage on total salary
Main Finding.Positive relationship: higher salaries associated with more wins (R² = 0.21)
MLB Payroll and Attendance (Lis, Fernandez, F2025)
Research Question.To what extent do payroll and fan attendance predict the number of wins for MLB teams?
Data Source.MLB team statistics (payroll, attendance, wins)
Methods.Multiple regression: Wins ~ Payroll_Pct + Average_Attendance
Main Finding.Both predict wins (Payroll: 2.06, p < 0.001; Attendance: 0.44, p < 0.001)
NFL Scoring and Weather (Olijar, F2025)
Research Question.How does NFL scoring relate to bad weather days?
Data Source.2024 NFL game data including point totals and weather conditions
Methods.OLS regression with binary weather variable: Point_Total ~ Bad_Weather
Main Finding.Bad weather is associated with ~4.3 fewer points per game (p = 0.004)
MLB Payroll Success Rate (McCollick, Holcombe, F2025)
Research Question.Does higher payroll lead to more wins in the MLB?
Data Source.MLB team payroll and win/loss data (2010-2024)
Methods.OLS regression: Win_Loss_Ratio ~ Total_Payroll
Main Finding.Positive but weak relationship (R² = 0.11); active roster payroll better predictor (R² = 0.24)
▶ Other
Restaurant Costs and Ratings (Zhang, F2025)
Research Question.Do restaurants with higher average costs receive higher customer ratings?
Data Source.Zomato restaurant dataset (~7,000 restaurants)
Methods.OLS regression of rating on average cost
Main Finding.Positive but small effect: each additional unit of cost is associated with a 0.0004 higher rating (p < 0.001, R² = 0.14)
Hotel Ratings by Room Type (Brodecki, F2025)
Research Question.Is there a difference in hotel ratings between Economy and Luxury room classifications?
Data Source.Hotel pricing and rating data (108 hotels)
Methods.OLS regression with categorical variable: Rating ~ Room_Class
Main Finding.Luxury hotels have significantly higher ratings than Economy hotels
▶ Fall 2024
Expected Value Per Sport Card (Ghaffari, Vasos, F2024)
Research Question.Is there a difference in expected value between basketball and football sports cards?
Data Source.Professional Sport Authenticator (PSA) card-grading service submissions
Methods.Box plots, one-tailed t-test for statistical significance
Main Finding.Basketball cards showed higher expected value than football cards (p-value = 0.09)
Pittsburgh Housing Market by Neighborhood (Shaker, F2024)
Research Question.Which Pittsburgh neighborhoods show highest housing price growth, and how does bedroom count relate?
Data Source.Zillow research datasets on Pittsburgh home prices, 2000-2024
Methods.Python/Scikit-learn, linear regression, growth metrics, normalized slope calculations
Main Finding.Central Lawrenceville (2.07%/month) led growth; five-bedroom homes appreciated fastest (2.16%/month)
GDP and Renewable Energy Use (Murphy, F2024)
Research Question.Is higher GDP related to increased renewable energy use?
Data Source.Our World in Data (global); U.S. Energy Information Administration and Statista (US)
Methods.Box-and-whisker plots, regression analysis, scatter plots for US and global datasets
Main Finding.No significant relationship in US data; weak but significant global relationship (R² = 0.114)
Economic Costs of Climate Change Events (Asfaw, Gonzalez, Baljeet, F2024)
Research Question.How have economic costs of extreme weather events changed over the past two decades in the U.S.?
Data Source.NOAA Billion-Dollar Weather and Climate Disasters database
Methods.Excel analysis, box plots, bar charts, t-test comparing 2000-2009 vs. 2010-2020
Main Finding.Costs nearly doubled from 2000s ($62B/year) to 2010s ($99.3B/year); 2019-2023 averaged $123.5B/year
Gender Wage Gap in Engineering Fields (Mayreddy, Strein, F2024)
Research Question.How does Labor Force Participation relate to gender wage gap in engineering over time?
Data Source.U.S. Department of Labor for earnings; FRED for labor force participation rates
Methods.Wage difference calculations, labor participation analysis, regression analysis
Main Finding.As the labor force participation gap decreases, the wage gap decreases; women still earn only 90 cents for every $1 men earn
Education's Impact on Wage Varying by Gender (Samson, Snyder, F2024)
Research Question.How do university degrees relate to wages at different educational levels for both genders?
Data Source.United States Census Bureau's Current Population Survey on earnings by education and gender
Methods.Educational attainment transformation, Google Colab/Python regression analysis, p-value calculations
Main Finding.Each year of education: +$5,311 for men vs. +$3,918 for women; gap widens at higher education levels
Voter Age and Turnout in Pennsylvania (Kogan, Hamilton, Zink, F2024)
Research Question.What is the relationship between voter age and turnout in PA counties (2024)?
Data Source.Commonwealth of Pennsylvania voter registration data; New York Times PA map
Methods.Combined voter registration with turnout data, scatter plots, linear regression
Main Finding.Strong positive correlation: a 1-year increase in average county age is associated with a 0.9% increase in voter turnout
Health Expense and Life Expectancy (Ressler, Stumpf, F2024)
Research Question.How does health spending relate to life expectancy?
Data Source.World Bank data on Health Expenditure per Capita and Life Expectancy
Methods.Multi-year data aggregation, histogram categorization, scatter plots, regression analysis
Main Finding.Each additional $1 of health spending per capita is associated with 0.0034 more years (1.24 days) of life expectancy; R² = 0.49
NFL Quarterbacks: Passing Yards and Salary (Palos, Hines, F2024)
Research Question.Is there a correlation between NFL quarterback passing yards and salary?
Data Source.Over the Cap for contracts; Pro Football Reference for 2024 NFL passing statistics
Methods.Python/pandas data merging, scatter plots, linear regression, p-value calculation
Main Finding.Significant positive correlation (p-value = 1.42e-09); regression: y = 11007.06x + 69706.75
MLB Ticket Prices (Mull, Louis, Roseman, F2024)
Research Question.How do MLB ticket prices vary with team wins, payroll, and city population?
Data Source.Statista for ticket prices; USA Today for payrolls; U.S. Census for city populations
Methods.Dataset creation combining multiple variables, regression analysis in R, p-value calculations
Main Finding.Team payroll (+0.22 units per unit increase) and wins (+1.21 units per win) significantly correlate with prices
Income and Voting Rate (Roth-Lackman, Yuan, F2024)
Research Question.How does 2023 household income relate to voting choice in each state?
Data Source.US Census for household income; Associated Press for 2024 presidential election voting data
Methods.Maps and bar charts creation, regression equations, p-value analysis
Main Finding.Higher income correlates with Democratic voting; Republicans have 7.93% advantage among lower-income voters
Unemployment Benefits and Job Recovery (Dignazio, Guan, Zhao, F2024)
Research Question.How do unemployment benefits correlate with job recovery (2010-2023)?
Data Source.U.S. Bureau of Labor Statistics, Department of Labor, FRED, CBO
Methods.Descriptive statistics, correlation analysis, regression during stable periods vs. crises
Main Finding.Relationship depends on economic context; COVID-19 benefits showed minimal disincentive effects
Unemployment and GDP Growth During COVID-19 (Ammad, Mahajan, F2024)
Research Question.How did unemployment and GDP growth relate before, during, and after COVID-19?
Data Source.Federal Reserve Economic Data (FRED), 2014-2024
Methods.Line graphs, scatter plots with regression lines across different periods
Main Finding.COVID-19 disrupted the typical relationship; 2020 saw GDP growth at -7.5% while unemployment spiked to ~12%
Voting Policies and Turnout Rates (Zentner, Weaver, Sama, F2024)
Research Question.What is the relationship between voting policies and turnout rates?
Data Source.U.S. Census Bureau, Ballotpedia
Methods.Policy comparison categories, bar charts, pivot tables, regression analysis
Main Finding.Universal mail-in voting states had 3% higher turnout; no-excuse absentee states had 1.87% higher turnout
Global Coffee Price Trends (Abdulmajid, Washington, F2024)
Research Question.How have global coffee prices and their percent changes evolved over the past decade?
Data Source.Federal Reserve Bank of St. Louis (FRED), 2014-2024
Methods.Descriptive statistics, line plots, regression analysis, percent change calculations
Main Finding.Significant volatility with lows in 2018, sharp increases in 2020-2022 during supply chain disruptions
Economic Impact of Public Health Spending (Ryu, Zhang, F2024)
Research Question.What is the relationship between national health expenditures and GDP?
Data Source.Centers for Medicare & Medicaid Services (CMS) data, 1972-2022
Methods.OLS regression, scatter plots with regression lines
Main Finding.For every $1B increase in health spending, GDP increases by ~$5.21B (R² of 0.991)
Labor Productivity and Unemployment Rate (Yuan, F2024)
Research Question.How does labor productivity relate to the unemployment rate?
Data Source.Bureau of Labor Statistics (BLS) for U.S. unemployment and productivity data, 2014-2024
Methods.Line charts, scatter plots, regression analysis to quantify relationships
Main Finding.Negative correlation: each unit increase in productivity is associated with a 0.15 percentage point decrease in unemployment
Tesla's Supercharging Network and Vehicle Sales (McAdoo, Smith, F2024)
Research Question.How does the expansion of Tesla's Supercharging network correlate with Tesla vehicle sales?
Data Source.GoodCarBadCar.net for sales; Kaggle for Supercharger locations; WSJ for investments
Methods.Line and scatter plots, Pearson correlation coefficient, simple linear regression model
Main Finding.Moderate positive correlation (r = 0.59); each additional charging station is associated with ~288 more vehicle sales
Factors Influencing Voter Turnout (Kelly, OConnor, Smith, F2024)
Research Question.What factors correlate with voter turnout in U.S. presidential elections (1988-2020)?
Data Source.US Elections Project's "Voter Turnout Demographics" dataset
Methods.Bar graphs and line graphs by demographics, hypothesis testing for trends
Main Finding.Three consistent factors: race (White higher), age (60+ highest), and education (higher degrees = higher turnout)
Unemployment Rate and GDP (Gohel, Oh, F2024)
Research Question.What is the relationship between unemployment rate and GDP in the United States (2000-2024)?
Data Source.Federal Reserve Economic Data (FRED) for unemployment and GDP data
Methods.Line plots, scatter plots with regression lines, Pearson correlation, regression analysis
Main Finding.Negative correlation (y = 20839.98 - 625.20x) but not statistically significant (R² = 0.06, p-value > 0.05)
Inflation, Unemployment, and GDP (Weber, F2024)
Research Question.How do inflation, unemployment, and GDP interact as key economic indicators?
Data Source.FRED for inflation rates; World Bank for GDP Per Capita and Unemployment rate data
Methods.Historical data analysis, policy strategy evaluation, relationship pattern identification
Main Finding.Phillips Curve relationship confirmed; policies improving one indicator often adversely affect others
Minimum Wage Increases and Inflation (Liu, F2024)
Research Question.What is the relationship between minimum wage increases and inflation?
Data Source.U.S. Department of Labor (wages); Bureau of Labor Statistics (CPI); Census Bureau (population)
Methods.Descriptive statistics, t-tests comparing states, correlation and regression analysis
Main Finding.Significant correlation: a $1/hour minimum wage increase is associated with a 1.981-point CPI increase
Voter Turnout Analysis (Hoesch, Peterson, F2024)
Research Question.How does voter turnout vary by state/age and relate to election results?
Data Source.Multiple sources including voter turnout data by age (2016/2020), state-level data (2024)
Methods.Z-score tests, maps and scatter plots, correlation analysis
Main Finding.Weak negative correlation between turnout and Republican margin; highest-turnout states tend Democratic
Impact of 2020 Presidential Transition on Economic Confidence (Hussain, Ryder, F2024)
Research Question.Did the Trump-Biden transition in 2020 correlate with changes in economic confidence?
Data Source.Federal Reserve Bank of St. Louis for business confidence; MarketWatch for Dow Jones; OECD for consumer confidence
Methods.Key indicator analysis around Nov 2020 and Jan 2021, pattern examination, fluctuation assessment
Main Finding.No significant correlation between political transition and economic confidence metrics
The Championship Effect (Eubanks, F2024)
Research Question.How does winning college football championships influence university applications?
Data Source.College Football Stats, Reference.com; Admissions data from Alabama and Clemson
Methods.Bar charts, percentage change calculations, dot plots with Google Colab
Main Finding.Positive correlation between winning championships and increased applications, supporting the "Flutie Effect"