This project analyzes characteristics of U.S. colleges using stratified random sampling. It addresses two research questions:
- Problem 1: Estimating total undergraduate enrollment across U.S. institutions
- Problem 2: Estimating the proportion of colleges offering undergraduate Statistics majors
Data (College Scorecard, https://collegescorecard.ed.gov/):
- `Most-Recent-Cohorts-Institution_05192025.csv`: institution-level data
- `Most-Recent-Cohorts-Field-of-Study.csv`: program-level data
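To answer Problem 2, the program-level file must be linked to the institution-level file. The following is a minimal sketch of one way to do this. It is not the authors' code. It assumes that the Field-of-Study file identifies institutions by `UNITID`, stores 4-digit CIP codes in `CIPCODE` (with Statistics, CIP 27.05, appearing as `2705`), and marks bachelor's programs with `CREDLEV == 3`. Check these column names and codes against the College Scorecard data dictionary before relying on them. `flag_statistics_major` and the `stat_major` column are hypothetical names.

```r
# Hypothetical helper: add a 0/1 column `stat_major` to the institution table,
# equal to 1 if the institution offers a bachelor's program in Statistics.
# Assumed columns: UNITID (both tables), CIPCODE and CREDLEV (program table).
flag_statistics_major <- function(institutions, programs,
                                  stat_cip = 2705, bachelor_level = 3) {
  # Rows describing bachelor's-level Statistics programs (NA codes never match)
  is_stat <- as.integer(programs$CIPCODE) == stat_cip &
    as.integer(programs$CREDLEV) == bachelor_level
  stat_ids <- unique(programs$UNITID[which(is_stat)])
  institutions$stat_major <- as.integer(institutions$UNITID %in% stat_ids)
  institutions
}

# Toy example: unit 1 has a Statistics bachelor's, unit 2 only a master's
# (CREDLEV 5), unit 3 only a Mathematics program (CIP 27.01).
inst <- data.frame(UNITID = c(1, 2, 3))
prog <- data.frame(UNITID = c(1, 1, 2, 3),
                   CIPCODE = c(2705, 5202, 2705, 2701),
                   CREDLEV = c(3, 3, 5, 3))
flag_statistics_major(inst, prog)$stat_major
```

The flag is defined per institution, so an institution with several qualifying program rows is still counted once.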
- Sampling Design: Stratified random sampling (n=100)
- Stratification: Public/Private × 2-year/4-year (4 strata)
- Allocation: Proportional allocation
- Estimation: Design-based estimators with 95% confidence intervals
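The README does not show the code in `project1.Rmd`. The following is therefore only a minimal base-R sketch of the design listed above, not the authors' implementation (which uses `dplyr`). It assumes a sampling frame `frame` with one row per institution, a `stratum` column, and analysis variables with missing values already removed. `make_stratum`, `prop_alloc`, `draw_stratified`, `strat_total`, the `four_year` flag, and the `stat_major` column are hypothetical names. Using `UGDS` as the Scorecard enrollment column is also an assumption. The 95% interval uses the normal critical value `qnorm(0.975)`. A t critical value with n − H degrees of freedom is a common alternative.

```r
# Hypothetical stratum label: College Scorecard codes CONTROL == 1 as public
# (2 and 3 are private). `four_year` is an assumed logical flag whose
# derivation from the data is left to the analysis.
make_stratum <- function(control, four_year) {
  sector <- ifelse(control == 1, "Public", "Private")
  level <- ifelse(four_year, "4-year", "2-year")
  paste(sector, level, sep = " / ")
}

# Proportional allocation n_h = n * N_h / N, rounded with the largest-remainder
# rule so that the stratum sample sizes add up to exactly n.
prop_alloc <- function(N_h, n) {
  raw <- n * N_h / sum(N_h)
  n_h <- floor(raw)
  short <- n - sum(n_h)
  if (short > 0) {
    top <- order(raw - n_h, decreasing = TRUE)[seq_len(short)]
    n_h[top] <- n_h[top] + 1
  }
  n_h
}

# Draw a simple random sample without replacement within each stratum.
draw_stratified <- function(frame, n) {
  N_h <- table(frame$stratum)
  n_h <- setNames(prop_alloc(as.vector(N_h), n), names(N_h))
  idx <- unlist(lapply(names(n_h), function(h) {
    rows <- which(frame$stratum == h)
    rows[sample.int(length(rows), n_h[[h]])]
  }))
  frame[idx, , drop = FALSE]
}

# Stratified estimator of a population total with a normal-approximation CI.
# N_h is a named vector (or table) of stratum population sizes. Every stratum
# needs at least two sampled units for var() to be defined.
strat_total <- function(smp, y, N_h, level = 0.95) {
  est <- 0
  v <- 0
  for (h in names(N_h)) {
    yh <- smp[[y]][smp$stratum == h]
    nh <- length(yh)
    fpc <- 1 - nh / N_h[[h]]  # finite population correction
    est <- est + N_h[[h]] * mean(yh)
    v <- v + N_h[[h]]^2 * fpc * var(yh) / nh
  }
  se <- sqrt(v)
  z <- qnorm(1 - (1 - level) / 2)
  c(estimate = est, se = se, lower = est - z * se, upper = est + z * se)
}

# Usage (hypothetical column names):
# frame$stratum <- make_stratum(frame$CONTROL, frame$four_year)
# smp <- draw_stratified(frame, 100)
# N_h <- table(frame$stratum)
# strat_total(smp, "UGDS", N_h)                   # Problem 1: total enrollment
# strat_total(smp, "stat_major", N_h) / sum(N_h)  # Problem 2: proportion
```

A proportion is the total of a 0/1 indicator divided by N. Dividing the output of `strat_total` by `sum(N_h)` therefore rescales the estimate, standard error, and interval bounds together. Largest-remainder rounding is one common way to make proportional allocations sum to exactly n = 100. The report may round differently.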
Files:
- `project1.Rmd`: R Markdown source file with analysis and documentation
- `project1.pdf`: compiled report
To compile the R Markdown file to PDF:
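
```bash
Rscript -e "rmarkdown::render('project1.Rmd', output_format = 'pdf_document')"
```

Or, if `pdf_document` is the first (or only) output format declared in the file's YAML header, simply:

```bash
Rscript -e "rmarkdown::render('project1.Rmd')"
```

Requirements:
- R packages: `dplyr`, `ggplot2`, `tidyr`, `gridExtra` (plus `rmarkdown` for rendering)
- A LaTeX distribution (for PDF output)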
Authors: Zhihao Chen, Zixiao Tan

Date: 2025-10-14