Builds a country-year panel combining World Bank governance indicators and GDP growth data, then supports SQL-based analysis of how institutional quality and economic performance evolve over time.
This project processes raw World Bank datasets into cleaned, analysis-ready outputs and constructs a panel dataset for downstream SQL queries. The workflow integrates governance indicators with GDP growth at the country-year level, enabling structured analysis of relationships between institutional quality and economic outcomes.
Raw Data (World Bank)
│
├─ GovernanceData.xlsx
└─ GDPGrowth.csv
│
▼
Python Processing (src/)
├─ process_governance.py
└─ process_gdp.py
│
▼
Cleaned Outputs (data/processed/)
├─ GovernanceDataCleaned.csv
└─ GDPDataCleaned.csv
│
▼
SQL Transformation (sql/)
├─ Countries table
├─ Governance wide table
└─ Final panel (country-year)
│
▼
Analytical Queries (Q1–Q12)
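The "Governance wide table" step in the pipeline above turns long-format indicator rows (one row per country, year, and indicator) into one row per country-year with a column per dimension. Below is a minimal sketch of that pivot using conditional aggregation. It assumes a hypothetical long table `gov_long(iso3, year, indicator, estimate)` and runs against an in-memory SQLite database for illustration. The project's own scripts target PostgreSQL, and `03_create_gov_wide.sql` may be structured differently.

```python
import sqlite3

# Hypothetical long-format table: one row per (country, year, indicator).
# Table and column names are illustrative assumptions, not the project's schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gov_long (iso3 TEXT, year INTEGER, indicator TEXT, estimate REAL)")
conn.executemany(
    "INSERT INTO gov_long VALUES (?, ?, ?, ?)",
    [
        ("AAA", 2020, "VA", 0.5),
        ("AAA", 2020, "CC", -0.2),
        ("BBB", 2020, "VA", 1.1),
    ],
)

# The six WGI dimension codes, in the order listed in this README.
DIMENSIONS = ["VA", "PV", "GE", "RQ", "RL", "CC"]

# Conditional aggregation: one output column per dimension.
# A dimension with no row for a country-year becomes NULL.
columns = ",\n    ".join(
    f"MAX(CASE WHEN indicator = '{d}' THEN estimate END) AS {d.lower()}" for d in DIMENSIONS
)
pivot_sql = f"""
SELECT iso3, year,
    {columns}
FROM gov_long
GROUP BY iso3, year
ORDER BY iso3, year
"""
gov_wide = conn.execute(pivot_sql).fetchall()
for row in gov_wide:
    print(row)
```

The `MAX(CASE ...)` pattern is also valid PostgreSQL. PostgreSQL additionally accepts the equivalent `MAX(estimate) FILTER (WHERE indicator = 'VA')` form.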
This project uses publicly available World Bank datasets:
- Governance Indicators (WGI): https://www.worldbank.org/content/dam/sites/govindicators/doc/wgidataset_with_sourcedata-2025.xlsx (2025 edition, accessed April 22, 2026)
- GDP Growth Data (World Bank indicator NY.GDP.PCAP.KD.ZG, GDP per capita growth, annual %): https://api.worldbank.org/v2/en/indicator/NY.GDP.PCAP.KD.ZG?downloadformat=excel (April 8, 2026 edition, accessed April 22, 2026)
The governance indicators cover six dimensions:
- Voice & Accountability (VA)
- Political Stability and Absence of Violence/Terrorism (PV)
- Government Effectiveness (GE)
- Regulatory Quality (RQ)
- Rule of Law (RL)
- Control of Corruption (CC)
Raw data is not included in the repository.
Download the datasets and place them in:
data/raw/
Rename the files:
- wgidataset_with_sourcedata-2025.xlsx → GovernanceData.xlsx
- World Bank GDP file → GDPGrowth.csv (CSV format is required, so use the CSV download rather than the Excel link above)
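Before running the pipeline, it can help to confirm that both renamed files are in place. Here is a minimal sketch of such a check. The helper name `missing_raw_files` is hypothetical and is not part of the repository. Only the two file names come from the steps above.

```python
from pathlib import Path

# Expected file names in data/raw/, taken from the renaming step above.
EXPECTED_RAW_FILES = ("GovernanceData.xlsx", "GDPGrowth.csv")


def missing_raw_files(raw_dir: Path) -> list[str]:
    """Return the expected raw file names that are not present in raw_dir (hypothetical helper)."""
    return [name for name in EXPECTED_RAW_FILES if not (raw_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_raw_files(Path("data/raw"))
    if missing:
        print("Missing raw files:", ", ".join(missing))
    else:
        print("All raw files present.")
```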
Create and activate a virtual environment, then install dependencies:
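python -m pip install -e .
or
python -m pip install -r requirements.txt
If the package was not installed with pip install -e ., src/ must be on the import path (for example, PYTHONPATH=src python -m governance_project.main).
Process raw data into cleaned outputs:
python -m governance_project.main
Outputs are written to: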
data/processed/
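World Bank indicator CSV downloads store one row per country with one column per year. Processing therefore involves reshaping them into country-year rows. Here is a minimal standard-library sketch of that reshape. It assumes that the header row begins with `Country Name` and contains a `Country Code` column plus four-digit year columns, and that any metadata lines above the header should be skipped. The function name `gdp_wide_to_long` is hypothetical, and the project's `process_gdp.py` may work differently.

```python
import csv
import io


def gdp_wide_to_long(text: str) -> list[tuple[str, int, float]]:
    """Reshape a wide World Bank-style CSV (one column per year) into (iso3, year, value) rows.

    Assumes the header row is the first row whose first cell is "Country Name";
    metadata lines before it are skipped. Empty cells are dropped.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header_idx = next(i for i, r in enumerate(rows) if r and r[0] == "Country Name")
    header = rows[header_idx]
    code_col = header.index("Country Code")
    # Year columns are the headers made only of digits, e.g. "2019".
    year_cols = [(i, int(h)) for i, h in enumerate(header) if h.strip().isdigit()]

    out = []
    for r in rows[header_idx + 1:]:
        if len(r) <= code_col:
            continue  # skip blank or truncated lines
        for i, year in year_cols:
            if i < len(r) and r[i].strip():
                out.append((r[code_col], year, float(r[i])))
    return out


# Made-up sample in the assumed layout (not real World Bank values).
sample = (
    '"Data Source","World Development Indicators",\n'
    "\n"
    '"Country Name","Country Code","Indicator Name","Indicator Code","2019","2020",\n'
    '"Country A","AAA","GDP per capita growth (annual %)","NY.GDP.PCAP.KD.ZG","1.5","-2.0",\n'
    '"Country B","BBB","GDP per capita growth (annual %)","NY.GDP.PCAP.KD.ZG","","0.7",\n'
)
print(gdp_wide_to_long(sample))
```

Missing values are dropped instead of stored as NULL here. That is one reason a joined panel can end up incomplete.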
Create a PostgreSQL database and connect to it:
CREATE DATABASE governance_db;
Run the SQL files in order:
- 01_load_data.sql
- 02_create_countries_table.sql
- 03_create_gov_wide.sql
- 04_create_final_panel.sql
- 05_checks.sql
Run each file with psql:
psql -d governance_db -f sql/<file_name>.sql
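The five setup files can also be run in sequence from Python. Here is a minimal sketch that builds one psql invocation per file in the documented order. The helper names `psql_commands` and `run_all` are hypothetical. The sketch also adds `-v ON_ERROR_STOP=1`, which is not part of the documented command. By default, psql continues past SQL errors in a `-f` script. With this setting, psql exits with a non-zero status on the first error, so `check=True` halts the sequence.

```python
import subprocess
from pathlib import Path

# Setup scripts in the order listed above.
SQL_FILES = [
    "01_load_data.sql",
    "02_create_countries_table.sql",
    "03_create_gov_wide.sql",
    "04_create_final_panel.sql",
    "05_checks.sql",
]


def psql_commands(db: str = "governance_db", sql_dir: str = "sql") -> list[list[str]]:
    """Build one psql argument list per setup file (hypothetical helper)."""
    return [
        ["psql", "-d", db, "-v", "ON_ERROR_STOP=1", "-f", str(Path(sql_dir) / name)]
        for name in SQL_FILES
    ]


def run_all(db: str = "governance_db") -> None:
    """Run each file in order; check=True raises on the first non-zero exit status."""
    for cmd in psql_commands(db):
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run: print the commands. Call run_all() to actually execute them.
    for cmd in psql_commands():
        print(" ".join(cmd))
```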
This produces a country-year panel combining governance indicators and GDP growth.
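The final panel joins governance and GDP growth on the country-year key. Here is a minimal sketch of that join in an in-memory SQLite database. The table names `countries`, `gov_wide`, and `gdp`, and their columns, are illustrative assumptions. Using an inner join is also an assumption, so `04_create_final_panel.sql` may use a different join type.

```python
import sqlite3

# Hypothetical cleaned tables; names, columns, and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE countries (iso3 TEXT PRIMARY KEY, country_name TEXT);
CREATE TABLE gov_wide (iso3 TEXT, year INTEGER, va REAL, cc REAL, PRIMARY KEY (iso3, year));
CREATE TABLE gdp (iso3 TEXT, year INTEGER, gdp_growth REAL, PRIMARY KEY (iso3, year));

INSERT INTO countries VALUES ('AAA', 'Country A'), ('BBB', 'Country B');
INSERT INTO gov_wide VALUES ('AAA', 2020, 0.5, -0.2), ('BBB', 2020, 1.1, 0.9), ('BBB', 2021, 1.0, 0.8);
INSERT INTO gdp VALUES ('AAA', 2020, -2.0), ('BBB', 2020, 0.7);
""")

# Inner join on (iso3, year): only country-years present in both sources survive.
# A LEFT JOIN from gov_wide would instead keep governance rows with NULL GDP growth.
panel = conn.execute("""
SELECT c.iso3, c.country_name, g.year, g.va, g.cc, d.gdp_growth
FROM gov_wide AS g
JOIN gdp AS d ON d.iso3 = g.iso3 AND d.year = g.year
JOIN countries AS c ON c.iso3 = g.iso3
ORDER BY c.iso3, g.year
""").fetchall()

for row in panel:
    print(row)
```

Here the 2021 row for BBB is dropped because it has no GDP value. This illustrates how missing data narrows the panel.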
- Q1: Calculates 10-year average governance scores by country to identify long-term institutional levels.
- Q2: Measures 5- and 10-year changes in governance to track medium- and long-term institutional shifts.
- Q3: Identifies countries with the largest year-over-year improvements in governance.
- Q4: Identifies countries with the largest year-over-year declines in governance.
- Q5: Computes average GDP growth over 10-year periods to capture long-run economic performance.
- Q6: Calculates 5- and 10-year average GDP growth to compare medium- and long-term trends.
- Q7: Identifies countries with the largest year-over-year increases in GDP growth.
- Q8: Identifies countries with the largest year-over-year declines in GDP growth.
- Q9: Measures the correlation between governance indicators and GDP growth across countries and years.
- Q10: Examines the relationship between governance quality and GDP growth volatility.
- Q11: Compares GDP growth before and after the 2008 financial crisis to assess overall impact.
- Q12: Classifies countries based on whether governance and GDP growth move together or diverge over five-year periods.
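Year-over-year rankings such as Q3 and Q4 (and Q7 and Q8 for GDP growth) can be expressed with the `LAG` window function. Here is a minimal sketch on made-up values in an in-memory SQLite database. It assumes a hypothetical `panel(iso3, year, va)` table and uses a single dimension for brevity, so the actual Q3/Q4 files may differ.

```python
import sqlite3

# Hypothetical panel with one governance dimension; values are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE panel (iso3 TEXT, year INTEGER, va REAL)")
conn.executemany(
    "INSERT INTO panel VALUES (?, ?, ?)",
    [
        ("AAA", 2019, 0.1), ("AAA", 2020, 0.4),
        ("BBB", 2019, 0.9), ("BBB", 2020, 0.6),
        ("CCC", 2017, 0.0), ("CCC", 2020, 1.0),  # gap: no 2018 or 2019 rows
    ],
)

# LAG() returns the previous available row within each country. Requiring
# year - prev_year = 1 keeps multi-year gaps from counting as annual changes.
# ORDER BY change DESC ranks improvements (Q3-style); ASC would rank declines (Q4-style).
yoy = conn.execute("""
SELECT iso3, year, change
FROM (
    SELECT iso3, year,
           LAG(year) OVER (PARTITION BY iso3 ORDER BY year) AS prev_year,
           va - LAG(va) OVER (PARTITION BY iso3 ORDER BY year) AS change
    FROM panel
) AS t
WHERE change IS NOT NULL AND year - prev_year = 1
ORDER BY change DESC
""").fetchall()

for iso3, year, change in yoy:
    print(iso3, year, round(change, 3))
```

The same query runs unchanged in PostgreSQL, which requires the `AS t` alias on the subquery.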
- The relationship between governance and GDP growth is weak in aggregate.
- Higher governance quality is associated with lower growth volatility.
- Average GDP growth was lower after the 2008 financial crisis than before it (by approximately 0.19 percentage points).
- Governance and growth do not consistently move together over medium-term horizons.
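These findings rest on two simple statistics: a Pearson correlation (as in Q9) and a difference in mean growth before and after 2008 (as in Q11). Here is a minimal sketch of both on made-up numbers, which are not project data. The helper names are hypothetical. Excluding the cutoff year 2008 from both periods is an assumption, since the README does not say how the crisis year itself is handled. In PostgreSQL, the built-in `corr(y, x)` aggregate computes the same coefficient.

```python
from math import sqrt
from statistics import mean


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def pre_post_difference(rows: list[tuple[int, float]], cutoff: int = 2008) -> float:
    """Mean growth after the cutoff year minus mean growth before it.

    The cutoff year itself is excluded from both periods (an assumption).
    """
    pre = [g for year, g in rows if year < cutoff]
    post = [g for year, g in rows if year > cutoff]
    return mean(post) - mean(pre)


# Illustrative values only.
print(pearson([0.0, 1.0, 2.0], [1.0, 3.0, 5.0]))
print(pre_post_difference([(2006, 3.0), (2007, 2.0), (2009, 1.0), (2010, 2.0)]))
```

A negative difference, like the reported -0.19 percentage points, means average growth was lower in the post-crisis period.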
Basic validation is applied during processing:
- ISO3 country codes are standardized and checked for format
- Years are constrained to valid ranges
- Governance estimates are validated for numeric bounds
Invalid or malformed rows are flagged during processing.
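The three checks above can be sketched as a row-level validator that returns flags rather than dropping rows. This is a minimal sketch with placeholder thresholds. The year range (1996 to 2024) and the estimate bounds (-2.5 to 2.5, the approximate span of the WGI estimate scale) are assumptions, not the project's actual settings. The function name `row_problems` is hypothetical.

```python
import re
from typing import Optional

# ISO3 codes: exactly three uppercase letters after standardization.
ISO3_PATTERN = re.compile(r"[A-Z]{3}")

# Placeholder thresholds (assumptions, not the project's actual values).
MIN_YEAR, MAX_YEAR = 1996, 2024
EST_MIN, EST_MAX = -2.5, 2.5


def row_problems(iso3: str, year: int, estimate: Optional[float]) -> list[str]:
    """Return a list of flags for one row; an empty list means the row passed."""
    problems = []
    code = iso3.strip().upper()  # standardize before checking the format
    if not ISO3_PATTERN.fullmatch(code):
        problems.append("bad_iso3")
    if not (MIN_YEAR <= year <= MAX_YEAR):
        problems.append("year_out_of_range")
    # Missing estimates are left to the missing-data handling, not flagged here.
    if estimate is not None and not (EST_MIN <= estimate <= EST_MAX):
        problems.append("estimate_out_of_bounds")
    return problems


print(row_problems(" usa ", 2020, 1.2))
print(row_problems("USA", 1990, 3.1))
```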
repo-root/
├─ README.md
├─ pyproject.toml
├─ requirements.txt
├─ .gitignore
├─ src/
│ └─ governance_project/
│ ├─ __init__.py
│ ├─ main.py
│ ├─ process_gdp.py
│ └─ process_governance.py
├─ tests/
│ └─ test.py
├─ sql/
│ ├─ 01_load_data.sql
│ ├─ 02_create_countries_table.sql
│ ├─ 03_create_gov_wide.sql
│ ├─ 04_create_final_panel.sql
│ ├─ 05_checks.sql
│ ├─ Q1_10_year_gov_avgs.sql
│ ├─ Q2_5_and_10_year_gov_changes.sql
│ ├─ Q3_most_improved_by_year.sql
│ ├─ Q4_least_improved_by_year.sql
│ ├─ Q5_10_year_gdp_growth_avg.sql
│ ├─ Q6_5_and_10_year_gdp_avgs.sql
│ ├─ Q7_most_improved_gdp_by_year.sql
│ ├─ Q8_least_improved_gdp_by_year.sql
│ └─ additional SQL analysis files
└─ data/
├─ raw/
└─ processed/
- Missing data varies by country and year, leading to an incomplete panel.
- Results are based on correlations and descriptive comparisons, not causal inference.
- Governance effects may operate with time lags not captured in simple comparisons.
Run the included test script:
python tests/test.py
This verifies the governance data processing using repository-relative paths.
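Repository-relative paths let the test run from any working directory. Here is a minimal sketch of the usual pathlib approach. It assumes the test file sits directly in tests/. The helper name `repo_root_from` is hypothetical, and the actual test may resolve paths differently.

```python
from pathlib import Path


def repo_root_from(test_file: str) -> Path:
    """Return the repository root, assuming test_file sits one level below it (tests/test.py)."""
    return Path(test_file).resolve().parent.parent


def processed_file(root: Path, name: str) -> Path:
    """Path to a cleaned output such as GovernanceDataCleaned.csv under data/processed/."""
    return root / "data" / "processed" / name


# Inside tests/test.py this would typically be called with __file__:
#   root = repo_root_from(__file__)
#   path = processed_file(root, "GovernanceDataCleaned.csv")
example_root = repo_root_from("/tmp/repo/tests/test.py")
print(processed_file(example_root, "GovernanceDataCleaned.csv"))
```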