
yndongo/governance-project


Governance Project

Builds a country-year panel combining World Bank governance indicators and GDP growth data, then supports SQL-based analysis of how institutional quality and economic performance evolve over time.


Overview

This project processes raw World Bank datasets into cleaned, analysis-ready outputs and constructs a panel dataset for downstream SQL queries. The workflow integrates governance indicators with GDP growth at the country-year level, enabling structured analysis of relationships between institutional quality and economic outcomes.


Data Pipeline

Raw Data (World Bank)
    │
    ├─ GovernanceData.xlsx
    └─ GDPGrowth.csv
    │
    ▼
Python Processing (src/)
    ├─ process_governance.py
    └─ process_gdp.py
    │
    ▼
Cleaned Outputs (data/processed/)
    ├─ GovernanceDataCleaned.csv
    └─ GDPDataCleaned.csv
    │
    ▼
SQL Transformation (sql/)
    ├─ Countries table
    ├─ Governance wide table
    └─ Final panel (country-year)
    │
    ▼
Analytical Queries (Q1–Q12)

Data

This project uses two publicly available World Bank datasets: the Worldwide Governance Indicators (WGI) and GDP growth.

The governance data covers six dimensions:

  • Voice & Accountability (VA)
  • Political Stability and Absence of Violence/Terrorism (PV)
  • Government Effectiveness (GE)
  • Regulatory Quality (RQ)
  • Rule of Law (RL)
  • Control of Corruption (CC)

Raw data is not included in the repository.

Setup

Download the datasets and place them in:

data/raw/

Rename files:

  • wgidataset_with_sourcedata-2025.xlsx → GovernanceData.xlsx
  • World Bank GDP file → GDPGrowth.csv (CSV format required)
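Before running the pipeline, it can help to confirm that both renamed files are in place. The following is a minimal sketch, not part of the repository: the helper name `missing_raw_files` is hypothetical, and it assumes the script is run from the repository root so that `data/raw` resolves correctly.

```python
from pathlib import Path

# File names come from the Setup section; the directory is relative to the repo root.
RAW_DIR = Path("data/raw")
EXPECTED_FILES = ("GovernanceData.xlsx", "GDPGrowth.csv")


def missing_raw_files(raw_dir: Path = RAW_DIR) -> list[str]:
    """Return the expected raw file names that are not present in raw_dir."""
    return [name for name in EXPECTED_FILES if not (raw_dir / name).is_file()]


if __name__ == "__main__":
    missing = missing_raw_files()
    if missing:
        print("Missing raw files:", ", ".join(missing))
    else:
        print("All raw files found.")
```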

Installation

Create and activate a virtual environment, then install dependencies:

python -m pip install -e .

or

python -m pip install -r requirements.txt

Running the Pipeline

Process raw data into cleaned outputs:

python -m governance_project.main

Outputs are written to:

data/processed/

Database Setup (PostgreSQL)

Create a database and connect:

CREATE DATABASE governance_db;

Run SQL files in order:

  1. 01_load_data.sql
  2. 02_create_countries_table.sql
  3. 03_create_gov_wide.sql
  4. 04_create_final_panel.sql
  5. 05_checks.sql

Run each file with psql, replacing <file_name> with the name of the script:

psql -d governance_db -f sql/<file_name>.sql

This produces a country-year panel combining governance indicators and GDP growth.
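The repository builds the panel in SQL. As an illustration of the underlying logic only, the Python sketch below pivots long-format governance rows into one row per country-year and attaches GDP growth on the same key. The row layouts, the `build_panel` name, and the choice to keep country-years found in either source are assumptions, not a description of what the SQL scripts do.

```python
from collections import defaultdict

# The six governance dimensions listed in the Data section.
INDICATORS = ("VA", "PV", "GE", "RQ", "RL", "CC")


def build_panel(governance_rows, gdp_rows):
    """Pivot long governance rows to wide and join GDP growth on (iso3, year).

    governance_rows: iterable of (iso3, year, indicator, estimate)  [assumed layout]
    gdp_rows:        iterable of (iso3, year, gdp_growth)           [assumed layout]
    """
    wide = defaultdict(dict)
    for iso3, year, indicator, estimate in governance_rows:
        wide[(iso3, year)][indicator] = estimate

    growth = {(iso3, year): value for iso3, year, value in gdp_rows}

    panel = []
    # Assumption: keep every country-year seen in either source; gaps stay None.
    for key in sorted(set(wide) | set(growth)):
        iso3, year = key
        row = {"iso3": iso3, "year": year}
        for indicator in INDICATORS:
            row[indicator.lower()] = wide.get(key, {}).get(indicator)
        row["gdp_growth"] = growth.get(key)
        panel.append(row)
    return panel
```

Missing values remain `None` in this sketch, which mirrors the incomplete panel noted under Limitations.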


Query Descriptions

  • Q1: Calculates 10-year average governance scores by country to identify long-term institutional levels.
  • Q2: Measures 5- and 10-year changes in governance to track medium- and long-term institutional shifts.
  • Q3: Identifies countries with the largest year-over-year improvements in governance.
  • Q4: Identifies countries with the largest year-over-year declines in governance.
  • Q5: Computes average GDP growth over 10-year periods to capture long-run economic performance.
  • Q6: Calculates 5- and 10-year average GDP growth to compare medium- and long-term trends.
  • Q7: Identifies countries with the largest year-over-year increases in GDP growth.
  • Q8: Identifies countries with the largest year-over-year declines in GDP growth.
  • Q9: Measures the correlation between governance indicators and GDP growth across countries and years.
  • Q10: Examines the relationship between governance quality and GDP growth volatility.
  • Q11: Compares GDP growth before and after the 2008 financial crisis to assess overall impact.
  • Q12: Classifies countries based on whether governance and GDP growth move together or diverge over five-year periods.
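The queries themselves are written in SQL. To make the idea behind the change-based queries (Q2, Q12) concrete, here is a small Python sketch of a five-year change and a co-movement label. The function names, the label strings, and the rule that "moving together" means both changes share a sign are assumptions made for illustration; the actual queries may define these differently.

```python
def five_year_change(series, year, horizon=5):
    """Return series[year] - series[year - horizon], or None if either value is missing.

    series maps year -> value for a single country (assumed layout).
    """
    current = series.get(year)
    earlier = series.get(year - horizon)
    if current is None or earlier is None:
        return None
    return current - earlier


def classify(gov_change, growth_change):
    """Label whether governance and GDP growth changed in the same direction."""
    if gov_change is None or growth_change is None:
        return "insufficient data"
    if gov_change == 0 or growth_change == 0:
        return "no change"
    same_direction = (gov_change > 0) == (growth_change > 0)
    return "move together" if same_direction else "diverge"


# Made-up values for one hypothetical country, for illustration only.
governance = {2005: 0.1, 2010: 0.4}
gdp_growth = {2005: 2.0, 2010: 1.0}
label = classify(
    five_year_change(governance, 2010),
    five_year_change(gdp_growth, 2010),
)
```

In this toy case governance rises while growth falls, so `label` is `"diverge"`.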

Key Findings

  • The relationship between governance and GDP growth is weak in aggregate.
  • Higher governance quality is associated with lower growth volatility.
  • Average GDP growth declined after the 2008 financial crisis (approximately -0.19 percentage points).
  • Governance and growth do not consistently move together over medium-term horizons.

Data Validation

Basic validation is applied during processing:

  • ISO3 country codes are standardized and checked for format
  • Years are constrained to valid ranges
  • Governance estimates are validated for numeric bounds

Invalid or malformed rows are flagged during processing.
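The checks listed above could look roughly like the sketch below. This is not the repository's code: `validate_row` is a hypothetical helper, the year range is an assumed window starting at 1996 (the first year of WGI coverage), and the bounds reflect the approximate -2.5 to 2.5 range of WGI estimates.

```python
import re

ISO3_PATTERN = re.compile(r"[A-Z]{3}")
YEAR_RANGE = (1996, 2025)        # assumed valid window
ESTIMATE_BOUNDS = (-2.5, 2.5)    # approximate WGI estimate range (assumed limits)


def validate_row(iso3, year, estimate):
    """Return a list of problems found in one governance row (empty if none)."""
    problems = []

    # Standardize the country code before checking its format.
    code = str(iso3).strip().upper()
    if not ISO3_PATTERN.fullmatch(code):
        problems.append("bad iso3")

    if not isinstance(year, int) or not (YEAR_RANGE[0] <= year <= YEAR_RANGE[1]):
        problems.append("year out of range")

    try:
        value = float(estimate)
    except (TypeError, ValueError):
        problems.append("estimate not numeric")
    else:
        if not (ESTIMATE_BOUNDS[0] <= value <= ESTIMATE_BOUNDS[1]):
            problems.append("estimate out of bounds")

    return problems
```

Returning a list of problems rather than raising lets the caller flag a row and keep processing, in line with the flagging behavior described above.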


Project Structure

repo-root/
├─ README.md
├─ pyproject.toml
├─ requirements.txt
├─ .gitignore
├─ src/
│  └─ governance_project/
│     ├─ __init__.py
│     ├─ main.py
│     ├─ process_gdp.py
│     └─ process_governance.py
├─ tests/
│  └─ test.py
├─ sql/
│  ├─ 01_load_data.sql
│  ├─ 02_create_countries_table.sql
│  ├─ 03_create_gov_wide.sql
│  ├─ 04_create_final_panel.sql
│  ├─ 05_checks.sql
│  ├─ Q1_10_year_gov_avgs.sql
│  ├─ Q2_5_and_10_year_gov_changes.sql
│  ├─ Q3_most_improved_by_year.sql
│  ├─ Q4_least_improved_by_year.sql
│  ├─ Q5_10_year_gdp_growth_avg.sql
│  ├─ Q6_5_and_10_year_gdp_avgs.sql
│  ├─ Q7_most_improved_gdp_by_year.sql
│  ├─ Q8_least_improved_gdp_by_year.sql
│  └─ additional SQL analysis files
└─ data/
   ├─ raw/
   └─ processed/

Limitations

  • Missing data varies by country and year, leading to an incomplete panel.
  • Results are based on correlations and descriptive comparisons, not causal inference.
  • Governance effects may operate with time lags not captured in simple comparisons.

Testing

Run the included test script:

python tests/test.py

This verifies governance data processing using repository-relative paths.
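One common way to obtain repository-relative paths in a test script is to resolve them from the script's own location rather than the current working directory. The sketch below shows that pattern under the layout in Project Structure; `repo_root` and `processed_dir` are hypothetical helpers, and the actual `tests/test.py` may locate files differently.

```python
from pathlib import Path


def repo_root(test_file: Path) -> Path:
    """Return the repository root, assuming test_file lives in <root>/tests/."""
    return test_file.resolve().parents[1]


def processed_dir(test_file: Path) -> Path:
    """Return <root>/data/processed for a script located in <root>/tests/."""
    return repo_root(test_file) / "data" / "processed"
```

Inside a test script this would typically be called as `processed_dir(Path(__file__))`, so the test behaves the same regardless of the directory it is launched from.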
