
MC Comparison: Run Stata phase and finalize results #3

Description

@Davidvandijcke

Status

The Monte Carlo comparison framework from #1 is committed in montecarlo/. Phase 1 (R) is complete. Phase 2 (Stata) needs to run on a faster machine.

What's done

Phase 1 (R MC) — COMPLETE (01_mc_R.R ran locally, ~10 min on 7 cores)

  • 500 sims per cell; 2 DGPs × 2 sample sizes (200, 500) × 3 deltas (0, 1, 2) × 2 methods (simple, frechet) = 24 sharp cells, plus 1 fuzzy cell
  • All 500/500 valid for every cell
  • R results summary:
    • Coverage: 0.98-0.99 (conservative relative to 0.95 nominal)
    • Size (delta=0): 0.07-0.12
    • Power (delta=2): 0.99-1.00
  • Output saved to montecarlo/output/ (gitignored):
    • output/data/ — 1,300 CSV files (data for Stata)
    • output/xi/ — 200 xi matrices
    • output/results_R/ — 5,127 files (per-rep results + summary + .rds)
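
The cell count above can be checked by enumerating the grid. This is a minimal sketch: the DGP labels dgp1 and dgp2 are placeholders (the real names live in 01_mc_R.R), and the function names are made up for illustration.

```shell
# Enumerate the sharp design grid and count its cells.
# DGP labels are placeholders; sample sizes, deltas, and methods come from the list above.
list_sharp_cells() {
  deltas=${1:-"0 1 2"}
  for dgp in dgp1 dgp2; do
    for n in 200 500; do
      for delta in $deltas; do          # unquoted on purpose: split on spaces
        for method in simple frechet; do
          echo "$dgp n=$n delta=$delta method=$method"
        done
      done
    done
  done
}

# Number of cells in the grid (optionally for a different delta list).
count_sharp_cells() {
  list_sharp_cells "$@" | wc -l | tr -d ' '
}

echo "sharp cells: $(count_sharp_cells)"   # prints "sharp cells: 24"
```

Passing a shorter delta list, for example count_sharp_cells "0 2", gives the size of a reduced grid (16 cells).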

What to do next

Step 1: Generate R output on the server

The output/ directory is gitignored, so you need to regenerate it:

cd montecarlo/
Rscript 01_mc_R.R

This takes ~10 min with 7+ cores. It writes all data/xi/results to output/.
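
Before starting the long Stata phase, it may be worth confirming that the regenerated output looks like the local run. The sketch below only counts files; the helper names are hypothetical, and the expected counts (1,300 and 200) are the ones reported for the local Phase 1 run, so treat a mismatch as a prompt to look closer rather than as proof of failure.

```shell
# Print the number of regular files under directory $1 whose names match glob $2.
count_files() {
  find "$1" -type f -name "$2" | wc -l | tr -d ' '
}

# Compare the actual file count with an expected one.
# Prints one status line; returns non-zero if the directory is missing or the count differs.
check_count() {
  dir=$1; pattern=$2; expected=$3
  if [ ! -d "$dir" ]; then
    echo "MISSING $dir"
    return 1
  fi
  actual=$(count_files "$dir" "$pattern")
  if [ "$actual" -eq "$expected" ]; then
    echo "OK   $dir: $actual files"
  else
    echo "DIFF $dir: $actual files, expected $expected"
    return 1
  fi
}
```

From montecarlo/, a possible usage is check_count output/data '*.csv' 1300 followed by check_count output/xi '*' 200. The glob for output/xi/ is an assumption, since only the number of xi matrices is stated above.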

Step 2: Run Stata MC

cd montecarlo/
stata-se -b do 02_mc_stata.do
# or: /path/to/StataSE -b do 02_mc_stata.do

This runs 100 sims × 16 sharp cells × 2 runs each (shared-bw + own-bw) + 100 fuzzy sims. Runtime estimate: 2-6 hours depending on the machine.
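
To put the runtime estimate in perspective, the workload can be tallied as follows. This is back-of-the-envelope arithmetic only, assuming each fuzzy sim is run once and that the 2-6 hour range covers the whole phase; the function names are illustrative.

```shell
# Total Stata runs implied by the description:
# sims x sharp cells x runs per cell + fuzzy sims.
stata_runs() {
  echo $(( $1 * $2 * $3 + $4 ))
}

# Average seconds per run (integer division) for a wall-clock budget of $1 hours and $2 runs.
secs_per_run() {
  echo $(( $1 * 3600 / $2 ))
}

total=$(stata_runs 100 16 2 100)
echo "total runs: $total"
echo "implied seconds per run: $(secs_per_run 2 "$total") to $(secs_per_run 6 "$total")"
```

With these inputs the total is 3,300 runs, so the estimate corresponds to roughly 2 to 6.5 seconds per run (the script truncates to whole seconds).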

Known issue fixed: The original capture noisily { r3d ... /// } pattern broke in batch mode. The script now calls a helper program (mc_run_and_save) that avoids braces around capture noisily.

If it still fails: Check 02_mc_stata.log in the working directory and output/results_Stata/mc_stata.log. Common issues:

  • r3d package not found → net install r3d, from("../stata_r3d/") replace
  • Mata not compiled → the script attempts auto-compilation; if that fails, run this in Stata first: do ../stata_r3d/mata/r3d_mata.mata
  • Path issues → script assumes cwd = montecarlo/
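
The checks above can be partly automated. The sketch below relies on the fact that Stata writes error return codes to the log in the form r(199); and on the paths named in this issue. Both helper names are hypothetical and not part of the repository.

```shell
# Print log lines that contain a Stata return code such as "r(199);", with line numbers.
# Returns 0 if at least one is found, 1 if none, 2 if the log file is missing.
find_stata_errors() {
  log=$1
  [ -f "$log" ] || { echo "no log at $log" >&2; return 2; }
  grep -n 'r([0-9][0-9]*);' "$log"
}

# Check the working-directory assumptions listed above.
# Prints one line per problem and returns non-zero if any check fails.
preflight() {
  status=0
  [ "$(basename "$PWD")" = "montecarlo" ] || { echo "cwd is not montecarlo/"; status=1; }
  [ -f 02_mc_stata.do ] || { echo "02_mc_stata.do not found"; status=1; }
  [ -d ../stata_r3d ] || { echo "../stata_r3d/ not found"; status=1; }
  return $status
}
```

A possible workflow: run preflight before launching Stata, then find_stata_errors 02_mc_stata.log and find_stata_errors output/results_Stata/mc_stata.log afterwards. The matching line numbers point to where each error appears in the log.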

Step 3: Run comparison

cd montecarlo/
Rscript 03_mc_compare.R

Produces:

  • output/comparison/mc_summary.csv — main results table
  • output/comparison/mc_agreement.csv — per-replication R vs Stata diffs
  • output/comparison/mc_tables.tex — LaTeX table
  • Console: PASS/FAIL assessment

Step 4 (alternative to Steps 1-3): Run everything

cd montecarlo/
bash 00_run_all.sh

Runs phases 0-3 sequentially (phase 0 = equivalence tests).

Key design decisions

  1. Shared xi matrices: R and Stata use identical N(0,1) multiplier matrices (saved as CSV) for deterministic bootstrap comparison.
  2. Two Stata runs per cell: "shared-bw" (uses R's bandwidths) for apples-to-apples comparison, "own-bw" (Stata's bandwidth selection) for end-to-end comparison.
  3. Stata subset: Only 100 of the 500 sims are run in Stata, to limit runtime. R results use all 500.
  4. Delta=1 skipped in Stata: Only delta=0 and delta=2 (null + large effect) to halve runtime.

Success criteria

  • Direct agreement (shared bw): tau diff p95 < 0.01 (frechet), < 0.05 (simple)
  • Coverage (delta>0): both > 0.80
  • Size (delta=0): both < 0.15
  • R vs Stata power: within 10pp
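
The criteria above can be expressed as simple checks. This is a sketch under stated assumptions, not the logic in 03_mc_compare.R: it takes the tau criterion to apply to absolute R-minus-Stata differences, uses a nearest-rank 95th percentile (R's default quantile() interpolates, so values may differ slightly), reads "within 10pp" as an absolute gap of at most 0.10, and expects plain decimal input. All function names are hypothetical.

```shell
# Nearest-rank 95th percentile of absolute values read from stdin (one number per line).
p95_abs() {
  awk 'NF { v = $1 + 0; if (v < 0) v = -v; printf "%.10f\n", v }' | sort -n | awk '
    { x[NR] = $1 }
    END {
      if (NR == 0) exit 1
      k = int((95 * NR + 99) / 100)    # ceil(0.95 * N) in integer arithmetic
      print x[k]
    }'
}

# Succeed when $1 < $2 as floating-point numbers.
lt() { awk -v a="$1" -v b="$2" 'BEGIN { exit !(a + 0 < b + 0) }'; }

# Direct-agreement check for one method; reads tau differences from stdin.
check_agreement() {
  case $1 in
    frechet) tol=0.01 ;;
    simple)  tol=0.05 ;;
    *) echo "unknown method: $1" >&2; return 2 ;;
  esac
  p=$(p95_abs) || return 2
  if lt "$p" "$tol"; then
    echo "PASS $1: p95=$p < $tol"
  else
    echo "FAIL $1: p95=$p >= $tol"
    return 1
  fi
}

# Coverage, size, and power checks.
# Arguments: coverage_R coverage_Stata size_R size_Stata power_R power_Stata
check_rates() {
  awk -v cr="$1" -v cs="$2" -v sr="$3" -v ss="$4" -v pr="$5" -v ps="$6" 'BEGIN {
    d = pr - ps; if (d < 0) d = -d
    ok = (cr + 0 > 0.80 && cs + 0 > 0.80) &&
         (sr + 0 < 0.15 && ss + 0 < 0.15) &&
         (d <= 0.10 + 1e-12)
    exit !ok
  }'
}
```

To use it, pipe the tau-difference values for one method from mc_agreement.csv into check_agreement frechet or check_agreement simple. The column layout of that file is not specified here, so the extraction step is left out.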
