Public repository for the arXiv-ready preprint:
Reliable Deep Reinforcement Learning for County-Scale Land-Resource Allocation: Action-Space Decomposition in Environmental Decision Support
The repository contains the code, paper files, deterministic evaluation records, model checkpoints, and derived result artifacts used to compare centralized county-level deep reinforcement learning (DRL) with a shared-policy, township-decomposed multi-agent reinforcement learning (MARL) formulation for land-resource allocation.

| Path | Contents |
|---|---|
| `paper/manuscript.tex` | Public preprint LaTeX source |
| `paper/manuscript.pdf` | Locally compiled public preprint PDF |
| `paper/figures/` | Figures used by the public preprint |
| `submission_ems_paper4/` | Archived EMS submission package, retained for provenance |
The paper evaluates a performance-reliability trade-off. Centralized DRL can reach stronger optima, whereas township-decomposed shared-policy MARL yields more reproducible outcomes across seeds because it shrinks each agent's local action space while preserving county-wide reward feedback.
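The action-space argument can be made concrete with a toy count. For illustration only: if a centralized agent had to pick one joint action covering `T` townships with `A` local choices each, it would face `A**T` options, while a shared township policy chooses among `A` at a time. The sketch also shows the two ingredients named above, one policy reused for every township and a single county-level reward broadcast to all township agents. `joint_action_count`, `shared_policy_step`, and the toy policy and reward are hypothetical and are not the environment code in `src/`.

```python
from __future__ import annotations

from typing import Callable, Sequence


def joint_action_count(num_townships: int, local_actions: int) -> int:
    """Size of a flat joint action space with one choice per township."""
    return local_actions ** num_townships


def shared_policy_step(
    observations: Sequence[Sequence[float]],
    shared_policy: Callable[[Sequence[float]], int],
    county_reward: Callable[[list[int]], float],
) -> tuple[list[int], list[float]]:
    """Apply one shared policy to each township and broadcast the county reward."""
    # The same policy acts on every township's local observation, so each
    # decision is drawn from the small local action space.
    actions = [shared_policy(obs) for obs in observations]
    # County-wide feedback: every township agent receives the same scalar reward.
    reward = county_reward(actions)
    return actions, [reward] * len(actions)


# Toy usage with made-up values, not data from the study.
def toy_policy(obs: Sequence[float]) -> int:
    return int(obs[0] > 0.5)  # hypothetical threshold policy


def toy_reward(actions: list[int]) -> float:
    return float(sum(actions))  # hypothetical county-level reward


print(joint_action_count(num_townships=10, local_actions=5))  # 9765625
print(shared_policy_step([[0.2], [0.9], [0.7]], toy_policy, toy_reward))  # ([0, 1, 1], [2.0, 2.0, 2.0])
```

Broadcasting one reward keeps every township optimizing the county objective, which is the property the decomposition is meant to preserve.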

Bishan results (mean +/- cross-seed standard deviation):

| Method | Slope change (%) | Contiguity | Baimu count | Baimu area (ha) |
|---|---|---|---|---|
| Centralized DRL | -0.815917 +/- 0.372886 | +0.018359 +/- 0.003285 | +3.600000 +/- 1.854724 | -79.833833 +/- 166.120307 |
| Shared-policy MARL | -0.812821 +/- 0.084599 | +0.016538 +/- 0.001939 | +3.200000 +/- 2.785678 | -74.494650 +/- 68.773583 |
In Bishan, MARL and centralized DRL achieve nearly identical mean slope reduction (-0.813% vs. -0.816%), while MARL's cross-seed standard deviation is 4.4x lower (0.085 vs. 0.373). External validation in Dongxing preserves this reliability pattern, although centralized DRL achieves a stronger mean slope reduction there.
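The 4.4x figure is the ratio of the two cross-seed standard deviations of slope change in the Bishan table. The check below simply copies those values from the table:

```python
# Slope change (%) in Bishan, copied from the results table.
mean_centralized, sd_centralized = -0.815917, 0.372886
mean_marl, sd_marl = -0.812821, 0.084599

# Reliability: how much wider the centralized spread is across seeds.
sd_ratio = sd_centralized / sd_marl
# Performance: absolute gap between the two mean slope changes.
mean_gap = abs(mean_centralized - mean_marl)

print(f"SD ratio: {sd_ratio:.1f}x")  # SD ratio: 4.4x
print(f"Mean gap: {mean_gap:.6f} percentage points")
```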

| Path | Contents |
|---|---|
| `src/` | County-level environment, MARL environment, training scripts, baselines, and custom SB3 policy |
| `scripts/recovery/` | Scripts used to recompute deterministic summaries and figures |
| `results/evals/` | Deterministic checkpoint-evaluation JSON files |
| `results/training_logs/` | Training logs for DRL/MARL seeds |
| `results/models/` | Final and best checkpoint `.zip` files |
| `results/baselines/` | County-level rule-based baseline output |
| `data/derived_blocks/` | Derived block artifacts for the public/reproducible portion of the workflow |
| `paper/` | Public preprint source, PDF, and figures |
| `docs/` | Data availability notes, supplementary fragments, aggregate summaries, and recovery notes |
| `submission_ems_paper4/` | Historical EMS package; no longer the main public entry point |
From the repository root:

```bash
python scripts/recovery/summarize_eval_set.py
python scripts/recovery/plot_method_comparison_updated.py
python scripts/recovery/plot_allocation_heatmap_5seed.py
```

These commands recompute the DRL aggregate statistics and regenerate:

- `paper/figures/method_comparison.png`
- `paper/figures/township_allocation.png`
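The aggregation step can be pictured as collecting one deterministic evaluation record per seed and reducing each metric to mean +/- standard deviation, the format used in the results table. This is a minimal sketch and not `summarize_eval_set.py` itself. It makes three assumptions: the `*.json` glob, a flat metric-to-number layout for each record, and the population standard deviation (`statistics.pstdev`). Check the script for the actual file schema and estimator.

```python
from __future__ import annotations

import json
import statistics
from pathlib import Path


def load_eval_records(eval_dir: str, pattern: str = "*.json") -> list[dict]:
    """Load every JSON file matching pattern in eval_dir (assumed: one file per seed)."""
    return [json.loads(p.read_text()) for p in sorted(Path(eval_dir).glob(pattern))]


def summarize(records: list[dict], metrics: list[str]) -> dict[str, tuple[float, float]]:
    """Reduce each metric to (mean, population standard deviation) across records."""
    summary = {}
    for name in metrics:
        values = [float(record[name]) for record in records]
        summary[name] = (statistics.fmean(values), statistics.pstdev(values))
    return summary


def format_cell(mean: float, sd: float) -> str:
    """Render a cell in the '+mean +/- sd' style used by the results table."""
    return f"{mean:+.6f} +/- {sd:.6f}"
```

A call might look like `summarize(load_eval_records("results/evals"), ["slope_change_pct"])`, where `slope_change_pct` is a hypothetical key. If `results/evals/` mixes methods or checkpoints, filter the records before summarizing.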
Additional diagnostics:

```bash
python scripts/recovery/cross_boundary_diagnostics.py
python src/baselines_county.py --skip-independent --run-reward-greedy --reward-top-k 50
```

The Reward-Greedy run requires the restricted parcel GPKG. The extracted diagnostic record used by the manuscript is stored as `results/baselines/reward_greedy_top50.json`.
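This README does not describe how Reward-Greedy ranks candidates. One plausible reading of `--reward-top-k 50` is that each candidate change is scored on its own and the `k` best are kept. The sketch below shows that generic greedy top-k selection only. It is an assumption, not the logic in `src/baselines_county.py`, and `reward_greedy_top_k` and `reward_of` are hypothetical names.

```python
import heapq
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")


def reward_greedy_top_k(
    candidates: Iterable[T],
    reward_of: Callable[[T], float],
    k: int = 50,
) -> List[T]:
    """Return the k candidates with the highest individual reward, best first.

    Illustrative only: scoring candidates independently ignores interactions
    (for example, contiguity effects) between the changes that get selected.
    """
    return heapq.nlargest(k, candidates, key=reward_of)
```

Because a greedy pass like this never revisits earlier picks, it serves as a simple reference point rather than an optimizer.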
The parcel-level TNLS data used in this study are administratively restricted and cannot be publicly redistributed. The repository includes non-restricted derived summaries, deterministic evaluation outputs, figure-generation assets, scripts, environment code, and JSON/CSV summaries that do not expose restricted parcel geometries. Full environment rebuilds, raw-geometry rollouts, and retraining from cadastral parcels require controlled access to the restricted parcel-level and block-artifact files.
See data/README.md and docs/data_availability_statement.txt for details.
Full retraining is computationally expensive:
- Centralized: approximately 8 hours per seed on an A100 GPU.
- MARL: approximately 12 hours per seed on an A100 GPU.
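For planning, the per-seed figures add up quickly. The sketch below treats the approximate times as exact and runs seeds sequentially on one GPU. The five-seed example is an assumption suggested by the name `plot_allocation_heatmap_5seed.py`; this README does not state the training seed count.

```python
from __future__ import annotations

# Approximate A100 hours per seed, from the list above.
HOURS_PER_SEED = {"centralized": 8, "marl": 12}


def total_gpu_hours(num_seeds: int, hours_per_seed: dict[str, int] = HOURS_PER_SEED) -> int:
    """Sequential A100-hours to retrain every method for num_seeds seeds."""
    return num_seeds * sum(hours_per_seed.values())


print(total_gpu_hours(5))  # 100
```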
Training entry points:

```bash
python src/train_county.py --seed 0 --timesteps 500000
python src/train_county_marl.py --seed 0 --timesteps 500000
```

The Colab/T4 convenience script is retained as `src/t4_train_county.py`.
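To retrain several seeds in sequence, you could wrap the two entry points in a small driver. This is a minimal sketch and is not part of the repository. It reuses only the `--seed` and `--timesteps` flags shown above, assumes seeds 0-4 by default, and prints the commands without running them unless `dry_run=False`.

```python
import subprocess
import sys

ENTRY_POINTS = ["src/train_county.py", "src/train_county_marl.py"]


def build_commands(seeds, timesteps=500_000, entry_points=ENTRY_POINTS):
    """One training command per (entry point, seed), using only the documented flags."""
    return [
        [sys.executable, script, "--seed", str(seed), "--timesteps", str(timesteps)]
        for script in entry_points
        for seed in seeds
    ]


def run_all(seeds=range(5), timesteps=500_000, dry_run=True):
    """Print each command, and run it from the repository root unless dry_run is set."""
    commands = build_commands(seeds, timesteps)
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sweep at the first failing run.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_all()
```

Using `sys.executable` runs each training script with the same interpreter and environment as the driver.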