Data and Directory

Simon Paige edited this page May 4, 2026 · 5 revisions

Where the directory's organizations come from, how they were classified, and what the known gaps are.

Honest Numbers (April 2026 published export)

  • 168,650 active organizations across 183 countries with at least one entry (102 countries have geocoded coordinates)
  • Top sources: Mapa das OSCs (Brazil) ~85,472, ACNC Charity Register (Australia) ~48,619, UK Charity Commission ~13,614
  • 29,566 geocoded entries on the published map

Recovery caveat (2026-05-04). The numbers above come from the static export at data/search/index.json and data/map/stats.json, last built 2026-04-28. The working SQLite database was rebuilt on 2026-05-04 from public exports and currently holds 149,507 organizations, 29,422 mapped rows, and 158 countries. Until the next static rebuild, expect the working DB and the public exports to disagree on totals. See README.md for the canonical statement.
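To make the size of the disagreement concrete, here is a minimal sketch that subtracts the working-DB figures from the published-export figures quoted above. The dictionary keys are illustrative names, not the actual fields of data/map/stats.json, and the sketch assumes that "geocoded entries on the published map" and "mapped rows" count the same thing.

```python
# Figures quoted on this page; key names are illustrative, not the stats.json schema.
published_export = {"organizations": 168_650, "mapped": 29_566, "countries": 183}
working_db = {"organizations": 149_507, "mapped": 29_422, "countries": 158}


def snapshot_gap(export: dict, db: dict) -> dict:
    """Return export-minus-DB differences for each metric present in both."""
    return {key: export[key] - db[key] for key in export if key in db}


gap = snapshot_gap(published_export, working_db)
for metric, diff in gap.items():
    print(f"{metric}: export is ahead by {diff:,}")
```

On these figures the export is ahead by 19,143 organizations, 144 mapped rows, and 25 countries.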

  • Wave A (national registries): Brazil ~85K, Australia ~48K, Bulgaria ~2.5K, plus IRS BMF, UK Charity Commission, Wikidata, ProPublica
  • Wave B (thematic networks, added April 2026): 9,115 rows from cooperative federations, mutual aid networks, intentional communities, transition initiatives, CLTs, RIPESS family
  • ~10,800 entries score >= 5 on framework alignment (the strongest matches)
  • ~27,200 geocoded entries

Sources

| Source | Records | License |
|---|---|---|
| Mapa das OSCs (Brazil) | 85,432 | Public / IPEA |
| ACNC Charity Register (Australia) | 48,498 | Open Government |
| UK Charity Commission | 11,396 | Open Government Licence v3 |
| IRS Exempt Organizations BMF | 9,392 | US Gov public domain |
| Mutual Aid Wiki | 4,251 | CC BY-NC-SA |
| Wikidata Bulgaria NPOs | 2,518 | CC0 |
| Foundation for Intentional Community | 1,098 | - |
| Transition Network | 995 | ODbL |
| Mutual Aid Hub | 898 | PDDL-1.0 |
| SUSY Map | 887 | Public Domain |
| ProPublica Nonprofit Explorer | 602 | Public (IRS) |
| Wikidata (land trusts) | 444 | CC0 |
| Schumacher CLT World Map | 406 | - |
| Wikidata (labor unions) | 405 | CC0 |
| ICA member directory | 322 | Open |
| ITUC affiliates | 297 | - |
| Wikidata | 192 | CC0 |
| New Economy Coalition members | 180 | - |
| Wikidata (subregion) | 82 | CC0 |
| Construction coops | 81 | - |
| RIPESS family | 78 | - |
| Habitat affiliates | 66 | - |
| Web research | 58 | - |
| Grounded Solutions | 38 | - |
| Manual curation | 13 | - |

Full source documentation: DATA.md

Methodology

  1. Raw records pulled from public registries and thematic directories
  2. Pre-filters at ingest drop obvious non-fits (e.g., pure-religious orgs, pure-patronal associations from Brazil)
  3. Multi-pass audit pipeline: audit_pass1, audit_pass2, audit_pass3_ntee, audit_quality
  4. phase2_filter.py scores entries on two axes: keywords and legal form
  5. Every removed row is preserved in data/trim_audit/ CSVs
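
The scoring and trim-audit steps above can be pictured roughly as follows. This is a sketch under stated assumptions, not the contents of phase2_filter.py: the keyword list, the legal-form weights, the field names, and the threshold parameter are all hypothetical. It shows only the general shape, which is to score each entry on keywords plus legal form, keep the entries at or above a threshold, and write every removed row to a CSV so nothing is silently lost.

```python
import csv
from pathlib import Path

# Hypothetical weights; the real lists are not documented on this page.
KEYWORD_WEIGHTS = {"cooperative": 3, "mutual aid": 3, "land trust": 2}
LEGAL_FORM_WEIGHTS = {"cooperative": 3, "nonprofit": 1}


def score_entry(entry: dict) -> int:
    """Sum keyword hits in the name/description plus a legal-form weight."""
    text = f"{entry.get('name', '')} {entry.get('description', '')}".lower()
    keyword_score = sum(w for kw, w in KEYWORD_WEIGHTS.items() if kw in text)
    legal_score = LEGAL_FORM_WEIGHTS.get(entry.get("legal_form", "").lower(), 0)
    return keyword_score + legal_score


def filter_with_audit(entries, threshold, audit_path):
    """Keep entries scoring >= threshold; write every removed row to a CSV."""
    kept, removed = [], []
    for entry in entries:
        scored = {**entry, "score": score_entry(entry)}
        (kept if scored["score"] >= threshold else removed).append(scored)

    audit_path = Path(audit_path)
    audit_path.parent.mkdir(parents=True, exist_ok=True)
    fieldnames = ["name", "description", "legal_form", "score"]
    with audit_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(removed)
    return kept
```

A call such as `filter_with_audit(rows, 5, "data/trim_audit/example.csv")` would return the kept rows and leave the dropped ones in the CSV for later review; the file name here is illustrative.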

Legibility Tags

Every organization is tagged by how we know what it is:

  • formal -- registered legal entity; legal form is the alignment evidence
  • informal -- unincorporated network (mutual aid groups, neighborhood orgs)
  • hybrid -- legibility varies (Transition Network groups, intentional communities)
  • unknown -- older imports awaiting re-classification

This split matters because treating a neighborhood pandemic response group the same as a registered cooperative hides how the movement actually works.
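
As an illustration of how the four tags could be represented in code, here is a minimal sketch. The enum mirrors the tag names above; the source-to-tag defaults are an assumption drawn from the examples on this page, not the project's actual classification logic.

```python
from enum import Enum


class Legibility(str, Enum):
    FORMAL = "formal"      # registered legal entity
    INFORMAL = "informal"  # unincorporated network
    HYBRID = "hybrid"      # legibility varies
    UNKNOWN = "unknown"    # older imports awaiting re-classification


# Hypothetical per-source defaults, based only on the examples given above.
DEFAULT_BY_SOURCE = {
    "UK Charity Commission": Legibility.FORMAL,
    "Mutual Aid Hub": Legibility.INFORMAL,
    "Transition Network": Legibility.HYBRID,
    "Foundation for Intentional Community": Legibility.HYBRID,
}


def default_legibility(source: str) -> Legibility:
    """Return the assumed default tag for a source, or UNKNOWN if unmapped."""
    return DEFAULT_BY_SOURCE.get(source, Legibility.UNKNOWN)
```

Falling back to `unknown` rather than guessing keeps unmapped sources visible for later re-classification.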

Known Issues (Wave D in progress)

  1. state_province regression: US state data from Wave B ingesters uses full state names instead of 2-letter codes, causing duplicate entries in the search index. Fix: normalize_us_state() helper + one-shot DB migration. Priority 1.

  2. Find.coop, RAESS, ASEC: skipped in Wave B. Find.coop got an outreach ask first (partnership model); RAESS and ASEC have dead DNS. Tracked at tools/mycelial-outreach/drafts/pending/.

  3. Automated classification artifacts: for example, a motorcycle club with NTEE code Y42 can appear under "cooperatives." The legibility column and alignment scores reduce these misclassifications but do not eliminate them.
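
For the state_province fix in item 1, the following is one possible shape for the normalize_us_state() helper and the one-shot migration. Only the helper's name comes from this page. The `organizations` table, its `id`, `country`, and `state_province` columns, and the `'US'` country value are assumptions made for the sketch.

```python
import sqlite3

# Full state name (lowercase) -> 2-letter postal code, 50 states plus DC.
US_STATE_CODES = {
    "alabama": "AL", "alaska": "AK", "arizona": "AZ", "arkansas": "AR",
    "california": "CA", "colorado": "CO", "connecticut": "CT", "delaware": "DE",
    "district of columbia": "DC", "florida": "FL", "georgia": "GA", "hawaii": "HI",
    "idaho": "ID", "illinois": "IL", "indiana": "IN", "iowa": "IA",
    "kansas": "KS", "kentucky": "KY", "louisiana": "LA", "maine": "ME",
    "maryland": "MD", "massachusetts": "MA", "michigan": "MI", "minnesota": "MN",
    "mississippi": "MS", "missouri": "MO", "montana": "MT", "nebraska": "NE",
    "nevada": "NV", "new hampshire": "NH", "new jersey": "NJ", "new mexico": "NM",
    "new york": "NY", "north carolina": "NC", "north dakota": "ND", "ohio": "OH",
    "oklahoma": "OK", "oregon": "OR", "pennsylvania": "PA", "rhode island": "RI",
    "south carolina": "SC", "south dakota": "SD", "tennessee": "TN", "texas": "TX",
    "utah": "UT", "vermont": "VT", "virginia": "VA", "washington": "WA",
    "west virginia": "WV", "wisconsin": "WI", "wyoming": "WY",
}
VALID_CODES = set(US_STATE_CODES.values())


def normalize_us_state(value):
    """Map a full US state name to its 2-letter code; pass through anything else."""
    if value is None:
        return None
    cleaned = " ".join(value.split())  # trim and collapse internal whitespace
    if len(cleaned) == 2 and cleaned.upper() in VALID_CODES:
        return cleaned.upper()
    return US_STATE_CODES.get(cleaned.lower(), cleaned)


def migrate_state_codes(conn: sqlite3.Connection) -> int:
    """One-shot pass over US rows (assumed schema); returns rows changed."""
    rows = conn.execute(
        "SELECT id, state_province FROM organizations WHERE country = 'US'"
    ).fetchall()
    updates = [
        (normalize_us_state(state), row_id)
        for row_id, state in rows
        if normalize_us_state(state) != state
    ]
    conn.executemany(
        "UPDATE organizations SET state_province = ? WHERE id = ?", updates
    )
    conn.commit()
    return len(updates)
```

Unrecognized values are returned unchanged apart from whitespace cleanup, so non-US provinces or typos are left for manual review rather than silently rewritten.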

Adding Data

If you know of a registry, thematic network, or regional directory that belongs here, open an issue at github.com/simonlpaige/commonweave with:

  • Source name + URL
  • License
  • Estimated record count
  • Why it fits the framework
