alexanderpersson3/nordic-deal-sourcing-graph

Nordic Deal Sourcing Graph

Ranks acquisition / investment targets from public Nordic & EU procurement data — by contract-win momentum, scale, and institutional diversity — and drafts a first sourcing memo per top candidate. A rifle scope, not a CSV dump.

Why this matters

PE, corp dev, and private-credit teams all fight the same fight: find the target before the market prices it. EQT built Motherbrain to do exactly this and treats an 18–24 month tech lead as a moat. This repo is the Nordic angle that most candidates never touch — it fuses public data sources few people use well into a ranked, explainable target list. That's the differentiator between "another AI builder with American CSVs" and someone who speaks the local market.

Research basis

  • Cao et al. (EQT Motherbrain), "Beyond Gut Feel," ICANN 2024 — framing deal sourcing as multivariate time-series classification of company trajectories.

  • ML-in-M&A thesis (Aalto, surfaced): Predicting Merger and Acquisition Outcomes: A Machine Learning Approach (aaltodoc)

  • Government-contract signal literature — public procurement carries both a pricing signal and an information signal about firm health and momentum.

  • Master's thesis to reverse-engineer (surfaced; read & vet next): Aalto, Predicting venture-capital-backed start-up success with machine learning (aaltodoc)

    Math we're stealing: the trajectory features and classifier that score a company's odds of IPO/acquisition — Motherbrain's "Beyond Gut Feel" idea, made reproducible on public data.

  • Frontier to push beyond (2025–2026): the LLM multi-agent investment-analysis wave (e.g. arXiv 2602.00082) applied to public-data target ranking.

Hypothesis: contract-win momentum, scale, and institutional breadth are leading, public, and underused signals for sourcing — legible long before a banker's teaser lands in the inbox.

Data

| Source | Provides | Cost | License caveat |
| --- | --- | --- | --- |
| TED v3 Search API | EU public-contract notices | Free, no key | Open. Polite User-Agent required (set TED_USER_AGENT). |
| Bolagsverket open data | Swedish ownership / financials | Free | Open (roadmap; manual validation today). |
| GLEIF LEI | entity identifiers | Free | Open (roadmap; for ER beyond suffix-stripping). |

TED v3 coverage caveat: the v3 Search API only carries notices from ~early 2026 onward, due to the EU's eForms transition. Older windows return sparse data. The MVP defaults to a 6-month look-back for this reason.

System design

flowchart LR
    A[TED v3<br/>award notices] --> B[TEDClient<br/>iteration-token pagination]
    B --> C[Notice model<br/>buyer, winner, value, CPV, date]
    C --> D[normalize_supplier_name<br/>Nordic suffix strip]
    D --> E[aggregate_suppliers<br/>group by canonical name]
    E --> F[SupplierSnapshot<br/>features per supplier]
    F --> G[score_supplier<br/>momentum + scale + diversity]
    G --> H[Composite rank<br/>balanced/momentum_led/scale_led]
    H --> I[CSV + PNG + per-supplier memo]
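
The iteration-token pagination step in the diagram can be sketched generically. This is a minimal illustration, not the repo's TEDClient: the `fetch_page` callable, the `notices` / `next_token` keys, and the page cap are all hypothetical, and the real TED v3 request and response field names should be taken from the API documentation.

```python
from typing import Callable, Iterator, Optional

# Hypothetical page shape: {"notices": [...], "next_token": str | None}.
Page = dict


def iter_notices(
    fetch_page: Callable[[Optional[str]], Page],
    max_pages: int = 1000,
) -> Iterator[dict]:
    """Yield notices page by page until the server stops returning a token."""
    token: Optional[str] = None
    for _ in range(max_pages):  # hard cap guards against a token loop
        page = fetch_page(token)
        yield from page.get("notices", [])
        token = page.get("next_token")
        if not token:
            break


# Usage with an in-memory stand-in for the HTTP call (no network involved).
_pages = {
    None: {"notices": [{"id": 1}, {"id": 2}], "next_token": "t1"},
    "t1": {"notices": [{"id": 3}], "next_token": None},
}
collected = list(iter_notices(lambda tok: _pages[tok]))
```

Injecting the fetch function keeps the loop testable offline, which matches the hermetic-test approach described under Reproducibility.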

Two disciplines do most of the work:

  1. Entity resolution that's honest about its limits. Nordic legal suffixes (AB, Aktiebolag, Oy, Oyj, A/S, ApS, ASA, SE, GmbH, ...) get stripped on a word-boundary regex; case + whitespace get normalised. Volvo AB, VOLVO AB, and Volvo Aktiebolag collapse to one supplier. Volvo Cars AB and Volvo Trucks AB stay separate — they're distinct legal entities, and merging them would be wrong. Typo merging and corporate-tree resolution wait for GLEIF LEI integration in v2.

  2. Scoring weights are defensible by construction, not by fit. No M&A label dataset exists to train a classifier against, so we don't pretend to. The three sub-scores (momentum, scale, diversity) are percentile-ranked within the universe and combined with documented weights per profile (balanced, momentum_led, scale_led). Easy to inspect; easy to defend; easy to swap for a real model when labels arrive.
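
The suffix-strip behaviour described in item 1 can be sketched as below. The function names mirror the diagram, but this is an illustrative reimplementation under assumptions: the suffix list is only the subset named above, and stripping is applied to trailing suffixes only, which the README does not specify.

```python
import re
from collections import defaultdict

# Subset of legal-form suffixes named in the README; the repo's list is longer.
_SUFFIXES = ["Aktiebolag", "AB", "Oyj", "Oy", "A/S", "ApS", "ASA", "SE", "GmbH"]

# Longest first, bounded on the left by a non-word character, anchored at the end.
_SUFFIX_RE = re.compile(
    r"(?<!\w)(?:%s)\.?\s*$"
    % "|".join(re.escape(s) for s in sorted(_SUFFIXES, key=len, reverse=True)),
    re.IGNORECASE,
)


def normalize_supplier_name(raw: str) -> str:
    """Collapse whitespace, drop one trailing legal suffix, fold case."""
    name = " ".join(raw.split())
    stripped = _SUFFIX_RE.sub("", name).strip(" ,")
    return (stripped or name).casefold()  # never return an empty key


def aggregate_suppliers(names):
    """Group raw names by canonical name."""
    groups = defaultdict(list)
    for raw in names:
        groups[normalize_supplier_name(raw)].append(raw)
    return dict(groups)


groups = aggregate_suppliers(
    ["Volvo AB", "VOLVO  AB", "Volvo Aktiebolag", "Volvo Cars AB", "Volvo Trucks AB"]
)
```

The three Volvo variants land under one key while Volvo Cars and Volvo Trucks keep their own, which is the behaviour item 1 describes. Typos and parent/subsidiary links are out of reach for this approach.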

Results

Live scans against TED v3, 6-month look-back, balanced scoring profile. Both runs hit the live TED API and produced the artifacts in data/sample/.

Sweden — 2,266 notices, 1,398 unique suppliers, 265 ranked

| Rank | Supplier | 12mo | 3mo | Momentum | Composite |
| --- | --- | --- | --- | --- | --- |
| 1 | Peab Sverige AB | 6 | 4 | 2.67× | 0.902 |
| 2 | Swedbank AB | 3 | 2 | 2.67× | 0.869 |
| 3 | Securitas Sverige Aktiebolag | 4 | 3 | 3.00× | 0.837 |
| 4 | Stena Recycling AB | 8 | 4 | 2.00× | 0.831 |
| 5 | Movab AB | 5 | 3 | 2.40× | 0.812 |
| 6 | Avarn Security AB | 4 | 2 | 2.00× | 0.812 |
| 7 | CGI Sverige AB | 3 | 2 | 2.67× | 0.811 |
| 8 | TN Bygg & Anläggning AB | 3 | 2 | 2.67× | 0.795 |
| 9 | Anticimex Aktiebolag | 7 | 4 | 2.29× | 0.788 |
| 10 | OneMed Sverige AB | 5 | 2 | 1.60× | 0.783 |

These are real, recognizable Nordic suppliers showing actual contract-win momentum: Peab (large-cap construction), Swedbank, Stena group, CGI Sweden, Anticimex (pest control / inspections). #1 Peab has 1.43B SEK of disclosed contract value across 5 distinct public buyers in 12 months — exactly the institutional-depth signal a corp-dev team would surface manually after weeks of work.
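
The Momentum column in the Sweden table is consistent with a monthly-rate ratio: wins per month over the last 3 months divided by wins per month over the last 12. That formula is inferred from the published numbers, not taken from the repo's source, so treat it as an assumption. A minimal sketch:

```python
def momentum_ratio(wins_12mo: int, wins_3mo: int) -> float:
    """Recent monthly win rate divided by the trailing-12-month monthly rate."""
    if wins_12mo <= 0:
        return 0.0  # assumption: no history means no momentum signal
    return (wins_3mo / 3) / (wins_12mo / 12)


# Rows copied from the Sweden table: (supplier, 12mo wins, 3mo wins).
rows = [
    ("Peab Sverige AB", 6, 4),
    ("Stena Recycling AB", 8, 4),
    ("OneMed Sverige AB", 5, 2),
]
ratios = {name: round(momentum_ratio(w12, w3), 2) for name, w12, w3 in rows}
```

A ratio above 1.0 means the last quarter ran hotter than the trailing-year average. With counts this small, one extra award moves the ratio a lot, so it reads best alongside the scale and diversity sub-scores.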

Norway — 903 notices, 581 unique suppliers, 92 ranked

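| Rank | Supplier | 12mo | 3mo | Momentum | Composite |
| --- | --- | --- | --- | --- | --- |
| 1 | Matriks AS | 5 | 3 | 2.40× | 0.818 |
| 2 | Atea AS | 6 | 2 | 1.33× | 0.798 |
| 3 | Crayon AS | 5 | 2 | 1.60× | 0.766 |
| 4 | Asko Øst AS | 3 | 2 | 2.67× | 0.753 |
| 5 | AF Energi AS | 4 | 1 | 1.00× | 0.744 |
| 6 | GK Norge AS | 3 | 1 | 1.33× | 0.740 |
| 7 | VWR International AS | 3 | 2 | 2.67× | 0.739 |
| 8 | Schindler AS (hovedenhet) | 2 | 1 | 2.00× | 0.729 |
| 9 | HENT AS | 3 | 1 | 1.33× | 0.725 |
| 10 | LÆRE AS | 2 | 2 | 4.00× | 0.721 |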

Norway's procurement volume is meaningfully smaller than Sweden's: 903 notices against 2,266, a ratio of roughly 2.5× that is broadly consistent with the difference in population and private-sector mix. Top Norwegian names skew toward IT services (Atea, Crayon, CGI-class) and infrastructure (HENT, AF Energi, GK Norge).

Per-supplier sourcing memo (data/sample/supplier_memo_top1.md for Sweden's #1 Peab): cadence, scale, sector focus, institutional depth, and a structured "what to validate next" checklist (ownership, financials, sector context, competitive position). Auto-drafted from the same SupplierSnapshot used for ranking — no LLM, no hallucination surface, reproducible.
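
A template-only memo of this kind might look like the following sketch. The `SupplierSnapshot` fields and the memo layout here are hypothetical stand-ins, not the repo's actual model. Only the Peab figures (6 awards in 12 months, 4 in 3 months, 1.43B SEK, 5 buyers) come from the results above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SupplierSnapshot:  # hypothetical fields, for illustration only
    name: str
    wins_12mo: int
    wins_3mo: int
    total_value_sek: float
    distinct_buyers: int


# Checklist items named in the README's memo description.
CHECKLIST = ("Ownership", "Financials", "Sector context", "Competitive position")


def draft_memo(s: SupplierSnapshot) -> str:
    """Render a deterministic Markdown memo from snapshot fields only."""
    lines = [
        f"# Sourcing memo: {s.name}",
        "",
        f"- Cadence: {s.wins_12mo} awards in 12 months, {s.wins_3mo} in the last 3",
        f"- Scale: {s.total_value_sek / 1e9:.2f}B SEK disclosed contract value",
        f"- Institutional depth: {s.distinct_buyers} distinct public buyers",
        "",
        "## What to validate next",
    ]
    lines += [f"- [ ] {item}" for item in CHECKLIST]
    return "\n".join(lines)


memo = draft_memo(SupplierSnapshot("Peab Sverige AB", 6, 4, 1.43e9, 5))
```

Because every sentence is filled from a snapshot field, the same input always yields the same memo, which is the reproducibility property claimed above.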

Bundled artifacts:

Risks & limitations

  • Entity resolution is suffix-strip only. Catches ~80% of trivial variants. Doesn't catch typos, corporate-tree relationships (parent / subsidiary), or abbreviations. GLEIF LEI lookup is the v2 fix for the remaining 20%.
  • TED v3 only carries ~early-2026-onward notices. Older windows are sparse. For a true long-term momentum signal, ingestion from the historical TED CSV archive (2014-2024) is needed.
  • Public procurement is one customer channel, not the P&L. A supplier with strong public-contract momentum may still be losing money in the private-sector book. The sourcing memo explicitly flags this and pushes the user to validate financials separately.
  • Scoring weights are not learned. No M&A outcome dataset is available to train against on public data. The balanced / momentum_led / scale_led profiles are defensible-by-construction heuristics — swap for a trained classifier when labels arrive.
  • Anonymous TED winners (~12% of notices for SE, ~16% for NO) are dropped silently. They represent real economic activity that's invisible to this pipeline.
  • No ownership data yet. A founder-owned or PE-backed signal would be the highest-value filter; deferred to v2 (Bolagsverket integration).
  • Country code mapping: TED uses ISO 3166-1 alpha-3 codes (SWE, NOR, DNK, FIN); the CLI accepts both 2-letter and 3-letter forms and resolves them to the alpha-3 code.
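
The country-code resolution in the last bullet can be sketched as a small lookup. The function name and the error behaviour are assumptions, and the mapping covers only the four countries named in that bullet.

```python
# ISO 3166-1 alpha-2 -> alpha-3 for the four Nordic countries listed above.
_ALPHA2_TO_ALPHA3 = {"SE": "SWE", "NO": "NOR", "DK": "DNK", "FI": "FIN"}


def resolve_country(code: str) -> str:
    """Accept a 2- or 3-letter code (any case) and return the alpha-3 form."""
    c = code.strip().upper()
    if c in _ALPHA2_TO_ALPHA3:
        return _ALPHA2_TO_ALPHA3[c]
    if c in _ALPHA2_TO_ALPHA3.values():
        return c
    raise ValueError(f"unsupported country code: {code!r}")


resolved = [resolve_country(c) for c in ("se", "NO", "dnk", "FIN")]
```

Failing loudly on an unknown code is a design choice in this sketch: a silent fallback would produce an empty result set that looks like a sparse-data problem.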

What I'd build next for a real firm

  1. GLEIF LEI integration for verified entity resolution — catches the 20% the suffix-strip can't.
  2. Bolagsverket / Brønnøysund ownership overlay — distinguish founder-owned from PE-backed from corporate-subsidiary, the single highest-value filter for a PE sourcing desk.
  3. Historical TED CSV ingestion (2014-2024) to extend the momentum window from 6 months to 5+ years.
  4. Per-supplier financial pull via local statutory accounts — moves the score from "public-contract momentum" to "public-contract momentum on a healthy P&L."
  5. Sector-specific scoring profiles tuned to the sourcing mandate (e.g. healthcare-services-focused profile up-weights CPV-85 contracts).
  6. CRM hand-off — flagged target → task in the firm's Salesforce.

Reproducibility

# install
python3.12 -m venv .venv && .venv/bin/pip install -e .

# polite User-Agent (TED requires identification)
cp .env.example .env
# edit .env: TED_USER_AGENT="your-project (Your Name your.email@example.com)"
# export every variable in .env; sourcing keeps quoted values with spaces intact
set -a; . ./.env; set +a

# scan Sweden (live TED, ~5s for 2.3K notices)
ndsg scan --country SE --months 6 --top 20 \
  --cache-dir data/cache --out-dir data/sample

# scan Norway
ndsg scan --country NO --months 6 --top 20 \
  --cache-dir data/cache --out-dir data/sample/no

# re-score from cached notices (no live TED call)
ndsg scan --country SE --offline \
  --cache-dir data/cache --out-dir /tmp/se_rerun

# scoring profiles
ndsg scan --country SE --profile momentum_led  # 60/20/20 weighting
ndsg scan --country SE --profile scale_led     # 20/60/20

# tests
pytest                       # 67 hermetic tests, no network
pytest -m slow               # adds live-TED integration test
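
The profile comments above give 60/20/20 and 20/60/20 weightings. A percentile-rank composite along those lines could be sketched as follows. Assumptions, none confirmed by this README: the weight order is momentum / scale / diversity, the balanced profile uses equal thirds, and ties share a mid-rank.

```python
from typing import Dict, List, Sequence, Tuple

# Weight order assumed to be (momentum, scale, diversity).
PROFILES: Dict[str, Tuple[float, float, float]] = {
    "balanced": (1 / 3, 1 / 3, 1 / 3),  # assumption: not stated in the README
    "momentum_led": (0.6, 0.2, 0.2),
    "scale_led": (0.2, 0.6, 0.2),
}


def percentile_rank(values: Sequence[float]) -> List[float]:
    """Share of the universe each value beats, with ties counted as half."""
    n = len(values)
    out = []
    for v in values:
        below = sum(1 for x in values if x < v)
        equal = sum(1 for x in values if x == v)
        out.append((below + 0.5 * equal) / n)
    return out


def composite(momentum, scale, diversity, profile: str = "balanced") -> List[float]:
    """Weighted sum of the three percentile-ranked sub-scores."""
    wm, ws, wd = PROFILES[profile]
    pm, ps, pd = map(percentile_rank, (momentum, scale, diversity))
    return [wm * a + ws * b + wd * c for a, b, c in zip(pm, ps, pd)]


# Toy universe of three suppliers (made-up inputs, not repo output).
scores = composite([2.67, 1.0, 2.0], [1.43e9, 5e7, 9e8], [5, 1, 3], "momentum_led")
```

Percentile ranks are relative to the scanned universe, so a composite from one country or look-back window is not directly comparable with one from another.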

Engineering standards: permissive code (MIT), public data first (TED v3, no key), every claim grounded in the artifact CSVs, entity-resolution honesty surfaced explicitly.

About

Ranks PE / corp-dev sourcing targets from live EU public-procurement (TED v3) data: contract-win momentum + scale + institutional diversity.
