Ranks acquisition / investment targets from public Nordic & EU procurement data — by contract-win momentum, scale, and institutional diversity — and drafts a first sourcing memo per top candidate. A rifle scope, not a CSV dump.
PE, corp dev, and private-credit teams all fight the same fight: find the target before the market prices it. EQT built Motherbrain to do exactly this and treats an 18–24 month tech lead as a moat. This repo is the Nordic angle that most candidates never touch — it fuses public data sources few people use well into a ranked, explainable target list. That's the differentiator between "another AI builder with American CSVs" and someone who speaks the local market.
- Cao et al. (EQT Motherbrain), "Beyond Gut Feel," ICANN 2024: framing deal sourcing as multivariate time-series classification of company trajectories.
- ML-in-M&A thesis (Aalto, surfaced): Predicting Merger and Acquisition Outcomes: A Machine Learning Approach (aaltodoc).
- Government-contract signal literature: public procurement carries both a pricing signal and an information signal about firm health and momentum.
- Master's thesis to reverse-engineer (surfaced; read & vet next): Aalto, Predicting venture-capital-backed start-up success with machine learning (aaltodoc).

Math we're stealing: the trajectory features and classifier that score a company's odds of IPO/acquisition (Motherbrain's "Beyond Gut Feel" idea, made reproducible on public data).

- Frontier to push beyond (2025–2026): the LLM multi-agent investment-analysis wave (e.g. arXiv 2602.00082) applied to public-data target ranking.

Hypothesis: contract-win momentum, scale, and institutional breadth are leading, public, and underused signals for sourcing — legible long before a banker's teaser lands in the inbox.
| Source | Provides | Cost | License caveat |
|---|---|---|---|
| TED v3 Search API | EU public-contract notices | Free, no key | Open. Polite User-Agent required (set `TED_USER_AGENT`). |
| Bolagsverket open data | Swedish ownership / financials | Free | Open. Roadmap item; validation is manual today. |
| GLEIF | LEI entity identifiers | Free | Open. Roadmap item, for entity resolution beyond suffix-stripping. |
TED v3 coverage caveat: the v3 Search API only carries notices from ~early 2026 onward, due to the EU's eForms transition. Older windows return sparse data. The MVP defaults to a 6-month look-back for this reason.
```mermaid
flowchart LR
    A[TED v3<br/>award notices] --> B[TEDClient<br/>iteration-token pagination]
    B --> C[Notice model<br/>buyer, winner, value, CPV, date]
    C --> D[normalize_supplier_name<br/>Nordic suffix strip]
    D --> E[aggregate_suppliers<br/>group by canonical name]
    E --> F[SupplierSnapshot<br/>features per supplier]
    F --> G[score_supplier<br/>momentum + scale + diversity]
    G --> H[Composite rank<br/>balanced/momentum_led/scale_led]
    H --> I[CSV + PNG + per-supplier memo]
```
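
As a rough illustration of the normalise-and-aggregate stages in the flowchart, the sketch below strips legal suffixes and groups award notices by the resulting canonical name. The function and class names follow the flowchart, but the bodies, the field names, the snapshot features, and the suffix list (only the suffixes this README names explicitly) are assumptions for illustration, not the repo's actual implementation.

```python
import re
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

# Assumption: only the suffixes named in this README; the real list is longer.
LEGAL_SUFFIXES = ["Aktiebolag", "AB", "Oyj", "Oy", "A/S", "ApS", "ASA", "SE", "GmbH"]

# Match a suffix only when it is a whole whitespace-delimited token.
_SUFFIX_RE = re.compile(
    r"(?<!\S)(?:" + "|".join(re.escape(s) for s in LEGAL_SUFFIXES) + r")(?!\S)",
    re.IGNORECASE,
)


def normalize_supplier_name(name: str) -> str:
    """Lower-case, drop legal-form tokens, collapse runs of whitespace."""
    stripped = _SUFFIX_RE.sub(" ", name.casefold())
    return " ".join(stripped.split())


@dataclass(frozen=True)
class Notice:
    # Hypothetical field names mirroring "buyer, winner, value, CPV, date".
    buyer: str
    winner: str
    value: float
    cpv: str
    award_date: date


@dataclass(frozen=True)
class SupplierSnapshot:
    # Hypothetical feature set; the real snapshot may carry more fields.
    canonical_name: str
    wins: int
    total_value: float
    distinct_buyers: int


def aggregate_suppliers(notices: list[Notice]) -> dict[str, SupplierSnapshot]:
    """Group notices by canonical winner name and compute simple features."""
    groups: dict[str, list[Notice]] = defaultdict(list)
    for notice in notices:
        groups[normalize_supplier_name(notice.winner)].append(notice)
    return {
        name: SupplierSnapshot(
            canonical_name=name,
            wins=len(items),
            total_value=sum(n.value for n in items),
            distinct_buyers=len({n.buyer for n in items}),
        )
        for name, items in groups.items()
    }
```

Because the suffix match is token-based, `Volvo Cars AB` normalises to `volvo cars` rather than `volvo`, so distinct legal entities are not merged by accident.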
Two disciplines do most of the work:
- Entity resolution that's honest about its limits. Nordic legal suffixes (`AB`, `Aktiebolag`, `Oy`, `Oyj`, `A/S`, `ApS`, `ASA`, `SE`, `GmbH`, ...) get stripped on a word-boundary regex; case and whitespace get normalised. `Volvo AB`, `VOLVO AB`, and `Volvo Aktiebolag` collapse to one supplier. `Volvo Cars AB` and `Volvo Trucks AB` stay separate: they're distinct legal entities, and merging them would be wrong. Typo merging and corporate-tree resolution wait for GLEIF LEI integration in v2.
- Scoring weights are defensible by construction, not by fit. No M&A label dataset exists to train a classifier against, so we don't pretend to. The three sub-scores (momentum, scale, diversity) are percentile-ranked within the universe and combined with documented weights per profile (`balanced`, `momentum_led`, `scale_led`). Easy to inspect; easy to defend; easy to swap for a real model when labels arrive.

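
The percentile-rank-then-weight scheme described above can be sketched roughly as follows. This is a minimal illustration, not the repo's `score_supplier`: the function names, the tie-handling rule, and the input layout are assumptions. The two weight tuples use the 60/20/20 and 20/60/20 splits quoted in this README's CLI comments, read as (momentum, scale, diversity), which is itself an assumption about ordering. The `balanced` weights are not stated here, so they are left out.

```python
# Assumed ordering: (momentum, scale, diversity).
PROFILE_WEIGHTS = {
    "momentum_led": (0.6, 0.2, 0.2),
    "scale_led": (0.2, 0.6, 0.2),
}


def percentile_rank(values: list[float]) -> list[float]:
    """Share of the universe with a value <= each entry (ties share a rank)."""
    n = len(values)
    return [sum(1 for other in values if other <= v) / n for v in values]


def composite_scores(
    features: dict[str, tuple[float, float, float]], profile: str
) -> dict[str, float]:
    """Weighted sum of per-feature percentile ranks within the universe.

    `features` maps supplier name -> raw (momentum, scale, diversity) values.
    """
    if not features:
        return {}
    weights = PROFILE_WEIGHTS[profile]
    names = list(features)
    # Transpose rows into one column per feature, then rank each column.
    columns = list(zip(*(features[name] for name in names)))
    ranked = [percentile_rank(list(column)) for column in columns]
    return {
        name: sum(w * ranked[j][i] for j, w in enumerate(weights))
        for i, name in enumerate(names)
    }


# Made-up universe of three suppliers, purely for illustration.
universe = {
    "A": (3.0, 10.0, 1.0),  # hot streak, small, one buyer
    "B": (1.0, 50.0, 4.0),  # flat, large, many buyers
    "C": (2.0, 30.0, 2.0),
}
momentum_view = composite_scores(universe, "momentum_led")
scale_view = composite_scores(universe, "scale_led")
```

With this toy universe, supplier A ranks first under `momentum_led` and supplier B under `scale_led`, which shows how the profile choice alone can reorder the list without any fitted parameters.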
Live scans against TED v3, 6-month look-back, balanced scoring profile. Both runs (Sweden first, then Norway) hit the live TED API and produced the artifacts in `data/sample/`.
| Rank | Supplier | 12mo | 3mo | Momentum | Composite |
|---|---|---|---|---|---|
| 1 | Peab Sverige AB | 6 | 4 | 2.67× | 0.902 |
| 2 | Swedbank AB | 3 | 2 | 2.67× | 0.869 |
| 3 | Securitas Sverige Aktiebolag | 4 | 3 | 3.00× | 0.837 |
| 4 | Stena Recycling AB | 8 | 4 | 2.00× | 0.831 |
| 5 | Movab AB | 5 | 3 | 2.40× | 0.812 |
| 6 | Avarn Security AB | 4 | 2 | 2.00× | 0.812 |
| 7 | CGI Sverige AB | 3 | 2 | 2.67× | 0.811 |
| 8 | TN Bygg & Anläggning AB | 3 | 2 | 2.67× | 0.795 |
| 9 | Anticimex Aktiebolag | 7 | 4 | 2.29× | 0.788 |
| 10 | OneMed Sverige AB | 5 | 2 | 1.60× | 0.783 |
The Swedish top 10 are real, recognizable suppliers showing actual contract-win momentum: Peab (large-cap construction), Swedbank, Stena group, CGI Sweden, Anticimex (pest control / inspections). Top-ranked Peab has SEK 1.43B of disclosed contract value across 5 distinct public buyers in 12 months, exactly the institutional-depth signal a corp-dev team would otherwise surface manually after weeks of work.
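
The Momentum column is consistent with a simple run-rate ratio: the monthly win rate over the last 3 months divided by the monthly win rate over 12 months. That formula is inferred from the tabulated numbers, not taken from the repo's code, and the function name is hypothetical. A minimal sketch that reproduces a few rows:

```python
def momentum_ratio(wins_12mo: int, wins_3mo: int) -> float:
    """Recent monthly win rate relative to the 12-month monthly win rate.

    1.0 means the last quarter matched the 12-month pace; 4.0 means every
    win in the 12-month window landed in the last 3 months.
    """
    if wins_12mo <= 0:
        raise ValueError("need at least one win in the 12-month window")
    return (wins_3mo / 3) / (wins_12mo / 12)


# (12mo wins, 3mo wins) copied from the Sweden table.
rows = {
    "Peab Sverige AB": (6, 4),
    "Securitas Sverige Aktiebolag": (4, 3),
    "OneMed Sverige AB": (5, 2),
}
for name, (w12, w3) in rows.items():
    print(f"{name}: {momentum_ratio(w12, w3):.2f}x")
# Peab Sverige AB: 2.67x
# Securitas Sverige Aktiebolag: 3.00x
# OneMed Sverige AB: 1.60x
```

With counts this small the ratio moves in coarse steps, so a single extra award can shift a supplier by several ranks.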
| Rank | Supplier | 12mo | 3mo | Momentum | Composite |
|---|---|---|---|---|---|
| 1 | Matriks AS | 5 | 3 | 2.40× | 0.818 |
| 2 | Atea AS | 6 | 2 | 1.33× | 0.798 |
| 3 | Crayon AS | 5 | 2 | 1.60× | 0.766 |
| 4 | Asko Øst AS | 3 | 2 | 2.67× | 0.753 |
| 5 | AF Energi AS | 4 | 1 | 1.00× | 0.744 |
| 6 | GK Norge AS | 3 | 1 | 1.33× | 0.740 |
| 7 | VWR International AS | 3 | 2 | 2.67× | 0.739 |
| 8 | Schindler AS (hovedenhet) | 2 | 1 | 2.00× | 0.729 |
| 9 | HENT AS | 3 | 1 | 1.33× | 0.725 |
| 10 | LÆRE AS | 2 | 2 | 4.00× | 0.721 |
Norway's procurement volume is meaningfully smaller than Sweden's: the gap is roughly 2.5×, in line with the difference in population and private-sector mix. Top Norwegian names skew toward IT services (Atea, Crayon, CGI-class) and infrastructure (HENT, AF Energi, GK Norge).
Per-supplier sourcing memo (`data/sample/supplier_memo_top1.md` for Sweden's #1 Peab): cadence, scale, sector focus, institutional depth, and a structured "what to validate next" checklist (ownership, financials, sector context, competitive position). Auto-drafted from the same `SupplierSnapshot` used for ranking: no LLM, no hallucination surface, reproducible.
Bundled artifacts:

- `data/sample/ranked_targets.csv`: Sweden top-20
- `data/sample/ranked_targets.png`: Sweden bar chart
- `data/sample/supplier_memo_top1.md`: Peab memo
- `data/sample/no/`: same set, Norway market

- Entity resolution is suffix-strip only. Catches ~80% of trivial variants. Doesn't catch typos, corporate-tree relationships (parent / subsidiary), or abbreviations. GLEIF LEI lookup is the v2 fix for the remaining 20%.
- TED v3 only carries ~early-2026-onward notices. Older windows are sparse. For a true long-term momentum signal, ingestion from the historical TED CSV archive (2014-2024) is needed.
- Public procurement is one customer channel, not the P&L. A supplier with strong public-contract momentum may still be losing money in the private-sector book. The sourcing memo explicitly flags this and pushes the user to validate financials separately.
- Scoring weights are not learned. No M&A outcome dataset is available to train against on public data. The balanced / momentum_led / scale_led profiles are defensible-by-construction heuristics — swap for a trained classifier when labels arrive.
- Anonymous TED winners (~12% of notices for SE, ~16% for NO) are dropped silently. They represent real economic activity that's invisible to this pipeline.
- No ownership data yet. A founder-owned or PE-backed signal would be the highest-value filter; deferred to v2 (Bolagsverket integration).
- Country code mapping: TED uses ISO-3 (`SWE`, `NOR`, `DNK`, `FIN`); the CLI accepts both 2-letter and 3-letter forms and resolves either one to the ISO-3 code.
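
The country-code resolution in the last item above could look roughly like this. It is a sketch only: `resolve_country` is a hypothetical name, and the case-insensitive handling and the error behaviour are assumptions rather than the CLI's documented behaviour. The four code pairs themselves are the standard ISO 3166-1 alpha-2 and alpha-3 codes.

```python
# ISO 3166-1 alpha-2 -> alpha-3 for the four markets named above.
ISO2_TO_ISO3 = {"SE": "SWE", "NO": "NOR", "DK": "DNK", "FI": "FIN"}


def resolve_country(code: str) -> str:
    """Return the ISO-3 code for a 2- or 3-letter country code."""
    cleaned = code.strip().upper()
    if cleaned in ISO2_TO_ISO3:
        return ISO2_TO_ISO3[cleaned]
    if cleaned in ISO2_TO_ISO3.values():
        return cleaned
    raise ValueError(f"unsupported country code: {code!r}")
```

Failing loudly on an unknown code keeps a typo such as `SW` from silently turning into an empty result set.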
- GLEIF LEI integration for verified entity resolution — catches the 20% the suffix-strip can't.
- Bolagsverket / Brønnøysund ownership overlay — distinguish founder-owned from PE-backed from corporate-subsidiary, the single highest-value filter for a PE sourcing desk.
- Historical TED CSV ingestion (2014-2024) to extend the momentum window from 6 months to 5+ years.
- Per-supplier financial pull via local statutory accounts — moves the score from "public-contract momentum" to "public-contract momentum on a healthy P&L."
- Sector-specific scoring profiles tuned to the sourcing mandate (e.g. healthcare-services-focused profile up-weights CPV-85 contracts).
- CRM hand-off — flagged target → task in the firm's Salesforce.
```bash
# install
python3.12 -m venv .venv && .venv/bin/pip install -e .
source .venv/bin/activate   # so the ndsg and pytest commands below resolve

# polite User-Agent (TED requires identification)
cp .env.example .env
# edit .env: TED_USER_AGENT="your-project (Your Name your.email@example.com)"
# load .env; sourcing keeps quoted values that contain spaces intact
set -a; . ./.env; set +a

# scan Sweden (live TED, ~5s for 2.3K notices)
ndsg scan --country SE --months 6 --top 20 \
  --cache-dir data/cache --out-dir data/sample

# scan Norway
ndsg scan --country NO --months 6 --top 20 \
  --cache-dir data/cache --out-dir data/sample/no

# re-score from cached notices (no live TED call)
ndsg scan --country SE --offline \
  --cache-dir data/cache --out-dir /tmp/se_rerun

# scoring profiles
ndsg scan --country SE --profile momentum_led   # 60/20/20 weighting
ndsg scan --country SE --profile scale_led      # 20/60/20

# tests
pytest            # 67 hermetic tests, no network
pytest -m slow    # adds live-TED integration test
```

Engineering standards: permissive code (MIT), public data first (TED v3, no key), every claim grounded in the artifact CSVs, entity-resolution honesty surfaced explicitly.