Commit dabeac6

Refresh docs for trust-oriented enrichment output
1 parent 92915ec commit dabeac6

3 files changed

Lines changed: 161 additions & 14 deletions

README.md

Lines changed: 24 additions & 4 deletions
@@ -10,7 +10,9 @@ It helps with the messy middle of outbound research:
 
 - identify the likely official website
 - pull visible contact paths from public pages
-- capture a short factual company summary
+- rank contacts by outreach usefulness
+- attach provenance to extracted fields
+- expose trust signals instead of hiding everything behind one score
 - package that into a compact JSON dossier
 - turn the dossier into a restrained outreach draft
 
@@ -50,9 +52,11 @@ python3 skill/scripts/enrich_lead.py --company "Mistral AI" --domain mistral.ai
 What to check in `dossier.json` before trusting it:
 
 - `primary_domain` looks official
+- `site_verification.verified` is true or at least plausible
 - `summary` is coherent
-- `emails` or `contact_pages` are plausible
-- `warnings` do not show fetch/search failure or obvious mismatch
+- `best_contact_email` is not obviously a weak target like `press@` or `privacy@`
+- `summary_source`, `email_sources`, and `phone_sources` point to believable pages
+- `trust_signals` and `warnings` match your own intuition about the result
 
 ### 2) Generate one draft
 
@@ -63,12 +67,27 @@ python3 skill/scripts/generate_outreach.py dossier.json \
 
 The output is a first draft, not send-ready copy.
 
+## Output shape
+
+The enrichment output now includes trust-oriented fields such as:
+
+- `site_verification`
+- `best_contact_email`
+- `best_contact_source`
+- `summary_source`
+- `email_sources`
+- `phone_sources`
+- `trust_signals`
+- `warnings`
+
+This makes it easier to inspect not just *what* was found, but also *why it should or should not be trusted*.
+
 ## Examples
 
 The `examples/` directory is curated to stay believable in public:
 
 - `demo-leads.csv` — tiny batch input
-- `demo-output.json` — trimmed dossier examples with noisy scrape artifacts removed
+- `demo-output.json` — trimmed dossier examples with trust-oriented fields visible
 - `openai-dossier.json` — single credible dossier example
 - `openai-draft.json` — restrained draft example
 - `ab-report.json` — illustrative only, not benchmark evidence
@@ -87,6 +106,7 @@ This repo uses public web results and intentionally lightweight heuristics. Expe
 - sites that block fetching
 - noisy phones/emails from raw HTML
 - directories outranking the official site
+- weak but official contacts such as `press@` or `privacy@`
 - summaries that still need human cleanup
 
 If the dossier looks wrong, treat it as wrong.
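
The "what to check in `dossier.json`" list in the README hunk above can be sketched as a small review helper. This is only an illustration, not code from the repo: the function name `review_dossier`, the `WEAK_LOCAL_PARTS` set (limited to the two prefixes the README names), and the wording of each concern are assumptions. It cannot replace the human judgment the README asks for; it only surfaces the fields worth looking at.

```python
# Hypothetical helper, not part of the repo: mirrors the README checklist.
# The weak prefixes are the two the README mentions; a real list may be longer.
WEAK_LOCAL_PARTS = {"press", "privacy"}


def review_dossier(dossier: dict) -> list[str]:
    """Return human-readable concerns; an empty list means nothing was flagged."""
    concerns = []

    if not dossier.get("primary_domain"):
        concerns.append("primary_domain is missing")

    verification = dossier.get("site_verification") or {}
    if not verification.get("verified"):
        concerns.append("site_verification.verified is not true")

    if not (dossier.get("summary") or "").strip():
        concerns.append("summary is empty")

    best = dossier.get("best_contact_email") or ""
    if best.split("@", 1)[0].lower() in WEAK_LOCAL_PARTS:
        concerns.append(f"best_contact_email looks weak: {best}")

    # Provenance: every recorded source should at least carry a URL to inspect.
    if not (dossier.get("summary_source") or {}).get("source_url"):
        concerns.append("summary_source has no source_url")
    for field in ("email_sources", "phone_sources"):
        for value, source in (dossier.get(field) or {}).items():
            if not (source or {}).get("source_url"):
                concerns.append(f"{field} entry for {value} has no source_url")

    # Surface the tool's own warnings next to the derived concerns.
    for warning in dossier.get("warnings") or []:
        concerns.append(f"warning: {warning}")

    return concerns


# Trimmed from the Mistral AI example in this commit.
sample = {
    "primary_domain": "mistral.ai",
    "site_verification": {"verified": True, "score": 3.25},
    "summary": "Mistral AI presents language models and enterprise AI services.",
    "summary_source": {"source_url": "https://mistral.ai", "source_type": "page"},
    "email_sources": {
        "press@mistral.ai": {"source_url": "https://mistral.ai/contact", "source_type": "page"},
    },
    "phone_sources": {},
    "best_contact_email": "press@mistral.ai",
    "warnings": ["Best available email looks weak for outreach: press@mistral.ai"],
}

for concern in review_dossier(sample):
    print(concern)
```

To review a real run, load the file with `json.load` and pass the resulting dict to the same function.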

examples/demo-output.json

Lines changed: 90 additions & 8 deletions
@@ -5,12 +5,38 @@
     "query": "Mistral AI",
     "primary_domain": "mistral.ai",
     "website_title": "Frontier AI LLMs, assistants, agents, services | Mistral AI",
-    "summary": "Frontier AI. We help organizations build tailored AI systems to solve the world’s hardest problems. The site routes buyers to product, research, and contact pages rather than generic directories.",
+    "site_verification": {
+      "verified": true,
+      "score": 3.25,
+      "title": "Frontier AI LLMs, assistants, agents, services | Mistral AI",
+      "reason": null
+    },
+    "summary": "Frontier AI. Mistral AI presents language models, assistants, and enterprise AI services with clear buyer-facing product and contact paths.",
+    "summary_source": {
+      "source_url": "https://mistral.ai",
+      "source_type": "page"
+    },
     "emails": [
       "press@mistral.ai",
       "privacy@mistral.ai"
     ],
+    "email_sources": {
+      "press@mistral.ai": {
+        "source_url": "https://mistral.ai/contact",
+        "source_type": "page"
+      },
+      "privacy@mistral.ai": {
+        "source_url": "https://mistral.ai/privacy",
+        "source_type": "page"
+      }
+    },
+    "best_contact_email": "press@mistral.ai",
+    "best_contact_source": {
+      "source_url": "https://mistral.ai/contact",
+      "source_type": "page"
+    },
     "phones": [],
+    "phone_sources": {},
     "contact_pages": [
       "https://mistral.ai/contact",
       "https://mistral.ai/about"
@@ -19,31 +45,87 @@
       "https://www.linkedin.com/company/mistralai/"
     ],
     "snippets": [],
-    "confidence": 0.8,
-    "warnings": []
+    "confidence": 0.6,
+    "trust_signals": {
+      "has_domain": true,
+      "has_summary": true,
+      "email_count": 2,
+      "phone_count": 0,
+      "warning_count": 1,
+      "warning_penalty": 0.05,
+      "site_verified": true,
+      "site_verification_score": 3.25,
+      "best_contact": {
+        "present": true,
+        "official": true,
+        "strong": false,
+        "weak": true,
+        "tier": "official_weak"
+      }
+    },
+    "warnings": [
+      "Best available email looks weak for outreach: press@mistral.ai"
+    ]
   },
   {
     "company": "DeepL",
     "region": null,
     "query": "DeepL",
     "primary_domain": "deepl.com",
     "website_title": "DeepL AI Platform: Translation, Voice & API",
+    "site_verification": {
+      "verified": true,
+      "score": 2.75,
+      "title": "DeepL AI Platform: Translation, Voice & API",
+      "reason": null
+    },
     "summary": "DeepL positions itself as an AI language platform spanning translation, voice, and API products. The public site exposes multiple buyer-facing contact paths, including sales, support, and a general contact page.",
+    "summary_source": {
+      "source_url": "https://www.deepl.com",
+      "source_type": "page"
+    },
     "emails": [
       "support@deepl.com"
     ],
+    "email_sources": {
+      "support@deepl.com": {
+        "source_url": "https://www.deepl.com/contact",
+        "source_type": "page"
+      }
+    },
+    "best_contact_email": "support@deepl.com",
+    "best_contact_source": {
+      "source_url": "https://www.deepl.com/contact",
+      "source_type": "page"
+    },
     "phones": [],
+    "phone_sources": {},
     "contact_pages": [
-      "https://deepl.com/en/contact-us",
+      "https://www.deepl.com/contact",
       "https://support.deepl.com/hc/en-us"
     ],
     "social_links": [
-      "https://www.linkedin.com/company/linkedin-com-company-deepl/"
+      "https://www.linkedin.com/company/deepl/"
     ],
     "snippets": [],
     "confidence": 0.7,
-    "warnings": [
-      "trimmed noisy contacts from raw scrape for this public example"
-    ]
+    "trust_signals": {
+      "has_domain": true,
+      "has_summary": true,
+      "email_count": 1,
+      "phone_count": 0,
+      "warning_count": 0,
+      "warning_penalty": 0.0,
+      "site_verified": true,
+      "site_verification_score": 2.75,
+      "best_contact": {
+        "present": true,
+        "official": true,
+        "strong": true,
+        "weak": false,
+        "tier": "official_strong"
+      }
+    },
+    "warnings": []
   }
 ]
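
Because `trust_signals` repeats counts and flags that also appear elsewhere in each dossier, a hand-edited example file like this one can drift out of sync. The sketch below cross-checks the redundant fields. The function name `check_consistency` and the specific invariants are assumptions inferred from the two example entries, not rules stated by the repo.

```python
# Hypothetical consistency check for one dossier dict; invariants are inferred
# from the example entries in this commit, not taken from the repo's code.
def check_consistency(dossier: dict) -> list[str]:
    """Return a list of mismatches between trust_signals and the raw fields."""
    problems = []
    emails = dossier.get("emails", [])
    phones = dossier.get("phones", [])
    warnings = dossier.get("warnings", [])
    signals = dossier.get("trust_signals", {})
    verification = dossier.get("site_verification", {})

    # Each trust signal should agree with the field it summarizes.
    expected = {
        "email_count": len(emails),
        "phone_count": len(phones),
        "warning_count": len(warnings),
        "site_verified": verification.get("verified"),
        "site_verification_score": verification.get("score"),
    }
    for key, want in expected.items():
        got = signals.get(key)
        if got != want:
            problems.append(f"trust_signals.{key} is {got!r}, expected {want!r}")

    # Provenance maps should cover exactly the extracted values.
    if set(dossier.get("email_sources", {})) != set(emails):
        problems.append("email_sources keys do not match emails")
    if set(dossier.get("phone_sources", {})) != set(phones):
        problems.append("phone_sources keys do not match phones")

    best = dossier.get("best_contact_email")
    if best is not None and best not in emails:
        problems.append("best_contact_email is not listed in emails")

    return problems


# Trimmed from the DeepL entry above.
deepl = {
    "emails": ["support@deepl.com"],
    "email_sources": {
        "support@deepl.com": {"source_url": "https://www.deepl.com/contact", "source_type": "page"},
    },
    "best_contact_email": "support@deepl.com",
    "phones": [],
    "phone_sources": {},
    "site_verification": {"verified": True, "score": 2.75},
    "trust_signals": {
        "email_count": 1,
        "phone_count": 0,
        "warning_count": 0,
        "site_verified": True,
        "site_verification_score": 2.75,
    },
    "warnings": [],
}

print(check_consistency(deepl))
```

Running the same function over every element of the list in `examples/demo-output.json` would catch a stale count after a manual trim.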

examples/openai-dossier.json

Lines changed: 47 additions & 2 deletions
@@ -4,12 +4,38 @@
   "query": "Mistral AI",
   "primary_domain": "mistral.ai",
   "website_title": "Frontier AI LLMs, assistants, agents, services | Mistral AI",
+  "site_verification": {
+    "verified": true,
+    "score": 3.25,
+    "title": "Frontier AI LLMs, assistants, agents, services | Mistral AI",
+    "reason": null
+  },
   "summary": "Mistral AI presents frontier language models, assistants, and enterprise AI services with clear product and contact paths for prospective buyers.",
+  "summary_source": {
+    "source_url": "https://mistral.ai",
+    "source_type": "page"
+  },
   "emails": [
     "press@mistral.ai",
     "privacy@mistral.ai"
   ],
+  "email_sources": {
+    "press@mistral.ai": {
+      "source_url": "https://mistral.ai/contact",
+      "source_type": "page"
+    },
+    "privacy@mistral.ai": {
+      "source_url": "https://mistral.ai/privacy",
+      "source_type": "page"
+    }
+  },
+  "best_contact_email": "press@mistral.ai",
+  "best_contact_source": {
+    "source_url": "https://mistral.ai/contact",
+    "source_type": "page"
+  },
   "phones": [],
+  "phone_sources": {},
   "contact_pages": [
     "https://mistral.ai/contact",
     "https://mistral.ai/about"
@@ -18,6 +44,25 @@
     "https://www.linkedin.com/company/mistralai/"
   ],
   "snippets": [],
-  "confidence": 0.8,
-  "warnings": []
+  "confidence": 0.6,
+  "trust_signals": {
+    "has_domain": true,
+    "has_summary": true,
+    "email_count": 2,
+    "phone_count": 0,
+    "warning_count": 1,
+    "warning_penalty": 0.05,
+    "site_verified": true,
+    "site_verification_score": 3.25,
+    "best_contact": {
+      "present": true,
+      "official": true,
+      "strong": false,
+      "weak": true,
+      "tier": "official_weak"
+    }
+  },
+  "warnings": [
+    "Best available email looks weak for outreach: press@mistral.ai"
+  ]
 }
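
The `best_contact` block and the matching warning suggest a simple tiering rule: an address on the company's own domain is "official", and official addresses with a low-value prefix are "weak". The sketch below reproduces the two tiers visible in these examples under that reading. Everything here is an assumption: the name `classify_contact`, the two-entry weak list, the subdomain rule, and the behavior for non-official addresses (the examples show no tier name for that case, so the sketch returns `None`).

```python
# Hypothetical reconstruction of the best_contact tiering seen in the examples.
# The weak prefixes are the two named in the README; the real heuristic is unknown.
WEAK_LOCAL_PARTS = frozenset({"press", "privacy"})


def classify_contact(email, primary_domain):
    """Return a dict shaped like trust_signals.best_contact for one address."""
    if not email:
        return {"present": False, "official": False, "strong": False,
                "weak": False, "tier": None}

    local, _, domain = email.lower().partition("@")
    base = primary_domain.lower()
    # Assumption: subdomains of the primary domain also count as official.
    official = domain == base or domain.endswith("." + base)
    weak = local in WEAK_LOCAL_PARTS
    strong = official and not weak

    tier = None  # no tier name for non-official addresses appears in the examples
    if official:
        tier = "official_weak" if weak else "official_strong"

    return {"present": True, "official": official, "strong": strong,
            "weak": weak, "tier": tier}


def weak_contact_warning(email, primary_domain):
    """Return the warning text used in the examples, or None if not applicable."""
    if classify_contact(email, primary_domain)["weak"]:
        return f"Best available email looks weak for outreach: {email}"
    return None


print(classify_contact("press@mistral.ai", "mistral.ai"))
print(weak_contact_warning("press@mistral.ai", "mistral.ai"))
```

Under these assumptions `support@deepl.com` lands in `official_strong`, which matches the DeepL entry in `examples/demo-output.json`.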
