# Indra — Full Reference
> Web intelligence that only thinks when the web changes. Bright Data + LLM agent cache — zero tokens on unchanged pages.
## What it does
Indra monitors URLs and SERP queries for content changes. On each run it fetches the page live via Bright Data (bypassing bot detection and geo-blocks), fingerprints the response, and compares that fingerprint to the last stored hash:
- **No change** → return cached LLM insight instantly. Zero tokens, sub-millisecond.
- **Changed** → extract unified diff, send only the delta to your LLM. Tokens proportional to what changed, not page size.
Example: hourly checks on 10 pages over 24 hours make 240 Bright Data fetches. If roughly 20 of those fetches find a change, only ~20 LLM calls fire instead of 240.
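The compare-then-diff flow described above can be sketched with the standard library alone. This is a minimal illustration, not Indra's internal code: the function names `fingerprint`, `unified_diff`, and `check` are assumptions made for the example, and `llm` stands for any `fn(prompt: str) -> str` callable.

```python
import difflib
import hashlib
from typing import Callable, Tuple


def fingerprint(text: str) -> str:
    """Return a SHA-256 hex digest of the page text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def unified_diff(old: str, new: str) -> str:
    """Return a unified diff between two versions of a page."""
    lines = difflib.unified_diff(
        old.splitlines(),
        new.splitlines(),
        fromfile="previous",
        tofile="current",
        lineterm="",
    )
    return "\n".join(lines)


def check(old: str, new: str, cached_insight: str,
          llm: Callable[[str], str]) -> Tuple[bool, str]:
    """Return (changed, insight); call `llm` only when the hashes differ."""
    if fingerprint(old) == fingerprint(new):
        return False, cached_insight          # cache hit: no LLM call
    return True, llm(unified_diff(old, new))  # send only the delta
```

Because only the diff reaches `llm`, the prompt size follows the size of the change rather than the size of the page.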
## Install
```bash
pip install indra-ai
# with Anthropic support
pip install indra-ai[anthropic]
# with OpenAI support
pip install indra-ai[openai]
```
## Environment variables
| Variable | Required | Description |
|---|---|---|
| `BRIGHTDATA_API_KEY` | Yes | Bright Data API key |
| `ANTHROPIC_API_KEY` | No | For Anthropic-based generation_fn |
## API
### `indra.init()`
```python
import indra

agent = indra.init(
    brightdata_api_key="...",  # or set BRIGHTDATA_API_KEY env var
    db_path="indra.db",        # SQLite path for snapshots and insights
    unlocker_zone=None,        # custom Bright Data zone name
    serp_zone=None,            # custom Bright Data SERP zone name
    silent=False,              # suppress per-URL console output
)
```
Returns a singleton `Indra` instance. Calling `init()` again returns the same instance.
### `agent.watch(url, question, generation_fn)`
Watch a single URL. Returns a `WatchResult`.
```python
result = agent.watch(
    url="https://example.com",
    question="What changed and why does it matter?",
    generation_fn=my_llm_fn,  # fn(prompt: str) -> str; only called on change
    render_js=False,          # True for JS-heavy pages (Bright Data headless)
    ttl=3600,                 # skip Bright Data fetch if snapshot < ttl seconds old
)
```
`generation_fn` is only called when content has changed since the last run. Signature: `fn(prompt: str) -> str`. Pass any LLM wrapper — Anthropic, OpenAI, local model, etc.
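To try the API without spending tokens, an offline stand-in with the same `fn(prompt: str) -> str` signature can be useful. The sketch below is an assumption-laden example, not part of Indra: `echo_llm` and `make_counting_fn` are hypothetical helpers that record each prompt so you can confirm the function runs only on change.

```python
from typing import Callable, List


def echo_llm(prompt: str) -> str:
    """Offline stand-in for a real model: reports the prompt length."""
    return f"(stub) received {len(prompt)} characters"


def make_counting_fn(inner: Callable[[str], str]) -> Callable[[str], str]:
    """Wrap a fn(prompt) -> str callable and record every prompt it receives."""
    prompts: List[str] = []

    def generation_fn(prompt: str) -> str:
        prompts.append(prompt)   # keep a log of what would be sent to the LLM
        return inner(prompt)

    generation_fn.prompts = prompts  # exposed so callers can inspect the log
    return generation_fn


# Hypothetical usage: pass `my_llm_fn` as generation_fn, then check
# len(my_llm_fn.prompts) to see how many times it was actually invoked.
my_llm_fn = make_counting_fn(echo_llm)
```

Swapping `echo_llm` for a real Anthropic, OpenAI, or local-model wrapper leaves the rest of the calling code unchanged.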
### `agent.watch_all(urls, question, generation_fn)`
Watch multiple URLs in one call. Returns `List[WatchResult]`.
```python
results = agent.watch_all(
    urls=["https://site1.com", "https://site2.com"],
    question="Any significant changes?",
    generation_fn=my_llm_fn,
    render_js=False,
    ttl=None,
)
```
### `agent.search_watch(query, question, generation_fn)`
Watch live SERP results for a search query. The LLM is called only when the result set changes.
```python
result = agent.search_watch(
    query="openai new model announcement",
    question="Is there a major new release?",
    generation_fn=my_llm_fn,
    num_results=10,
)
```
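How Indra decides that a result set has changed is not specified here. One plausible approach, shown purely as an assumption, is to hash the ordered list of result URLs and titles so that a new entry or a ranking shift yields a different fingerprint. The `serp_fingerprint` name and the `url`/`title` keys are hypothetical.

```python
import hashlib
import json
from typing import Dict, List


def serp_fingerprint(results: List[Dict[str, str]]) -> str:
    """Hash the ordered (url, title) pairs of a search result list."""
    key = [(r.get("url", ""), r.get("title", "")) for r in results]
    payload = json.dumps(key, ensure_ascii=False)  # order-preserving serialisation
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical result lists: same entries, different ranking.
before = [
    {"url": "https://a.example", "title": "A"},
    {"url": "https://b.example", "title": "B"},
]
after = list(reversed(before))

same = serp_fingerprint(before) == serp_fingerprint(list(before))
moved = serp_fingerprint(before) != serp_fingerprint(after)
```

Hashing only URLs and titles ignores snippet churn, which may or may not match what you want to treat as a change.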
### `agent.stats()` / `agent.print_stats()`
```python
s = agent.stats()
# {
# "brightdata_fetches": 24,
# "llm_calls_fired": 3,
# "cache_hits": 21,
# "changes_detected": 3,
# "tokens_saved": 31500,
# "cost_saved_usd": 0.0945,
# "efficiency_pct": 87,
# }
agent.print_stats()
# ──────────────────────────────────────────────────
# Indra Session Summary
# ──────────────────────────────────────────────────
# Bright Data fetches : 24
# Changes detected : 3
# LLM calls fired : 3
# Cache hits : 21
# Tokens saved : 31,500
# Cost saved : $0.0945
# Efficiency : 87%
# ──────────────────────────────────────────────────
```
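The sample figures above are consistent with simple arithmetic: 21 cache hits out of 24 fetches is 87.5%, shown as 87, and 31,500 tokens at $3 per million tokens is $0.0945. The sketch below reproduces those numbers. The formulas and the $3 rate are inferred from the sample output, not confirmed by Indra's source, and `session_stats` is a hypothetical helper.

```python
def session_stats(fetches: int, llm_calls: int, tokens_saved: int,
                  usd_per_million_tokens: float) -> dict:
    """Derive summary figures from raw counters (assumed formulas)."""
    cache_hits = fetches - llm_calls
    return {
        "cache_hits": cache_hits,
        # Share of fetches that needed no LLM call, truncated to an integer.
        "efficiency_pct": int(100 * cache_hits / fetches) if fetches else 0,
        "cost_saved_usd": round(tokens_saved * usd_per_million_tokens / 1_000_000, 4),
    }


stats = session_stats(
    fetches=24,
    llm_calls=3,
    tokens_saved=31_500,
    usd_per_million_tokens=3.0,  # assumed rate that matches the sample output
)
```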
### `WatchResult` fields
| Field | Type | Description |
|---|---|---|
| `changed` | bool | Whether content changed since last run |
| `insight` | str | LLM analysis, or cached answer if unchanged |
| `diff` | str | Unified diff of what changed (empty if no change) |
| `tokens_saved` | int | Tokens skipped this run |
| `cost_saved_usd` | float | Dollar value of skipped tokens |
| `latency_ms` | float | Total time for this watch call |
| `brightdata_called` | bool | Whether Bright Data was queried |
| `change_count` | int | Total times this URL has changed |
| `summary` | str | Human-readable change summary |
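A typical consumer of these fields filters on `changed` and aggregates the savings. The sketch below uses a stand-in dataclass carrying a subset of the fields in the table so that it runs without Indra installed. `WatchResultStub`, `alerts`, and `total_tokens_saved` are hypothetical names for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WatchResultStub:
    """Stand-in with some of the documented fields (not Indra's own class)."""
    changed: bool
    insight: str
    diff: str = ""
    tokens_saved: int = 0
    cost_saved_usd: float = 0.0


def alerts(results: List[WatchResultStub]) -> List[str]:
    """Return one alert line per changed result; skip unchanged ones."""
    return [f"CHANGED: {r.insight}" for r in results if r.changed]


def total_tokens_saved(results: List[WatchResultStub]) -> int:
    """Sum the tokens skipped across all results."""
    return sum(r.tokens_saved for r in results)


sample = [
    WatchResultStub(changed=False, insight="No price change", tokens_saved=1500),
    WatchResultStub(changed=True, insight="Price rose", diff="-10\n+12"),
]
```

The same two functions should work on the list returned by `agent.watch_all(...)`, assuming the real `WatchResult` exposes these attributes as documented.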
## Architecture
```
Your agent / script
│ agent.watch(url, question)
▼
Indra
  1.  Fetch via Bright Data Web Unlocker
      (bypasses bot detection, geo-blocks)
  2.  Fingerprint content (SHA-256)
      Compare to stored hash in SQLite
  3a. No change → return cached insight
      0 tokens · sub-millisecond
  3b. Changed → extract diff → LLM(diff only)
      tokens ∝ what changed, not page size
```
Indra stores snapshots and insights in a local SQLite database. The LLM insight cache is powered by [Mnemon](https://github.com/smartass-4ever/Mnemon) — identical prompts on identical diffs never hit the LLM twice, even across restarts.
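To make the storage step concrete, here is a minimal sketch of a per-URL snapshot table in SQLite. The schema, the class name `SnapshotStoreSketch`, and its methods are assumptions for illustration. They do not describe the actual `WebSnapshotStore` or Mnemon's cache format.

```python
import sqlite3
from typing import Optional, Tuple


class SnapshotStoreSketch:
    """Keep the latest content hash and insight for each URL."""

    def __init__(self, db_path: str = ":memory:") -> None:
        self.conn = sqlite3.connect(db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS snapshots ("
            "url TEXT PRIMARY KEY, "
            "content_hash TEXT NOT NULL, "
            "insight TEXT NOT NULL)"
        )

    def get(self, url: str) -> Optional[Tuple[str, str]]:
        """Return (content_hash, insight) for a URL, or None if never seen."""
        row = self.conn.execute(
            "SELECT content_hash, insight FROM snapshots WHERE url = ?", (url,)
        ).fetchone()
        return row

    def put(self, url: str, content_hash: str, insight: str) -> None:
        """Insert or overwrite the snapshot for a URL."""
        self.conn.execute(
            "INSERT OR REPLACE INTO snapshots (url, content_hash, insight) "
            "VALUES (?, ?, ?)",
            (url, content_hash, insight),
        )
        self.conn.commit()
```

Passing a file path instead of `":memory:"` persists the table across restarts, which is what allows a cached insight to be reused on a later run.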
## Use cases
- **Competitor pricing monitor** — check 20 competitor pages every hour; LLM summarises only when a price changes
- **News and signal tracker** — watch industry news; alert only on genuinely new stories
- **Supply chain watcher** — monitor supplier pages for stock or lead time changes; zero noise on stable days
- **Regulatory tracker** — watch government or compliance pages; LLM fires when policy text changes
- **SEO and ranking monitor** — SERP watch for branded or competitive queries; analyse only when rankings shift
## Dependencies
| Package | Role |
|---|---|
| `requests` | HTTP fallback and Bright Data API calls |
| `numpy` | Similarity scoring for change detection |
| `mnemon` | LLM insight cache (bundled) |
| `anthropic` *(optional)* | Anthropic SDK for generation_fn |
| `openai` *(optional)* | OpenAI SDK for generation_fn |
## Source layout
```
indra/
__init__.py — Indra class, WatchResult, init(), get()
cli.py — CLI entry point (indra command)
demo.py — interactive demo runner
web/
brightdata.py — BrightDataClient (fetch + search)
change_detector.py — fingerprint, diff, summarise_change
store.py — WebSnapshotStore (SQLite)
examples/
competitor_monitor_demo.py
```
## License
MIT. Built for the [Web Data UNLOCKED Hackathon](https://lablab.ai/ai-hackathons/brightdata-ai-agents-web-data-hackathon) by [Mahika Jadhav](https://github.com/smartass-4ever).