Please assume all requests made are traceable back to you unless you have taken stringent OPSEC measures (doubtful).
- Have a recent version of Python installed on your system
- Have Tor binary installed on your system
- Have GnuPG (gpg) installed on your system (for PGP signature validation; not strictly necessary)
- For Erebus/Mnemosyne: Run in a VM (use Whonix for best security) or VPS, connect to a VPN before searching/scanning
python d_erebus.py --pages 5 --save-j
python d_hemera.py Erebus_<query>_<timestamp>.json
python d_mnemosyne.py --batch erebus_urls_<query>_<timestamp>.txt --save
python d_khaeos.py --ingest Mnemosyne_batch_.json --serve

- Erebus scrapes the dark net. It can certainly be used as a stand-alone tool.
- Hemera ingests the JSON output of Erebus, and outputs a clean .txt list of the URLs found.
- Mnemosyne conducts a pre-visit security analysis of .onion addresses it is fed. It is a standalone tool.
- Mnemosyne can also be run in batch mode with the .txt output of Hemera, producing a group summary.
- Khaeos ingests single-scan and batch Mnemosyne JSON files, building a custom intelligence index.
pip install 'requests[socks]' stem beautifulsoup4 rich
- Tor binary: `brew install tor` (macOS) or `sudo apt install tor` (Debian/Ubuntu)
python d_erebus.py
python d_erebus.py --pages <N> # pages to fetch per index (1-10, default 1, more = slower)
python d_erebus.py --save-j # save all results to JSON
python d_erebus.py --save-h # save all results to HTML (rich table)
python d_erebus.py --no-save # skip all save prompts
python d_erebus.py --debug # verbose error output and raw HTML previews

| Index | Type | Reliability (1 = worst, 10 = best) |
|---|---|---|
| Ahmia | Structured, large index | 10 |
| Torch | Classic Tor search engine | 1.5 |
| Tor66 | Broad dark web index | 8 |
| notevil | Supplementary index | 4 |
| Amnesia | Supplementary index | 9 |
Each index fails gracefully with an informative error if unreachable, without affecting the others.
Update the .onion links in d_erebus.py if they fail.
- Tor management: detects an existing daemon on `127.0.0.1:9050` or launches a managed process. Waits for full bootstrap before any requests.
- Circuit rotation: requests a fresh `NEWNYM` circuit between each index query so each search engine sees a different Tor identity.
- Reachability check: probes each index before attempting to scrape. Unreachable indexes are skipped immediately with a clear error rather than timing out mid-scrape.
- All traffic over Tor: `socks5h://` proxying throughout, DNS resolution inside Tor. No clearnet requests at any point.
- Deduplication: results are grouped by canonical `.onion` host. The same host appearing at multiple paths across multiple indexes is collapsed into one entry with branched paths shown as a readable list. Known link farm addresses (configurable blocklist in `LINK_FARM_BLOCKLIST`) are flagged with ⚑ but not discarded.
- Output: rich-formatted CLI table showing top 50 results, sorted by cross-index corroboration. Full result set can be saved to a timestamped JSON file.
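The Tor routing and circuit rotation described above can be sketched roughly as follows. This is a minimal illustration and not the code in `d_erebus.py`: the helper names are hypothetical, the control port `9051` is an assumption (the text only names the SOCKS port `9050`), and the third-party packages are imported inside the functions that need them.

```python
SOCKS_HOST = "127.0.0.1"
SOCKS_PORT = 9050      # SOCKS port named in the text
CONTROL_PORT = 9051    # assumption: Tor's conventional control port


def tor_proxies(host: str = SOCKS_HOST, port: int = SOCKS_PORT) -> dict:
    """Build a proxy mapping for requests; socks5h makes Tor resolve hostnames."""
    url = f"socks5h://{host}:{port}"
    return {"http": url, "https": url}


def build_tor_session():
    """Return a requests.Session whose traffic is proxied through Tor."""
    import requests  # third-party: pip install 'requests[socks]'

    session = requests.Session()
    session.proxies.update(tor_proxies())
    return session


def rotate_circuit(control_port: int = CONTROL_PORT) -> None:
    """Ask Tor for a fresh circuit (NEWNYM) through the control port."""
    from stem import Signal  # third-party: pip install stem
    from stem.control import Controller

    with Controller.from_port(port=control_port) as controller:
        controller.authenticate()  # assumes cookie or passwordless auth
        controller.signal(Signal.NEWNYM)
```

Calling `rotate_circuit()` between index queries and reusing the session from `build_tor_session()` would approximate the behavior described. Tor rate-limits `NEWNYM` signals, so a short pause between rotations may be needed.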
- CLI: top 50 results shown in a rich table with title, canonical host, branched paths, source indexes, and snippet
- JSON: full result set saved as `Erebus_{query}_{timestamp}.json` in the current directory
- HTML: full result set saved as `Erebus_{query}_{timestamp}.html` in the current directory, rich-formatted table with color and full row display (requires `rich`)
Amnesia was experiencing issues causing it to be unreachable for this query
- Tor must be installed on the system. The tool will exit with a clear error message and install instructions if the binary is not found.
- .onion search indexes go offline and change addresses frequently. If an index consistently fails, verify its current address and update `INDEXES` in the file.
- Pagination can significantly increase runtime.
- Add known link farm `.onion` hostnames to `LINK_FARM_BLOCKLIST` at the top of the file as you encounter them.
- Do not enter sensitive information as a search term. Search engines may (and probably do) log searches. See disclaimer above.
- Torch runs on CGI architecture. An apt description of CGI can be found on the closest thing to official CGI documentation that exists (https://www.w3.org/CGI/): "left here for historical purposes". Torch still uses it, and it fails constantly.
Parses a d_erebus.py JSON output file and extracts all .onion URLs for batch scanning with d_mnemosyne.py.
No additional dependencies beyond the Python standard library.
python d_hemera.py <Erebus_json>
python d_hemera.py <Erebus_json> --output <file> # custom output filename
python d_hemera.py <Erebus_json> --full-paths # emit every unique path per host, not just canonical roots
python d_hemera.py <Erebus_json> --include-farms # include link-farm flagged hosts (not recommended, see notes)
python d_hemera.py <Erebus_json> --no-save # print URLs to stdout only, skip file write

- JSON parsing: loads and validates a `Erebus_*.json` file produced by `d_erebus.py`. Exits with a clear error if the file is missing, malformed, or not an Erebus output file.
- Deduplication: in canonical root mode (default), only one URL per `.onion` host is emitted regardless of how many paths were seen. In `--full-paths` mode, every unique URL found across all paths is included, deduplicated by exact URL.
- Link farm filtering: hosts flagged as link farms by Erebus are silently excluded by default. `--include-farms` overrides this with a printed warning, as these hosts produce low-signal results and waste significant scan time.
- Output: prints the extracted list to the terminal with an index number per URL, then writes a timestamped `.txt` file ready to be passed directly to `d_mnemosyne.py --batch`.
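The two deduplication modes above can be illustrated with a small sketch. It works on a plain list of URLs rather than on the Erebus JSON, since that file's schema is not shown here; the function name and the placeholder hosts are hypothetical.

```python
from urllib.parse import urlsplit


def extract_urls(urls, full_paths=False):
    """Deduplicate .onion URLs by host (default) or by exact URL."""
    seen = set()
    result = []
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        if not host.endswith(".onion"):
            continue  # ignore anything that is not an onion service
        # Canonical root mode keeps a single http://host/ entry per service.
        item = url if full_paths else f"http://{host}/"
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


# Placeholder hosts for illustration only (not valid v3 addresses).
found = [
    "http://exampleaaa.onion/forum",
    "http://exampleaaa.onion/wiki",
    "http://examplebbb.onion/",
    "http://exampleaaa.onion/forum",
]
print(extract_urls(found))                   # one root URL per host
print(extract_urls(found, full_paths=True))  # every unique URL
```

Order of first appearance is preserved in both modes, which keeps the numbered CLI listing stable between runs on the same input.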
- CLI: numbered list of extracted URLs with a summary showing total results, farms skipped, and URLs extracted
- TXT: URL list saved as `erebus_urls_{query}_{timestamp}.txt` in the current directory, one URL per line
- Input must be a JSON file saved by `d_erebus.py`. Other JSON files of different structure will be rejected.
- Canonical root mode (`http://host/`) is recommended for batch scanning: it avoids redundant scans of the same service at different paths.
- `--include-farms` is available but will likely inflate scan time significantly with low-value targets. Use it if you have a specific reason to.
- The output `.txt` file can be edited manually before passing it to `d_mnemosyne.py`: remove any URLs you want to skip, add comments with `#`.
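A hand-edited URL list like the one described in the last note could be read along these lines. This is a sketch under assumptions: the function name is hypothetical, and only whole-line `#` comments are handled, since the text does not say whether trailing comments on a URL line are supported.

```python
def load_targets(text):
    """Return the URLs in a list file, skipping blanks and # comment lines."""
    targets = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or whole-line comment
        targets.append(line)
    return targets


# Placeholder hosts for illustration only.
sample = """\
# reviewed by hand, second entry disabled
http://exampleaaa.onion/
# http://examplebbb.onion/

http://exampleccc.onion/
"""
print(load_targets(sample))
```

Commenting a URL out instead of deleting it keeps a record of what was skipped and why.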
Use before visiting .onion sites or services. Analyzes the address without any direct browser interaction.
Fetches only raw HTTP/HTML over a Tor circuit: no JavaScript execution, no image loading, no cookies.
pip install 'requests[socks]' stem beautifulsoup4 python-gnupg
These are also listed in `requirements.txt` and can be installed with `pip install -r requirements.txt` after cloning the repo.
- Tor binary: `brew install tor` (macOS) or `sudo apt install tor` (Debian/Ubuntu)
- gpg binary: required for PGP canary verification only (you should have this anyway if using Tor)
python d_mnemosyne.py
python d_mnemosyne.py --save # If not passed you will be prompted to save after scan
python d_mnemosyne.py --debug # Prints more verbose error output
python d_mnemosyne.py --batch <url_list.txt> # Multiple link batch scan

Exit codes (scriptable) (might not be working properly atm):
- `0`: LOW risk
- `1`: MEDIUM risk
- `2`: HIGH risk
- `3`: CRITICAL risk
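If the exit codes behave as listed, a wrapper script could translate them back into risk labels. The sketch below is illustrative only: the helper names are hypothetical, and `run_scan` assumes the script sits in the current directory.

```python
import subprocess
import sys

RISK_BY_EXIT_CODE = {0: "LOW", 1: "MEDIUM", 2: "HIGH", 3: "CRITICAL"}


def risk_from_exit_code(code: int) -> str:
    """Translate a Mnemosyne exit code into its risk label."""
    # Anything outside 0-3 (a crash, a signal) is reported as UNKNOWN.
    return RISK_BY_EXIT_CODE.get(code, "UNKNOWN")


def run_scan(script: str = "d_mnemosyne.py") -> str:
    """Run one scan with the current interpreter and return the risk label."""
    completed = subprocess.run([sys.executable, script])
    return risk_from_exit_code(completed.returncode)
```

One caveat for scripting: an uncaught Python exception also exits with status `1`, so a `MEDIUM` result from the exit code alone is worth confirming against the saved JSON.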
- Tor management: uses `stem` to detect an existing Tor daemon on `127.0.0.1:9050`. If none is found, launches a managed Tor process from the system binary with a temporary data directory. Waits for full bootstrap before proceeding. Requests a fresh circuit via `NEWNYM` before each scan. The managed process is killed cleanly on exit (or the tool will inform you otherwise).
- All traffic routed through Tor: the `requests` session uses `socks5h://` proxying, meaning hostname resolution happens inside Tor. This means the .onion address never touches your network directly.
- Reputation check: fetches the Ahmia abuse blacklist over Tor at startup. The target host is hashed and compared against the list, and you are informed if the address is known to be flagged for abusive or illegal content.
- Static HTML analysis only: the root page is fetched once and parsed with BeautifulSoup. No JavaScript is executed. Checks performed on the static HTML include: script tag enumeration, external resource leak detection, inline event handler counting, form analysis, and fingerprinting vector pattern matches (WebRTC, canvas, AudioContext, ...).
- Well-known file probing: fetches `/canary.txt`, `/pgp.txt`, `/.well-known/pgp-key.txt`, `/security.txt`, `/robots.txt`, and `/sitemap.xml` over Tor.
- PGP canary verification: if a clearsigned canary and a PGP public key are both found, verifies the signature using `gpg` in an isolated temporary keyring. Fingerprint mismatches are flagged as potential key substitution attacks.
- Risk scoring: weighted passive score (0–100) based on: clearnet redirects, external resource leaks, clearnet form actions, clearnet script sources, canary verification failures, missing security headers, fingerprinting vectors, and blacklist status.
- Batch scan: reads .onion URLs from a text file (one per line) and runs a sequential scan of all targets over a single Tor session. Output is a grouped summary report. JSON (`--save`) produces a single consolidated file instead of per-target files. Generate the input file by saving Erebus results with `python d_erebus.py --save-j`, then running `d_hemera.py` on the resulting JSON.
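A weighted score of the kind described in the risk scoring item might look like the sketch below. The weights, finding names, and level thresholds are all invented for illustration; the actual values used by Mnemosyne are not given in this document.

```python
# Hypothetical weights: the real values in d_mnemosyne.py are not documented here.
WEIGHTS = {
    "clearnet_redirect": 25,
    "external_resource_leak": 15,
    "clearnet_form_action": 20,
    "clearnet_script_source": 20,
    "canary_verification_failure": 15,
    "missing_security_headers": 5,
    "fingerprinting_vectors": 10,
    "blacklisted": 40,
}


def risk_score(findings) -> int:
    """Sum the weights of the findings present, capped at 100."""
    total = sum(WEIGHTS[name] for name in findings if name in WEIGHTS)
    return min(total, 100)


def risk_level(score: int) -> str:
    """Map a 0-100 score to a level (thresholds are assumptions)."""
    if score >= 75:
        return "CRITICAL"
    if score >= 50:
        return "HIGH"
    if score >= 25:
        return "MEDIUM"
    return "LOW"


findings = {"clearnet_redirect", "blacklisted"}
print(risk_score(findings), risk_level(risk_score(findings)))
```

Capping the sum keeps the score inside the documented 0–100 range even when many signals fire at once.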
Some services will show as unreachable; blacklisted addresses are highlighted. For reference, this batch scan of 310 URLs took around 1 hour.
- Tor must be installed on the system. The tool will exit with a clear error message and install instructions if the binary is not found.
- Self-signed certs are common on .onion services and aren't treated as a risk signal — TLS verification is disabled for .onion targets since the v3 address itself is a cryptographic identity.
- .onion services can be significantly slower than clearnet. The default request timeout is 40 seconds. Scans may take several minutes on slow hidden services.
- No browser is opened at any point and JavaScript is never executed. Because of this, the tool cannot assess dynamic behavior, only what is present in the static HTML response (this is usually still quite informative).
- `Accept-Encoding` is intentionally excluded from request headers to prevent compressed binary responses.
- Probes for well-known and OMG guideline files are only really insightful when scanning the root path of a domain, e.g. `/` rather than `/news`.
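To make the static-only limitation concrete, here is a rough sketch of the kind of checks that are possible without running JavaScript. It uses the standard library's `html.parser` instead of BeautifulSoup so it runs without extra packages; the class name and the exact set of attributes inspected are assumptions, not Mnemosyne's implementation.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class StaticAudit(HTMLParser):
    """Count scripts, inline event handlers, and non-.onion resources."""

    def __init__(self):
        super().__init__()
        self.script_tags = 0
        self.inline_handlers = 0
        self.external_resources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_tags += 1
        for name, value in attrs:
            if name.startswith("on"):
                self.inline_handlers += 1  # e.g. onclick, onload
            elif name in ("src", "href", "action") and value:
                host = urlsplit(value).hostname
                # Relative URLs have no host; only absolute non-.onion URLs leak.
                if host and not host.endswith(".onion"):
                    self.external_resources.append(value)


# Placeholder markup and hosts for illustration only.
page = """
<html><body onload="init()">
<script src="http://cdn.example.com/t.js"></script>
<script>var x = 1;</script>
<img src="/logo.png">
<form action="http://exampleaaa.onion/login"></form>
</body></html>
"""
audit = StaticAudit()
audit.feed(page)
print(audit.script_tags, audit.inline_handlers, audit.external_resources)
```

Anything a script would load or rewrite at runtime is invisible to this kind of pass, which is the trade-off the notes describe.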
Consumes Mnemosyne JSON output and builds a queryable local database with full scan history per site.
pip install fastapi uvicorn
python d_khaeos.py # start the web UI (localhost:7777)
python d_khaeos.py --ingest <Mnemosyne_file.json> # ingest a file, then exit
python d_khaeos.py --ingest <Mnemosyne_file.json> --serve # ingest then start the UI
python d_khaeos.py --db <path/to/khaeos.db> # use a custom database path

Both single-scan (`Mnemosyne_<host>.json`) and batch (`Mnemosyne_batch_<timestamp>.json`) output files are supported. Duplicate scans (same host + scan time) are silently skipped on re-ingest.
- Ingest: parses Mnemosyne JSON (single or batch) and extracts all reachability, risk, header, canary, script, and resource data into a local SQLite database. Each scan is stored in full with key fields also extracted as queryable columns. Batch files are tracked as named batches with aggregate statistics.
- Site profiles: one row per canonical `.onion` host, updated on every ingest. Tracks uptime percentage, rolling average latency, risk score history, blacklist status, canary validity streak, clearnet leak flags, and security header counts across all scans.
- Time-series tracking: every scan is stored individually, allowing per-site charts of reachability, latency, risk score, security headers, and script count over time. This makes risk score drift between scans visible.
- Category inference: site titles and snippets are matched against keyword lists to automatically assign categories (forum, market, chat, email, news, wiki, leaks, crypto, hacking, hosting, social, search, privacy, services). Category can be overridden manually via the UI.
- Safety-focused search: results can be filtered by risk level, reachability, blacklist status, clearnet leak presence, canary validity, and category simultaneously. Ranking surfaces trust signals rather than relevance.
- Persistent storage: all data lives in a single SQLite file (`khaeos.db` by default). Point `--db` at any path, including an external drive, to store the database wherever you want. The database grows indefinitely: scans accumulate forever and nothing is overwritten.
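Keyword-based category inference, as described in the list above, can be approximated with a few lines. The keyword lists, the tie-breaking rule, and the `uncategorized` fallback below are hypothetical; only the category names come from the text.

```python
# Hypothetical keyword lists covering a few of the categories named above.
CATEGORY_KEYWORDS = {
    "forum": ("forum", "board", "thread"),
    "market": ("market", "vendor", "escrow"),
    "email": ("email", "mail", "inbox"),
    "search": ("search", "index"),
}


def infer_category(title, snippet="", default="uncategorized"):
    """Pick the category whose keywords match the title and snippet most often."""
    text = f"{title} {snippet}".lower()
    best, best_hits = default, 0
    for category, keywords in CATEGORY_KEYWORDS.items():
        hits = sum(1 for word in keywords if word in text)
        if hits > best_hits:  # first category wins ties
            best, best_hits = category, hits
    return best


print(infer_category("Onion Mail", "Anonymous email inbox"))
print(infer_category("Hello"))
```

Substring matching like this is cheap but imprecise, which is presumably why a manual override in the UI is useful.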
The UI opens automatically in your browser at http://127.0.0.1:7777 when the server starts. It consists of four tabs:
| Tab | Description |
|---|---|
| Index | Searchable, filterable site list with per-site security badges, uptime, and latency. Click any entry for full detail view with time-series charts. |
| Overview | Dashboard showing risk distribution, sites by category, batch comparison charts, and batch trend lines across ingests. |
| Terminal | Command-line interface for database queries and management. |
| Journal | Persistent canvas for freehand notes, labeled boxes, and directional connectors. Auto-saves every 60 seconds. |
The database uses SQLite with WAL mode enabled. Three tables:
| Table | Description |
|---|---|
| `sites` | One row per canonical host. Aggregate stats updated on every ingest. |
| `scans` | One row per scan run. Full JSON blob plus extracted columns for fast filtering. |
| `batches` | One row per batch ingest. Aggregate stats per batch for trend analysis. |
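As a rough picture of how such a layout can skip duplicate scans on re-ingest, the sketch below creates simplified `sites` and `scans` tables with WAL mode and a uniqueness constraint on host plus scan time. Column names and the shape of the `scan` dictionary are assumptions, and the `batches` table is left out for brevity; the real schema in `d_khaeos.py` may differ.

```python
import json
import sqlite3


def open_db(path):
    """Open (or create) the database with WAL mode and two simplified tables."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS sites (
            host TEXT PRIMARY KEY,
            scan_count INTEGER NOT NULL DEFAULT 0
        );
        CREATE TABLE IF NOT EXISTS scans (
            id INTEGER PRIMARY KEY,
            host TEXT NOT NULL,
            scan_time TEXT NOT NULL,
            risk_score INTEGER,
            raw_json TEXT NOT NULL,
            UNIQUE (host, scan_time)
        );
    """)
    return conn


def ingest_scan(conn, scan):
    """Store one scan; return False if the same host + scan time already exists."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO scans (host, scan_time, risk_score, raw_json) "
            "VALUES (?, ?, ?, ?)",
            (scan["host"], scan["scan_time"], scan.get("risk_score"), json.dumps(scan)),
        )
        if cur.rowcount == 0:
            return False  # duplicate, silently skipped
        conn.execute("INSERT OR IGNORE INTO sites (host) VALUES (?)", (scan["host"],))
        conn.execute(
            "UPDATE sites SET scan_count = scan_count + 1 WHERE host = ?",
            (scan["host"],),
        )
    return True
```

Storing the full JSON next to a few extracted columns mirrors the description of the `scans` table: filters hit the columns, while the blob keeps every detail available for the detail view.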
The database file is portable — copy it to back it up, move it between machines, or open it directly with any SQLite browser. To use an external drive:
python d_khaeos.py --db /Path/To/Drive/khaeos_folder/khaeos.db

You can add a shell alias so the path is never forgotten:

alias khaeos='python /path/to/d_khaeos.py --db /Path/To/Drive/khaeos_folder/khaeos.db'

- Khaeos does not scan anything. It is purely a storage and visualization layer. All scanning is performed by Mnemosyne. It is a local web app not exposed to the internet.
- The database accumulates scan history indefinitely. There is no automatic pruning. Disk usage grows proportionally to the number of sites and scan frequency.
- Batch aggregate statistics (average risk score, reachability, etc.) are computed at ingest time and are not recalculated if individual scans are later deleted via `db --delete-row` (Terminal command).
- `khaeos_journal.json` is saved alongside `khaeos.db` and persists the Journal canvas across sessions. Both files should be included in any backup of the database directory.
pgp_verify.py
- Authenticates PGP signatures. Read the comments in the file, or just try it out; it is relatively self-explanatory.














