Oblivion Experiments

This directory contains benchmark evaluation scripts and ablation experiments for the Oblivion memory framework.

Benchmarks

GoodAI-LTM Benchmark

The GoodAI Long-Term Memory Benchmark (Castillo-Bolado et al., 2024) is a dynamic benchmark with 33 test cases across 7 categories. It evaluates memory systems by interleaving test content with filler trivia tokens, simulating realistic long-term memory pressure at configurable context lengths (1K–500K tokens).
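The interleaving idea can be made concrete with a toy Python sketch. This is not the benchmark's actual implementation. It shows test messages separated by filler trivia until roughly a fixed number of filler tokens has been added. The function names `interleave_with_filler` and `count_tokens` are hypothetical, and whitespace-separated words stand in for real tokenizer tokens.

```python
def count_tokens(text: str) -> int:
    # Crude proxy (assumption): whitespace-separated words stand in for tokenizer tokens.
    return len(text.split())


def interleave_with_filler(
    test_messages: list[str],
    filler_pool: list[str],
    filler_tokens_between: int,
) -> list[tuple[str, str]]:
    """Build a toy conversation of ("test" | "filler", text) turns.

    After every test message except the last, filler trivia is appended
    (cycling through filler_pool) until at least filler_tokens_between
    filler tokens separate it from the next test message.
    """
    if filler_tokens_between > 0 and not filler_pool:
        raise ValueError("filler_pool must not be empty")
    if any(count_tokens(f) == 0 for f in filler_pool):
        raise ValueError("filler items must contain at least one token")

    conversation: list[tuple[str, str]] = []
    filler_index = 0
    for i, message in enumerate(test_messages):
        conversation.append(("test", message))
        if i == len(test_messages) - 1:
            break  # no filler after the final test message
        added = 0
        while added < filler_tokens_between:
            filler = filler_pool[filler_index % len(filler_pool)]
            filler_index += 1
            conversation.append(("filler", filler))
            added += count_tokens(filler)
    return conversation
```

Raising the filler budget pushes each follow-up question further from the information it depends on, which is the kind of memory pressure that longer context lengths are meant to simulate.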

```bash
# Install dependencies
poetry install --extras "goodai-benchmark"

# Run Oblivion agent on 32K benchmark
python -m experiments.goodai_ltm_benchmark.run_benchmark \
    benchmark=32k agent=oblivion

# Run Vanilla LLM baseline
python -m experiments.goodai_ltm_benchmark.run_benchmark \
    benchmark=32k agent=vanilla_llm
```
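To script several runs, a small Python sketch can build the same `python -m experiments.goodai_ltm_benchmark.run_benchmark benchmark=... agent=...` invocations and run them one after another. The helpers `build_command` and `run_all` are hypothetical. Like the shell commands, the sketch assumes it is launched from a directory where the `experiments` package is importable.

```python
import shlex
import subprocess
import sys

# Module path taken from the shell commands for this benchmark.
MODULE = "experiments.goodai_ltm_benchmark.run_benchmark"


def build_command(benchmark: str, agent: str) -> list[str]:
    # Same key=value overrides as the shell commands, using the current interpreter.
    return [sys.executable, "-m", MODULE, f"benchmark={benchmark}", f"agent={agent}"]


def run_all(benchmark: str, agents: list[str], dry_run: bool = True) -> list[list[str]]:
    commands = [build_command(benchmark, agent) for agent in agents]
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError (stopping the loop) on a non-zero exit.
            subprocess.run(cmd, check=True)
    return commands
```

For example, `run_all("32k", ["oblivion", "vanilla_llm"])` only prints the two commands. Pass `dry_run=False` to execute them. Using `sys.executable` keeps every run on the same interpreter, and therefore the same Poetry environment, as the calling script.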

See goodai_ltm_benchmark/README.md for detailed usage, example commands, configuration parameters, and hyperparameter sweep documentation.

LongMemEval Benchmark

LongMemEval (Wu et al., 2025) is a static benchmark with 500 test cases across 6 categories, evaluating long-term conversational memory through oracle and systematic splits. The evaluation pipeline in this repository is a custom implementation — it does not depend on the original longmemeval package.
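As a quick sanity check on the category breakdown, a sketch like the following tallies questions per category. It assumes the dataset files keep the original LongMemEval JSON layout, which is an array of question objects with a `question_type` field. `load_questions`, `category_counts`, and `print_summary` are hypothetical helpers and are not part of this repository's pipeline.

```python
import json
from collections import Counter
from pathlib import Path


def load_questions(path: Path) -> list[dict]:
    # Assumption: the file is a JSON array of question objects (original LongMemEval format).
    with path.open(encoding="utf-8") as f:
        return json.load(f)


def category_counts(questions: list[dict]) -> Counter:
    # Tally questions by their (assumed) "question_type" field.
    return Counter(q["question_type"] for q in questions)


def print_summary(questions: list[dict]) -> None:
    counts = category_counts(questions)
    print(f"{len(questions)} questions across {len(counts)} categories")
    for category, n in counts.most_common():
        print(f"  {category}: {n}")
```

Pass `load_questions` the path of a split file from the initialized submodule, then hand the result to `print_summary`.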

```bash
# Install dependencies
poetry install --extras "lme-benchmark"

# Initialize dataset submodule
git submodule update --init data/benchmarks/longmemeval
```

See longmemeval_benchmark/README.md for pipeline documentation, configuration, and preparation strategies.

Directory Structure

| Directory | Description | Install Extras |
|---|---|---|
| `goodai_ltm_benchmark/` | GoodAI-LTM benchmark runner, baselines, hyperparameter sweeps, Streamlit UI | `goodai-benchmark` |
| `longmemeval_benchmark/` | LongMemEval evaluation framework (preparation + query pipeline) | `lme-benchmark` |
| `longmemeval_data_utils/` | Shared LongMemEval data loading and preparation utilities | `lme-benchmark` |
| `longmemeval_ablation_experiments/` | Ablation experiments: decayer temperature, hallucination analysis, error analysis | `goodai-benchmark` |

Benchmark Datasets

Both benchmark datasets are managed as optional git submodules under data/benchmarks/. See data/benchmarks/README.md for dataset details, licenses, and setup instructions.
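Because the datasets are git submodules, a clone made without `--recurse-submodules` leaves their directories empty. A small guard can fail fast with a hint before a benchmark run starts. The helpers `submodule_ready` and `require_submodule` below are hypothetical, and they use a non-empty-directory check only as a heuristic.

```python
from pathlib import Path


def submodule_ready(path: Path) -> bool:
    # Heuristic: a submodule that was never initialized is an empty directory (or missing).
    return path.is_dir() and any(path.iterdir())


def require_submodule(path: Path) -> None:
    # Fail fast and include the command that populates the submodule.
    if not submodule_ready(path):
        raise FileNotFoundError(
            f"{path.as_posix()} is missing or empty; "
            f"run `git submodule update --init {path.as_posix()}` from the repository root."
        )
```

For example, an evaluation script run from the repository root could call `require_submodule(Path("data/benchmarks/longmemeval"))` before loading any data.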
