Combining Query Performance Predictors: A Reproducibility Study.

This repository contains the codebase for "Combining Query Performance Predictors: A Reproducibility Study" (published at ECIR 2025).

Pre-Retrieval Methods

Because Hauff et al. (ECIR 2009) did not release their implementations of the pre-retrieval QPP methods, we have implemented them here: AvgIDF, MaxIDF, SumSCQ, AvgSCQ, MaxSCQ, SumVAR, AvgVAR, MaxVAR, AvP, and AvNP. The implementations are available in the PreRetQPP directory.

MaxIDF / AvgIDF
- Pre-retrieval IDF-based predictors
SumSCQ / AvgSCQ / MaxSCQ
- SCQ-based QPP Predictors
Ambiguity-based (AvP, AvNP)
- Ambiguity and Similarity-based QPP predictors
Ranking Sensitivity-based (SumVAR, AvgVAR, MaxVAR)
- Ranking Sensitivity-based Predictors
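
As a rough illustration of the IDF-based predictors listed above, the sketch below computes AvgIDF and MaxIDF from a map of term document frequencies. The function names, the `doc_freq` dictionary, and the plain log(N / df) formula are assumptions made for this example; the code in the PreRetQPP directory may use a different IDF variant or read its statistics from an index.

```python
import math

def idf_scores(query_terms, doc_freq, num_docs):
    """Return log(N / df) for each query term that occurs in the collection.

    Terms without a document frequency are skipped (an assumption of this sketch).
    """
    return [math.log(num_docs / doc_freq[t])
            for t in query_terms if doc_freq.get(t, 0) > 0]

def avg_idf(query_terms, doc_freq, num_docs):
    """AvgIDF: mean IDF over the query terms (0.0 if no term is known)."""
    scores = idf_scores(query_terms, doc_freq, num_docs)
    return sum(scores) / len(scores) if scores else 0.0

def max_idf(query_terms, doc_freq, num_docs):
    """MaxIDF: largest IDF among the query terms (0.0 if no term is known)."""
    return max(idf_scores(query_terms, doc_freq, num_docs), default=0.0)

# Toy collection statistics (hypothetical values, not taken from any dataset).
doc_freq = {"query": 10, "performance": 100}
terms = ["query", "performance"]
print(avg_idf(terms, doc_freq, 1000), max_idf(terms, doc_freq, 1000))
```

Higher values suggest that the query contains rarer, more discriminative terms.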

Post-Retrieval Methods

For the post-retrieval QPP methods, we used code provided by the original authors when available. For methods lacking a readily available codebase, we implemented them independently. Below are the links to the methods used in this project:

WIG (Weighted Information Gain) (unofficial implementation)
- Link
NQC (Normalized Query Commitment) (unofficial implementation)
- Link
Clarity (unofficial implementation)
- Link
UEF (Utility Estimation Framework) (unofficial implementation)
- Link
NeuralQPP
- available in the NeuralQPP directory
Deep-QPP (from the authors' repository)
- Link
qppBERT-PL (from the authors' repository)
- Link
BERT-QPP (from the authors' repository)
- Link

These repositories serve as resources for implementations of various QPP methods, some of which may require adaptation to integrate into the project’s specific framework.
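
To give a sense of what a score-based post-retrieval predictor computes, here is a minimal sketch of NQC as it is commonly defined: the standard deviation of the top-k retrieval scores, normalized by the absolute retrieval score of the whole collection. The function name, the `corpus_score` argument, and the default `k` are assumptions for this example; the linked implementations may differ in detail.

```python
import math

def nqc(scores, corpus_score, k=100):
    """Standard deviation of the top-k scores divided by |corpus_score|.

    `scores` are the retrieval scores of a ranked list for one query;
    `corpus_score` is the score of the collection treated as one document
    (it must be supplied by the caller in this sketch).
    """
    top = sorted(scores, reverse=True)[:k]
    mean = sum(top) / len(top)
    variance = sum((s - mean) ** 2 for s in top) / len(top)
    return math.sqrt(variance) / abs(corpus_score)

# Toy scores (hypothetical values).
print(nqc([4.0, 2.0], corpus_score=-2.0))
```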

QPP Scores

Pre-computed QPP scores and AP values can be found in the data folder.

Combining different QPP methods

To run the different (penalized) regression methods and sampling strategies on the three collections (TREC Robust, TREC DL 2019 & 2020, and ClueWeb09B), use the following instructions.

Example command

To run the leave-one-out approaches used by Hauff et al.:

python3 leave-one-out.py --k 1000 --input data --qpp_type pre --dataset trec678 --ols_type ols
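
The idea behind leave-one-out combination can be sketched as follows: for each query, fit an ordinary least squares model on all other queries and predict the held-out one. This is a simplified, dependency-free illustration, not the contents of leave-one-out.py; the function names and the toy data are hypothetical, and the solver assumes the predictors are not collinear.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(X, y):
    """Return [intercept, w1, ..., wp] from the normal equations."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(coef, x):
    return coef[0] + sum(c * v for c, v in zip(coef[1:], x))

def leave_one_out(X, y):
    """Predict each query's AP with a model trained on all other queries."""
    preds = []
    for i in range(len(y)):
        coef = fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        preds.append(predict(coef, X[i]))
    return preds

# Toy data: two predictor scores per query, AP as the target (hypothetical).
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y = [0.1 + 0.2 * a + 0.3 * b for a, b in X]
print(leave_one_out(X, y))
```

The held-out predictions can then be compared with the true AP values using correlation or rank-error measures.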

To run lars-traps with leave-one-out:

python3 lars-traps.py --k 1000 --input data --qpp_type pre --dataset trec678

To run bolasso with leave-one-out:

python3 bolasso.py --k 1000 --input data --qpp_type pre --dataset trec678

To run lars-traps with half-split:

python3 lars-traps-split-half.py --k 1000 --input data --qpp_type pre --dataset trec678rb

To run bolasso with half-split:

python3 bolasso-split-half.py --k 1000 --input data --qpp_type post --dataset trec678rb

To compute sMARE (scaled Mean Absolute Rank Error):

python3 smare.py <path_to_csv_file>
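
For readers unfamiliar with the measure, the sketch below shows one common way to compute it: rank the queries by predicted score and by actual AP, take the absolute rank difference per query, divide by the number of queries, and average. The function names are hypothetical, ties are broken by input order for simplicity, and the repository's script may handle ties and input parsing differently.

```python
def ranks(values):
    """Rank positions (1 = highest value); ties are broken by input order."""
    order = sorted(range(len(values)), key=lambda i: -values[i])
    result = [0] * len(values)
    for position, idx in enumerate(order, start=1):
        result[idx] = position
    return result

def smare(predicted, actual):
    """Mean over queries of |rank_pred - rank_actual| / number_of_queries."""
    n = len(predicted)
    rank_pred, rank_actual = ranks(predicted), ranks(actual)
    total = sum(abs(p - a) for p, a in zip(rank_pred, rank_actual))
    return total / (n * n)

# Toy predictor scores and AP values for three queries (hypothetical).
print(smare([0.1, 0.9, 0.5], [0.8, 0.2, 0.3]))
```

Lower values indicate that the predicted ordering of queries is closer to the ordering by actual effectiveness.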

To compute correlation metrics:

python3 compute-correlation.py <path_to_csv_file> <retrieval_depth (1000 / 100)>
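
QPP evaluation typically reports linear and rank correlation between predicted scores and actual AP. The following is a minimal sketch of Pearson's r and Kendall's tau (the tau-a variant, without tie correction); which coefficients compute-correlation.py actually reports is not stated here, so treat the choice of measures and the function names as assumptions.

```python
import math

def pearson(x, y):
    """Pearson's linear correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Toy predicted scores and AP values (hypothetical).
print(pearson([1, 2, 3], [2, 4, 6]), kendall_tau([1, 2, 3], [1, 3, 2]))
```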

To fit linear regression with an individual predictor (half-split):

python3 indv-predictor-regr-half-split.py --k 1000 --input data --qpp_type pre --dataset trec678rb --ols_type ols

To run multiple regression with half-split:

python3 multiple-regression-half-split.py --k 1000 --input data --qpp_type pre --dataset trec678rb --ols_type ols
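
The half-split strategy used by the commands above can be pictured as repeatedly shuffling the query set, training on one half, and evaluating on the other. The generator below is only a sketch of that sampling step; the name `half_splits`, the default number of splits, and the seed are hypothetical and are not taken from the scripts.

```python
import random

def half_splits(query_ids, num_splits=30, seed=0):
    """Yield (train_ids, test_ids) pairs from repeated random half-splits."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    ids = list(query_ids)
    for _ in range(num_splits):
        shuffled = ids[:]
        rng.shuffle(shuffled)
        mid = len(shuffled) // 2
        yield shuffled[:mid], shuffled[mid:]

# Example with ten placeholder query ids.
for train_ids, test_ids in half_splits(range(10), num_splits=2):
    print(sorted(train_ids), sorted(test_ids))
```

A regression model would be fitted on `train_ids` and scored on `test_ids` in each iteration, with results averaged over the splits.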

Citation

@InProceedings{10.1007/978-3-031-88717-8_9,
   author="Saha, Sourav
   and Datta, Suchana
   and Roy, Dwaipayan
   and Mitra, Mandar
   and Greene, Derek",
   title="Combining Query Performance Predictors: A Reproducibility Study",
   booktitle="Advances in Information Retrieval",
   year="2025",
   publisher="Springer Nature Switzerland",
   address="Cham",
   pages="112--129",
   isbn="978-3-031-88717-8"
}
