QueryGym

A lightweight, reproducible toolkit for LLM-based query reformulation.

Requires Python 3.9+. Licensed under Apache 2.0.

Features

  • Single Prompt Bank (YAML) with metadata
  • Simple DataLoader: Dependency-free file loading for queries, qrels, and contexts
  • Format Loaders: optional loaders for the BEIR and MS MARCO formats
  • OpenAI-compatible LLM client: works with any endpoint that implements the OpenAI API
  • Optional Pyserini: either pass contexts (JSONL) or pass a retriever instance to build them
  • Export-only: emits reformulated queries and can optionally generate a bash script for Pyserini + trec_eval

Quick Example

import querygym as qg

# Load data
queries = qg.load_queries("queries.tsv")
qrels = qg.load_qrels("qrels.txt")

# Create reformulator
reformulator = qg.create_reformulator("genqr_ensemble", model="gpt-4")

# Reformulate
results = reformulator.reformulate_batch(queries)

# Save
qg.DataLoader.save_queries(
    [qg.QueryItem(r.qid, r.reformulated) for r in results],
    "reformulated.tsv"
)
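The quick example expects queries.tsv and qrels.txt to already exist. As a minimal sketch for trying it out, the script below writes two tiny sample files using only the standard library. The layouts are assumptions, not confirmed by QueryGym: queries as two tab-separated columns (query id, query text) and qrels in the common four-column TREC style. The sample ids, texts, and the helper name `write_sample_inputs` are hypothetical.

```python
import csv
from pathlib import Path

# Hypothetical sample data; the column layouts are assumptions.
SAMPLE_QUERIES = [
    ("q1", "what causes aurora borealis"),
    ("q2", "python read tsv file"),
]
# TREC-style qrels rows: query id, iteration (unused), document id, relevance.
SAMPLE_QRELS = [
    ("q1", "0", "doc12", "1"),
    ("q2", "0", "doc7", "2"),
]


def write_sample_inputs(directory="."):
    """Write queries.tsv and qrels.txt into `directory` and return both paths."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    queries_path = out_dir / "queries.tsv"
    qrels_path = out_dir / "qrels.txt"

    # One query per line: <qid><TAB><text>
    with queries_path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t", lineterminator="\n")
        writer.writerows(SAMPLE_QUERIES)

    # One judgment per line, fields separated by single spaces.
    with qrels_path.open("w", encoding="utf-8") as f:
        for row in SAMPLE_QRELS:
            f.write(" ".join(row) + "\n")

    return queries_path, qrels_path


if __name__ == "__main__":
    for path in write_sample_inputs():
        print(f"wrote {path}")
```

If the loaders expect a different layout, adjust the delimiter or column order accordingly.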

Installation

Install from PyPI

pip install querygym

Use Docker (Quick Start)

# Pull pre-built image
docker pull ghcr.io/ls3-lab/querygym:latest

# Run with Docker Compose
docker compose run --rm querygym

See the Docker Guide for detailed setup and usage.

Optional features are installed as pip extras. The quotes keep shells such as zsh from interpreting the square brackets:

# With Hugging Face datasets support
pip install "querygym[hf]"

# With BEIR format support
pip install "querygym[beir]"

# With Pyserini adapter
pip install "querygym[pyserini]"

# All optional features
pip install "querygym[all]"

# Development dependencies
pip install "querygym[dev]"
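After installing extras, it can be useful to confirm which optional dependencies are actually importable in the current environment. The sketch below uses only the standard library. The mapping from extra name to module name is an assumption made for illustration (for example, that the hf extra provides the `datasets` package) and is not taken from QueryGym's packaging metadata.

```python
from importlib.util import find_spec

# Assumed mapping from extra name to a top-level module it would provide.
# These module names are guesses for illustration, not QueryGym's own list.
ASSUMED_EXTRA_MODULES = {
    "hf": "datasets",
    "beir": "beir",
    "pyserini": "pyserini",
}


def available_extras(extra_modules=ASSUMED_EXTRA_MODULES):
    """Return {extra_name: bool}: True when the mapped module can be located."""
    return {
        extra: find_spec(module) is not None
        for extra, module in extra_modules.items()
    }


if __name__ == "__main__":
    for extra, found in available_extras().items():
        print(f"{extra}: {'installed' if found else 'missing'}")
```

`find_spec` only locates a module without importing it, so the check stays fast even when a heavy dependency is present.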

Documentation

📊 Looking for benchmarks? Visit the Leaderboard.

Citation

If you use QueryGym in your research, please cite:

@misc{bigdeli2025querygymtoolkitreproduciblellmbased,
      title={QueryGym: A Toolkit for Reproducible LLM-Based Query Reformulation}, 
      author={Amin Bigdeli and Radin Hamidi Rad and Mert Incesu and Negar Arabzadeh and Charles L. A. Clarke and Ebrahim Bagheri},
      year={2025},
      eprint={2511.15996},
      archivePrefix={arXiv},
      primaryClass={cs.IR},
      url={https://arxiv.org/abs/2511.15996}, 
}

License

Apache License 2.0 - see LICENSE for details.