Retrieval

Retrieval is the step that finds relevant documents for a user question.

DocAsk currently supports two retrieval backends:

simple
mmore

Shared result format

File:

src/docask/retrieval/base.py

The shared result type is:

RetrievalResult

It contains:

Simple retriever

File:

src/docask/retrieval/simple_retriever.py

The simple retriever is a local debugging and dynamic-project backend.

It:

reads a selected corpus.jsonl;
tokenizes the query and documents;
ranks documents with overlap-based heuristics;
adds boosts for exact symbols, signatures, modules, titles, and user-facing documentation.

It does not use embeddings or MMORE.
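The steps above can be illustrated with a minimal sketch of overlap-based ranking. This is not the code in simple_retriever.py: the record fields "title" and "text", the boost weight, and the helper names (tokenize, score, rank, load_corpus) are all assumptions made for illustration, and only the title boost is shown.

```python
import json
import re
from collections import Counter

TOKEN_RE = re.compile(r"[A-Za-z0-9_]+")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def score(query, doc):
    """Token-overlap score plus a boost for query terms found in the title.

    Assumes each record has "title" and "text" fields (hypothetical schema).
    """
    q_tokens = set(tokenize(query))
    body_counts = Counter(tokenize(doc.get("text", "")))
    # Cap repeated hits so one very long document cannot dominate.
    overlap = sum(min(body_counts[t], 3) for t in q_tokens)
    title_tokens = set(tokenize(doc.get("title", "")))
    title_boost = 2.0 * len(q_tokens & title_tokens)
    return overlap + title_boost


def rank(query, docs, top_k=5):
    """Return up to top_k (score, doc) pairs with a positive score, best first."""
    scored = [(score(query, d), d) for d in docs]
    scored = [(s, d) for s, d in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def load_corpus(path):
    """Read one JSON object per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Under these assumptions, `rank("How do I configure indexing?", load_corpus("data/projects/mmore/corpus.jsonl"))` would return the best-matching records without touching any embedding model or index.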

Command:

PYTHONPATH=src python scripts/debug_retrieval.py \
  "How do I configure indexing?" \
  --corpus-path data/projects/mmore/corpus.jsonl

MMORE retriever

File:

src/docask/retrieval/mmore_retriever.py

The MMORE retriever:

Command example:

PYTHONPATH=src python scripts/prepare_answer.py \
  "How do I configure indexing?" \
  --backend mmore

Retriever factory

File:

src/docask/retrieval/retriever_factory.py

This file gives the rest of DocAsk one entry point:

retrieve_documents(...)

It chooses the backend based on the requested backend name:

simple
mmore
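A single entry point that dispatches on a backend name could look like the sketch below. The parameter names (query, backend, top_k), the return type, and the two stand-in backend functions are hypothetical. The real signature of retrieve_documents(...) is not shown here, so treat this only as an illustration of the dispatch pattern.

```python
from typing import Callable, Dict, List


# Hypothetical stand-ins for the real simple and MMORE retrievers.
def _simple_backend(query: str, top_k: int) -> List[dict]:
    return [{"backend": "simple", "query": query}][:top_k]


def _mmore_backend(query: str, top_k: int) -> List[dict]:
    return [{"backend": "mmore", "query": query}][:top_k]


# Registry mapping each backend name to the callable that serves it.
_BACKENDS: Dict[str, Callable[[str, int], List[dict]]] = {
    "simple": _simple_backend,
    "mmore": _mmore_backend,
}


def retrieve_documents(query: str, backend: str = "simple", top_k: int = 5) -> List[dict]:
    """Look up the backend by name and delegate the query to it."""
    try:
        retriever = _BACKENDS[backend]
    except KeyError:
        raise ValueError(f"Unknown retrieval backend: {backend!r}") from None
    return retriever(query, top_k)
```

Keeping the name-to-callable mapping in one place means callers never import a specific retriever module, and an unknown name fails loudly instead of silently falling back.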

Project profiles

Retrieval results can be refined by a project profile.

Project profiles live in:

src/docask/project_profiles/

They can:

This keeps the core DocAsk retrieval pipeline generic while allowing MMORE-specific improvements to remain isolated.
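One way to keep project-specific logic out of the generic pipeline is a small profile interface that post-processes results. The sketch below is an assumption about the general pattern, not the contents of src/docask/project_profiles/: the names Hit, ProjectProfile, DefaultProfile, PrefixBoostProfile, and refine are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class Hit:
    """Hypothetical stand-in for a retrieval result."""
    doc_id: str
    score: float


class ProjectProfile(Protocol):
    """Anything with a matching refine() method counts as a profile."""

    def refine(self, query: str, hits: List[Hit]) -> List[Hit]: ...


class DefaultProfile:
    """Generic profile: returns the results unchanged."""

    def refine(self, query: str, hits: List[Hit]) -> List[Hit]:
        return hits


class PrefixBoostProfile:
    """Project-specific profile: boosts documents whose id starts with a prefix."""

    def __init__(self, prefix: str, boost: float) -> None:
        self.prefix = prefix
        self.boost = boost

    def refine(self, query: str, hits: List[Hit]) -> List[Hit]:
        boosted = [
            Hit(h.doc_id, h.score + self.boost) if h.doc_id.startswith(self.prefix) else h
            for h in hits
        ]
        return sorted(boosted, key=lambda h: h.score, reverse=True)


def apply_profile(profile: ProjectProfile, query: str, hits: List[Hit]) -> List[Hit]:
    """The core pipeline only ever calls refine(); it knows nothing project-specific."""
    return profile.refine(query, hits)
```

With this shape, the core code calls `apply_profile(...)` for every project, and swapping `DefaultProfile()` for a project-specific profile changes the ranking without touching the retrievers.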

For a project corpus built from the Streamlit interface, use the simple backend unless the corresponding MMORE index has also been rebuilt.

The mmore backend retrieves from the configured MMORE index, not directly from the selected JSONL corpus.