Retrieval¶
Retrieval is the step that finds relevant documents for a user question.
DocAsk currently supports two retrieval backends:
simple
mmore
Simple retriever¶
File:
src/docask/retrieval/simple_retriever.py
The simple retriever is a local debugging and dynamic-project backend.
It:
reads a selected corpus.jsonl;
tokenizes the query and documents;
ranks documents with overlap-based heuristics;
adds boosts for exact symbols, signatures, modules, titles, and user-facing documentation.
It does not use embeddings or MMORE.
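The ranking steps above can be illustrated with a minimal sketch. It is not the code in simple_retriever.py: the field names (`title`, `text`), the title boost weight, and the helper names `tokenize`, `score`, and `rank` are assumptions chosen for illustration, and only the title boost is shown out of the boosts listed above.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric/underscore tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def score(query: str, doc: dict) -> float:
    """Count query tokens found in the body, plus a boost for title matches."""
    query_tokens = set(tokenize(query))
    body_tokens = set(tokenize(doc.get("text", "")))
    title_tokens = set(tokenize(doc.get("title", "")))
    overlap = len(query_tokens & body_tokens)
    # Hypothetical weight: each query token found in the title counts double.
    title_boost = 2.0 * len(query_tokens & title_tokens)
    return overlap + title_boost


def rank(query: str, docs: list[dict], top_k: int = 3) -> list[dict]:
    """Return up to top_k documents with a non-zero score, best first."""
    scored = [(score(query, doc), doc) for doc in docs]
    scored = [(s, doc) for s, doc in scored if s > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:top_k]]


# In practice the documents would come from corpus.jsonl, one JSON object per line.
docs = [
    {"title": "Indexing configuration", "text": "Configure indexing with a YAML file."},
    {"title": "Installation", "text": "Install the package with pip."},
]
results = rank("How do I configure indexing?", docs)
```

Here only the first document shares tokens with the query, so it is the only one returned. Boosts for symbols, signatures, or modules could be added to `score` in the same additive way.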
Command:
PYTHONPATH=src python scripts/debug_retrieval.py \
"How do I configure indexing?" \
--corpus-path data/projects/mmore/corpus.jsonl
MMORE retriever¶
File:
src/docask/retrieval/mmore_retriever.py
The MMORE retriever:
loads a MMORE retriever from config;
calls retriever.retrieve(...);
parses DocAsk metadata from retrieved text;
converts raw MMORE results back into RetrievalResult objects.
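The metadata-parsing and conversion steps might look roughly like the sketch below. The metadata layout is an assumption made for illustration (a `key: value` header separated from the body by a blank line), as are the fields of this stand-in `RetrievalResult` and the helper name `parse_result`; the real class and format in mmore_retriever.py may differ. No MMORE API is called here.

```python
from dataclasses import dataclass, field


@dataclass
class RetrievalResult:
    """Hypothetical stand-in for DocAsk's RetrievalResult."""
    text: str
    score: float
    metadata: dict = field(default_factory=dict)


def parse_result(raw_text: str, score: float) -> RetrievalResult:
    """Split an assumed 'key: value' header from the body at the first blank line."""
    header, separator, body = raw_text.partition("\n\n")
    if not separator:
        # No blank line: treat the whole text as body with no metadata.
        return RetrievalResult(text=raw_text, score=score)
    metadata = {}
    for line in header.splitlines():
        key, colon, value = line.partition(":")
        if not colon:
            # The first block is not a header after all; keep the text intact.
            return RetrievalResult(text=raw_text, score=score)
        metadata[key.strip()] = value.strip()
    return RetrievalResult(text=body, score=score, metadata=metadata)


raw = "source: docs/indexing.md\ntitle: Indexing\n\nConfigure indexing with a YAML file."
parsed = parse_result(raw, score=0.8)
```

Falling back to the unmodified text when no header is found keeps results usable even if a chunk was indexed without DocAsk metadata.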
Command example:
PYTHONPATH=src python scripts/prepare_answer.py \
"How do I configure indexing?" \
--backend mmore
Retriever factory¶
File:
src/docask/retrieval/retriever_factory.py
This file gives the rest of DocAsk one entry point:
retrieve_documents(...)
It chooses the backend based on the requested backend name:
simple
mmore
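A single entry point that dispatches on a backend name can be sketched as follows. Only the function name `retrieve_documents` and the two backend names come from this page; the parameters, return type, and the two placeholder backend functions are assumptions for illustration.

```python
from typing import Callable


def _simple_retrieve(query: str, top_k: int) -> list[str]:
    """Placeholder for the simple backend (hypothetical)."""
    return [f"simple:{query}"][:top_k]


def _mmore_retrieve(query: str, top_k: int) -> list[str]:
    """Placeholder for the MMORE backend (hypothetical)."""
    return [f"mmore:{query}"][:top_k]


# Registry mapping each backend name to its retrieval function.
_BACKENDS: dict[str, Callable[[str, int], list[str]]] = {
    "simple": _simple_retrieve,
    "mmore": _mmore_retrieve,
}


def retrieve_documents(query: str, backend: str = "simple", top_k: int = 5) -> list[str]:
    """Dispatch to the chosen backend; the signature is assumed, not the real one."""
    try:
        retrieve = _BACKENDS[backend]
    except KeyError:
        raise ValueError(
            f"Unknown backend {backend!r}; expected one of {sorted(_BACKENDS)}"
        ) from None
    return retrieve(query, top_k)
```

With a registry like this, callers never import a backend module directly, and an unknown backend name fails early with a clear error.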
Project profiles¶
Retrieval results can be refined by a project profile.
Project profiles live in:
src/docask/project_profiles/
They can:
expand queries;
filter irrelevant sources;
rerank retrieved results;
answer some structured questions directly.
This keeps the core DocAsk retrieval pipeline generic while allowing MMORE-specific improvements to remain isolated.
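As a rough illustration of how a profile can wrap the generic pipeline, the sketch below implements the first three hooks. The class name `ExampleProfile`, its method names, the `Result` dataclass, and the synonym and path rules are all hypothetical; the actual interface in src/docask/project_profiles/ may look different.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical minimal retrieval result."""
    source: str
    text: str
    score: float


class ExampleProfile:
    """Hypothetical project profile with query expansion, filtering, reranking."""

    synonyms = {"indexing": ["index"]}
    excluded_prefixes = ("tests/",)

    def expand_query(self, query: str) -> str:
        """Append known synonyms for words that occur in the query."""
        lowered = query.lower()
        extra = [
            synonym
            for word, word_synonyms in self.synonyms.items()
            if word in lowered
            for synonym in word_synonyms
        ]
        return " ".join([query, *extra])

    def filter(self, results: list[Result]) -> list[Result]:
        """Drop results whose source path starts with an excluded prefix."""
        return [r for r in results if not r.source.startswith(self.excluded_prefixes)]

    def rerank(self, results: list[Result]) -> list[Result]:
        """Favor user-facing documentation by adding a fixed bonus to docs/ sources."""
        def adjusted(r: Result) -> float:
            return r.score + (1.0 if r.source.startswith("docs/") else 0.0)
        return sorted(results, key=adjusted, reverse=True)


profile = ExampleProfile()
candidates = [
    Result("src/indexer.py", "def build_index(): ...", 1.5),
    Result("tests/test_indexer.py", "def test_build_index(): ...", 1.4),
    Result("docs/indexing.md", "Configure indexing with a YAML file.", 1.0),
]
refined = profile.rerank(profile.filter(candidates))
```

Because the core pipeline only calls these hooks, project-specific rules stay in the profile and the retrievers themselves remain unchanged.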
Backend choice¶
For a project corpus built from the Streamlit interface, use:
backend simple
unless the corresponding MMORE index has also been rebuilt.
The mmore backend retrieves from the configured MMORE index, not directly from the selected JSONL corpus.