RAG and LLM layer
The RAG layer starts after retrieval.
Its job is to turn retrieved sources into a grounded answer.
Relevant files
src/docask/rag/prompting.py
src/docask/rag/extractive_answerer.py
src/docask/rag/answering.py
src/docask/rag/llm_provider.py
src/docask/rag/llm_factory.py
src/docask/rag/qwen_provider.py
Prompting
File:
src/docask/rag/prompting.py
This module formats retrieved sources into a prompt.
The prompt instructs the LLM to:
answer only from the provided sources;
cite sources inline with [Source 1], [Source 2], etc.;
avoid inventing commands, paths, APIs, modules, or configuration keys;
avoid interpreting configuration values unless the sources explain them;
say when the sources are insufficient.
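The rules above can be sketched as a small prompt builder. This is a minimal illustration, not the contents of prompting.py: the `Source` dataclass, the `RULES` wording, and the `build_prompt` name are all assumptions made for the example.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Source:
    """Hypothetical container for one retrieved chunk."""
    title: str
    content: str


# Rules paraphrased from the list above; the real prompt wording may differ.
RULES = [
    "Answer only from the provided sources.",
    "Cite sources inline as [Source 1], [Source 2], etc.",
    "Do not invent commands, paths, APIs, modules, or configuration keys.",
    "Do not interpret configuration values unless the sources explain them.",
    "Say so when the sources are insufficient.",
]


def build_prompt(question: str, sources: list[Source]) -> str:
    """Format the rules and the numbered sources into one prompt string."""
    rules = "\n".join(f"- {rule}" for rule in RULES)
    blocks = "\n\n".join(
        f"[Source {i}] {src.title}\n{src.content}"
        for i, src in enumerate(sources, start=1)
    )
    return f"Rules:\n{rules}\n\nSources:\n{blocks}\n\nQuestion: {question}\nAnswer:"


prompt = build_prompt(
    "How do I configure indexing?",
    [Source("indexing.md", "Indexing is configured in a YAML file.")],
)
```

Numbering the sources from 1 keeps the labels in the prompt aligned with the citation markers the model is asked to emit.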
Debug command:
PYTHONPATH=src python scripts/debug_prompting.py \
"How do I configure indexing?" \
--backend simple \
--corpus-path data/projects/mmore/corpus.jsonl \
--config-path configs/app_config.yaml
LLM providers
DocAsk uses a provider interface:
src/docask/rag/llm_provider.py
The active provider is selected from:
configs/app_config.yaml
Example:
llm:
  provider: qwen
  model_name: Qwen/Qwen3-1.7B
  max_new_tokens: 512
  temperature: 0.0
  enable_thinking: false
The Qwen provider uses Hugging Face Transformers.
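To make the provider idea concrete, here is a rough sketch of an interface plus a config-driven factory. The names `LLMProvider`, `EchoProvider`, `PROVIDERS`, and `create_provider` are hypothetical and are not taken from llm_provider.py or llm_factory.py. The stand-in provider does not load any model, so the sketch runs without Transformers installed.

```python
from __future__ import annotations

from typing import Protocol


class LLMProvider(Protocol):
    """Hypothetical interface: turn a prompt into generated text."""

    def generate(self, prompt: str) -> str: ...


class EchoProvider:
    """Stand-in provider for illustration; it does not call any model."""

    def __init__(self, max_new_tokens: int = 512) -> None:
        self.max_new_tokens = max_new_tokens

    def generate(self, prompt: str) -> str:
        # Truncates by characters as a crude stand-in for a token limit.
        return prompt[: self.max_new_tokens]


# Registry keyed by the `provider` value from the config (assumed layout).
PROVIDERS = {"echo": EchoProvider}


def create_provider(llm_config: dict) -> LLMProvider:
    """Pick a provider class from the parsed `llm` config section."""
    name = llm_config["provider"]
    if name not in PROVIDERS:
        raise ValueError(f"Unknown LLM provider: {name!r}")
    return PROVIDERS[name](max_new_tokens=llm_config.get("max_new_tokens", 512))


provider = create_provider({"provider": "echo", "max_new_tokens": 8})
result = provider.generate("hello world, this is long")
```

With a registry like this, supporting another backend would mean adding one class and one dictionary entry, while callers keep depending only on `generate`.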
High-level answering helpers
File:
src/docask/rag/answering.py
This module exposes:
prepare_answer_prompt(...)
answer_question(...)
answer_question_with_llm(...)
answer_question_with_provider(...)
Current flow:
question
→ project profile query expansion
→ retrieval
→ project profile filtering/reranking
→ optional project profile direct answer
→ prompt construction
→ LLM generation
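The flow above could be wired together roughly as follows. This is a sketch under stated assumptions: every stage is passed in as a plain callable with a made-up signature, and a direct answer is assumed to short-circuit prompt construction and generation. The real helpers in answering.py may be structured differently.

```python
from __future__ import annotations

from typing import Callable, Optional


def answer_pipeline(
    question: str,
    expand: Callable[[str], str],
    retrieve: Callable[[str], list[str]],
    rerank: Callable[[str, list[str]], list[str]],
    direct_answer: Callable[[str, list[str]], Optional[str]],
    build_prompt: Callable[[str, list[str]], str],
    generate: Callable[[str], str],
) -> str:
    """Run the stages in the documented order (hypothetical signatures)."""
    expanded = expand(question)                  # project profile query expansion
    sources = rerank(question, retrieve(expanded))  # retrieval, then filtering/reranking
    direct = direct_answer(question, sources)    # optional deterministic answer
    if direct is not None:
        return direct                            # assumed: skips the LLM entirely
    return generate(build_prompt(question, sources))


# Stub stages, only to show how the pieces connect.
answer = answer_pipeline(
    "q",
    expand=lambda q: q + " indexing",
    retrieve=lambda q: ["doc about " + q],
    rerank=lambda q, s: s,
    direct_answer=lambda q, s: None,
    build_prompt=lambda q, s: f"{q} | {s[0]}",
    generate=lambda p: f"LLM({p})",
)
```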
Direct answers from project profiles
Some structured questions are better answered deterministically than by an LLM.
For example, the MMORE profile can answer Milvus parameter questions directly. This avoids returning unrelated fields such as model_name, top_k, or max_workers when the user asks specifically for Milvus parameters.
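A deterministic answer of this kind might look like the sketch below. The function name, the keyword check, and the `milvus_` key prefix are assumptions for illustration; the `milvus_example_*` keys are placeholders, not MMORE's actual configuration keys.

```python
from __future__ import annotations

from typing import Optional


def milvus_direct_answer(question: str, params: dict) -> Optional[str]:
    """Return only Milvus-related parameters, or None if not applicable."""
    if "milvus" not in question.lower():
        return None  # let the normal prompt + LLM path handle it
    # Assumed convention: Milvus parameters share a common key prefix.
    selected = {k: v for k, v in params.items() if k.startswith("milvus_")}
    if not selected:
        return None
    return "\n".join(f"{key}: {value}" for key, value in sorted(selected.items()))


# Placeholder keys mixed with the unrelated fields named above.
params = {
    "milvus_example_a": 1,
    "milvus_example_b": "two",
    "model_name": "some-model",
    "top_k": 5,
    "max_workers": 4,
}
direct = milvus_direct_answer("Which Milvus parameters can I set?", params)
fallback = milvus_direct_answer("How do I configure indexing?", params)
```

Returning `None` when the question does not match leaves the rest of the pipeline free to fall back to prompt construction and LLM generation.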
Temporary extractive answerer
File:
src/docask/rag/extractive_answerer.py
The extractive answerer remains available for when LLM generation is disabled.
It:
takes the top retrieved source;
returns its content;
has a small special case for signature questions.
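The behaviour listed above can be approximated in a few lines. This is an illustrative sketch, not the code in extractive_answerer.py: the `extractive_answer` name, the keyword test for signature questions, and the regular expression for Python `def` lines are all assumptions.

```python
from __future__ import annotations

import re
from typing import Optional

# Matches a single-line Python function definition (assumed heuristic).
SIGNATURE_RE = re.compile(r"^\s*(?:async\s+)?def\s+\w+\(.*\).*:", re.MULTILINE)


def extractive_answer(question: str, sources: list[str]) -> Optional[str]:
    """Return the top source, or just its signature line for signature questions."""
    if not sources:
        return None
    top = sources[0]  # sources are assumed to be ordered best-first
    if "signature" in question.lower():
        match = SIGNATURE_RE.search(top)
        if match:
            return match.group(0).strip()
    return top


docs = ["Docs.\ndef index(path: str, top_k: int = 5) -> None:\n    pass"]
signature = extractive_answer("What is the signature of index?", docs)
full_text = extractive_answer("How do I configure indexing?", docs)
```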
Command:
PYTHONPATH=src python scripts/answer_question.py \
"How do I configure indexing?" \
--backend simple
LLM answer generation
LLM generation can be enabled with:
PYTHONPATH=src python scripts/answer_question.py \
"How do I configure indexing?" \
--llm \
--backend simple \
--corpus-path data/projects/mmore/corpus.jsonl \
--config-path configs/app_config.yaml
The expected answer includes inline source citations.
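One way to check that a generated answer really carries inline citations is a small post-hoc validator. The helper names below are hypothetical and are not part of DocAsk; the sketch only assumes the `[Source N]` format described in the prompting rules.

```python
from __future__ import annotations

import re

# Matches markers such as [Source 1] and captures the number.
CITATION_RE = re.compile(r"\[Source (\d+)\]")


def cited_sources(answer: str) -> set[int]:
    """Collect the source numbers cited in an answer."""
    return {int(number) for number in CITATION_RE.findall(answer)}


def has_valid_citations(answer: str, num_sources: int) -> bool:
    """True if at least one citation exists and all refer to real sources."""
    cited = cited_sources(answer)
    return bool(cited) and all(1 <= n <= num_sources for n in cited)


example = "Set the corpus path [Source 1] and run the indexer [Source 3]."
found = cited_sources(example)
```

A check like this catches both missing citations and markers that point past the number of sources actually supplied in the prompt.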