# RAG and LLM layer The RAG layer starts after retrieval. Its job is to turn retrieved sources into a grounded answer. ## Relevant files ```text src/docask/rag/prompting.py src/docask/rag/extractive_answerer.py src/docask/rag/answering.py src/docask/rag/llm_provider.py src/docask/rag/llm_factory.py src/docask/rag/qwen_provider.py ``` ## Prompting File: ```text src/docask/rag/prompting.py ``` This module formats retrieved sources into a prompt. The prompt instructs the LLM to: - answer only from the provided sources; - cite sources inline with `[Source 1]`, `[Source 2]`, etc.; - avoid inventing commands, paths, APIs, modules, or configuration keys; - avoid interpreting configuration values unless the sources explain them; - say when the sources are insufficient. Debug command: ```bash PYTHONPATH=src python scripts/debug_prompting.py \ "How do I configure indexing?" \ --backend simple \ --corpus-path data/projects/mmore/corpus.jsonl \ --config-path configs/app_config.yaml ``` ## LLM providers DocAsk uses a provider interface: ```text src/docask/rag/llm_provider.py ``` The active provider is selected from: ```text configs/app_config.yaml ``` Example: ```yaml llm: provider: qwen model_name: Qwen/Qwen3-1.7B max_new_tokens: 512 temperature: 0.0 enable_thinking: false ``` The Qwen provider uses Hugging Face Transformers. ## High-level answering helpers File: ```text src/docask/rag/answering.py ``` This module exposes: ```python prepare_answer_prompt(...) answer_question(...) answer_question_with_llm(...) answer_question_with_provider(...) ``` Current flow: ```text question → project profile query expansion → retrieval → project profile filtering/reranking → optional project profile direct answer → prompt construction → LLM generation ``` ## Direct answers from project profiles Some structured questions are better answered deterministically than by an LLM. For example, the MMORE profile can answer Milvus parameter questions directly. This avoids returning unrelated fields such as `model_name`, `top_k`, or `max_workers` when the user asks specifically for Milvus parameters. ## Temporary extractive answerer File: ```text src/docask/rag/extractive_answerer.py ``` This remains available when LLM generation is disabled. It: - takes the top retrieved source; - returns its content; - has a small special case for signature questions. Command: ```bash PYTHONPATH=src python scripts/answer_question.py \ "How do I configure indexing?" \ --backend simple ``` ## LLM answer generation LLM generation can be enabled with: ```bash PYTHONPATH=src python scripts/answer_question.py \ "How do I configure indexing?" \ --llm \ --backend simple \ --corpus-path data/projects/mmore/corpus.jsonl \ --config-path configs/app_config.yaml ``` The expected answer includes inline source citations.