LLM as a judge
Add a judge: block to your RAG config to check retrieval quality before generation. When the retrieved chunks are not good enough, the judge can trigger a corrective action (re-retrieval, sub-questions, or web context) and then merge the results into one deduplicated list.
How it works
1. Retrieve chunks from the index (Milvus + optional BGE rerank).
2. Evaluate them in a loop (at most `max_corrective_steps` corrective actions, default `1`):
   - `PROCEED` without calling the judge LLM when index metrics meet `metric_thresholds`; retrieval is already considered good enough.
   - Otherwise, call `judge.llm` and run the chosen corrective action.
   - Repeat on the merged chunks until the judge says `PROCEED` or the step budget is exhausted.
3. Generate the answer from the final context.
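The steps above can be read as a small control loop. The following is a minimal sketch of that flow, not the library's implementation: `run_with_judge` and its callable parameters (`retrieve`, `thresholds_met`, `judge`, `correct`) are hypothetical names introduced here for illustration, and chunks are modelled as plain strings.

```python
from typing import Callable, List


def run_with_judge(
    query: str,
    retrieve: Callable[[str], List[str]],
    thresholds_met: Callable[[List[str]], bool],
    judge: Callable[[str, List[str]], str],
    correct: Callable[[str, str, List[str]], List[str]],
    max_corrective_steps: int = 1,
) -> List[str]:
    """Return the final context after at most max_corrective_steps corrections."""
    chunks = retrieve(query)  # step 1: initial retrieval
    for _ in range(max_corrective_steps):
        if thresholds_met(chunks):
            break  # PROCEED without calling the judge LLM
        decision = judge(query, chunks)  # assumed to return a decision name
        if decision == "PROCEED":
            break
        # Hypothetical helper: runs the corrective action and returns merged chunks.
        chunks = correct(decision, query, chunks)
    return chunks  # step 3: the answer LLM would consume this context


if __name__ == "__main__":
    final = run_with_judge(
        "example query",
        retrieve=lambda q: ["chunk a"],
        thresholds_met=lambda chunks: len(chunks) >= 2,
        judge=lambda q, chunks: "RE_RETRIEVE",
        correct=lambda decision, q, chunks: chunks + ["chunk b"],
        max_corrective_steps=3,
    )
    print(final)
```

In this sketch the threshold check runs before every judge call, so a correction that brings the metrics above the thresholds ends the loop without another LLM call.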
Disallowed decisions are coerced to a fallback action (RE_RETRIEVE, ADD_QUESTIONS, or PROCEED). Invalid JSON defaults to PROCEED.
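As an illustration of the coercion rule, here is a hedged sketch of how a judge reply could be parsed. The JSON key `decision`, the function name `parse_decision`, and the preference order among fallback actions are assumptions made for this example; the source only states that disallowed decisions fall back to one of the listed actions and that invalid JSON yields `PROCEED`.

```python
import json
from typing import Set

# Assumed preference order among the fallback actions named above.
FALLBACK_ORDER = ("RE_RETRIEVE", "ADD_QUESTIONS", "PROCEED")


def parse_decision(raw: str, allowed: Set[str]) -> str:
    """Map a raw judge reply to an allowed decision name."""
    allowed = set(allowed) | {"PROCEED"}  # PROCEED is always available
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        return "PROCEED"  # invalid JSON defaults to PROCEED
    if not isinstance(payload, dict):
        return "PROCEED"  # assumption: non-object JSON is treated as invalid
    decision = str(payload.get("decision", "")).upper()
    if decision in allowed:
        return decision
    # Disallowed or unknown decision: coerce to the first allowed fallback.
    for candidate in FALLBACK_ORDER:
        if candidate in allowed:
            return candidate
    return "PROCEED"


if __name__ == "__main__":
    print(parse_decision('{"decision": "ADD_CONTEXT"}', {"ADD_QUESTIONS"}))
    print(parse_decision("not json at all", {"RE_RETRIEVE"}))
```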
Decisions
| Decision | What it does |
|---|---|
| `PROCEED` | Chunks are good enough; continue to the answer LLM |
| `RE_RETRIEVE` | Search the index again (reformulated query and/or more results) |
| `ADD_QUESTIONS` | Up to 3 extra searches from sub-questions, then merge |
| `ADD_CONTEXT` | DuckDuckGo web snippets, then merge |
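The "then merge" step can be pictured as an order-preserving deduplication. This is a sketch under stated assumptions: chunks are plain strings, duplicates are detected by whitespace-normalized, lowercased text, and `merge_chunks`, `add_questions`, and `search` are hypothetical names. The real pipeline may deduplicate on a different key, such as a chunk ID.

```python
from typing import Callable, Iterable, List

MAX_EXTRA_QUESTIONS = 3  # "up to 3 extra searches" from the decisions table


def merge_chunks(*batches: Iterable[str]) -> List[str]:
    """Merge batches into one list, keeping the first copy of each chunk."""
    seen = set()
    merged: List[str] = []
    for batch in batches:
        for chunk in batch:
            key = " ".join(chunk.split()).lower()  # assumed dedup key
            if key not in seen:
                seen.add(key)
                merged.append(chunk)
    return merged


def add_questions(
    chunks: List[str],
    extra_questions: List[str],
    search: Callable[[str], List[str]],
) -> List[str]:
    """Run at most three sub-question searches and merge their results."""
    extra = [search(q) for q in extra_questions[:MAX_EXTRA_QUESTIONS]]
    return merge_chunks(chunks, *extra)


if __name__ == "__main__":
    fake_search = lambda q: [f"result for {q}", "shared chunk"]
    print(add_questions(["shared chunk"], ["q1", "q2"], fake_search))
```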
Configuration
examples/rag/config_judge.yaml is a standalone config; it does not load on top of config.yaml.
python3 -m mmore rag --config-file examples/rag/config_judge.yaml
Or copy the judge: block into your own config.
Key settings under rag.judge:
- `metric_thresholds`: index minimums (`min_mean_similarity`, `min_max_rerank_score`, `min_num_docs`, …)
- `max_corrective_steps`: how many corrective actions after the first retrieval
- `allow_re_retrieve` / `allow_add_questions` / `allow_add_context`: which corrective actions the judge may choose (see below)
- `system_prompt` / `user_prompt`: judge prompts; the user prompt supports `{query}`, `{metrics}`, `{chunks}`, `{allowed_actions}`, and correction-step placeholders
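To show how the user-prompt placeholders might be filled, here is a small sketch that assumes `str.format`-style substitution. The template wording, the function `render_user_prompt`, and the way metrics and chunks are serialized are all illustrative assumptions; only the four placeholder names come from the settings above, and the correction-step placeholders are left out because their names are not given here.

```python
from typing import Dict, List

# Hypothetical template; only the placeholder names are taken from the docs.
USER_PROMPT = (
    "Question: {query}\n"
    "Index metrics: {metrics}\n"
    "Allowed actions: {allowed_actions}\n"
    "Chunks:\n{chunks}"
)


def render_user_prompt(
    query: str,
    metrics: Dict[str, float],
    chunks: List[str],
    allowed_actions: List[str],
) -> str:
    """Fill the four documented placeholders in the template."""
    return USER_PROMPT.format(
        query=query,
        metrics=", ".join(f"{name}={value}" for name, value in metrics.items()),
        chunks="\n".join(f"[{i}] {text}" for i, text in enumerate(chunks, 1)),
        allowed_actions=", ".join(allowed_actions),
    )


if __name__ == "__main__":
    print(
        render_user_prompt(
            "What is Milvus?",
            {"num_docs": 2},
            ["first chunk", "second chunk"],
            ["PROCEED", "RE_RETRIEVE"],
        )
    )
```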
Using one corrective action
In your RAG config, under **rag.judge**, set allow_re_retrieve, allow_add_questions, and allow_add_context so only one corrective action is true (the others false). PROCEED is always available in {allowed_actions}.
When **metric_thresholds are met**, the pipeline **PROCEEDS immediately** without calling the judge LLM: index retrieval is already of high quality (similarity, rerank scores, enough documents).
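The threshold shortcut can be sketched as a simple comparison. The threshold keys below are the ones listed under Configuration; the metric names (each key without its `min_` prefix), the numeric values, and the function `thresholds_met` are assumptions for illustration only.

```python
from typing import Dict


def thresholds_met(metrics: Dict[str, float], thresholds: Dict[str, float]) -> bool:
    """Return True when every configured minimum is satisfied."""
    for key, minimum in thresholds.items():
        # Assumed naming: threshold "min_num_docs" checks metric "num_docs".
        name = key[len("min_"):] if key.startswith("min_") else key
        if metrics.get(name, float("-inf")) < minimum:
            return False  # a missing or low metric means the judge is called
    return True


if __name__ == "__main__":
    thresholds = {  # illustrative values, not defaults
        "min_mean_similarity": 0.5,
        "min_max_rerank_score": 0.3,
        "min_num_docs": 3,
    }
    metrics = {"mean_similarity": 0.7, "max_rerank_score": 0.9, "num_docs": 5}
    print(thresholds_met(metrics, thresholds))
```

Treating a missing metric as a failure is a conservative choice in this sketch: it sends the query to the judge rather than skipping the check silently.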
When the thresholds are not met, the judge LLM is invoked. With a single corrective action enabled, it systematically chooses that action and fills the matching payload (extra_questions, web_query, or retrieve_params). Use a query suited to that action (multi-part question → ADD_QUESTIONS; missing corpus fact → ADD_CONTEXT; weak or mis-phrased retrieval → RE_RETRIEVE). Adjust system_prompt / user_prompt under rag.judge if needed.
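| Goal | Set to `true` |
|---|---|
| Sub-questions (`ADD_QUESTIONS`) | `allow_add_questions` |
| Web context (`ADD_CONTEXT`) | `allow_add_context` |
| Re-retrieval (`RE_RETRIEVE`) | `allow_re_retrieve` |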
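The relationship between the three `allow_*` flags and `{allowed_actions}` can be sketched as follows. The flag and action names come from this page; the function `allowed_actions`, the dict-based config, and the assumption that an absent flag counts as false are illustrative only.

```python
from typing import Dict, List

# Flag-to-action mapping inferred from the matching names on this page.
FLAG_TO_ACTION = {
    "allow_re_retrieve": "RE_RETRIEVE",
    "allow_add_questions": "ADD_QUESTIONS",
    "allow_add_context": "ADD_CONTEXT",
}


def allowed_actions(judge_cfg: Dict[str, bool]) -> List[str]:
    """List the decisions the judge may return for a given rag.judge block."""
    actions = ["PROCEED"]  # always available
    for flag, action in FLAG_TO_ACTION.items():
        if judge_cfg.get(flag, False):  # assumption: absent flags are false
            actions.append(action)
    return actions


if __name__ == "__main__":
    single_action_cfg = {
        "allow_re_retrieve": False,
        "allow_add_questions": True,
        "allow_add_context": False,
    }
    print(allowed_actions(single_action_cfg))
```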
Examples: examples/rag/demo/config_add_questions.yaml, config_judge_add_questions.yaml.
For ADD_CONTEXT, install web search support:
pip install "mmore[rag,websearch]"