Debugging
This page lists useful checks during development.
Run all tests
PYTHONPATH=src pytest -q
This is the main check before committing changes.
Compile all Python files
python -m compileall src scripts app
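This catches syntax errors. It does not execute any code, so broken or outdated imports are only detected when the modules are actually imported, for example by the test suite.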
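The sketch below uses only the standard library to show what this check covers. `compileall.compile_file` rejects a file with a syntax error. It accepts a module that imports a nonexistent package, because byte-compiling never executes the import. The helper `compiles` and the module name `module_that_does_not_exist` are made up for illustration.

```python
import compileall
import tempfile
from pathlib import Path


def compiles(source: str) -> bool:
    """Write `source` to a temporary .py file and byte-compile it.

    Returns True if compilation succeeds. Imports are NOT executed,
    so a missing module does not make this fail.
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "module.py"
        path.write_text(source)
        # quiet=2 suppresses all output; the return value reports success.
        return bool(compileall.compile_file(str(path), quiet=2))


syntax_error = compiles("def f(:\n    pass\n")
missing_import = compiles("import module_that_does_not_exist\n")
print(syntax_error)    # False
print(missing_import)  # True
```

Missing imports therefore surface only when a module is imported, which is why running the tests remains the main check.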
Test default corpus build
PYTHONPATH=src python scripts/build_corpus.py
This should write:
data/processed/corpus.jsonl
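To check that the written file is usable and not just present, a small standard-library sketch can confirm that every non-empty line parses as a JSON object. Its only assumption about the corpus format is one JSON object per line, as the `.jsonl` extension implies. The function names `check_jsonl_lines` and `check_jsonl` are hypothetical.

```python
import json
from pathlib import Path
from typing import Iterable


def check_jsonl_lines(lines: Iterable[str]) -> tuple[int, list[int]]:
    """Return (number of JSON-object records, 1-based numbers of bad lines)."""
    valid, bad_lines = 0, []
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # tolerate blank lines such as a trailing newline
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            bad_lines.append(lineno)  # not valid JSON at all
            continue
        if isinstance(record, dict):
            valid += 1
        else:
            bad_lines.append(lineno)  # valid JSON, but not an object
    return valid, bad_lines


def check_jsonl(path: Path) -> tuple[int, list[int]]:
    """Run check_jsonl_lines over a file on disk."""
    with path.open(encoding="utf-8") as handle:
        return check_jsonl_lines(handle)


corpus = Path("data/processed/corpus.jsonl")
if corpus.exists():
    count, bad = check_jsonl(corpus)
    print(f"{count} records; bad lines: {bad or 'none'}")
```

The same check applies to the project corpus written by the dynamic build.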
Test dynamic corpus build
PYTHONPATH=src python scripts/build_corpus.py \
--config configs/project_config.yaml \
--output-path data/projects/mmore/corpus.jsonl
This should write:
data/projects/mmore/corpus.jsonl
Test answering on a project-specific corpus
PYTHONPATH=src python scripts/answer_question.py \
"Which Milvus parameters are used in the ColPali config?" \
--llm \
--backend simple \
--corpus-path data/projects/mmore/corpus.jsonl \
--config-path configs/app_config.yaml
For MMORE, some structured questions may be answered directly by the project profile without loading the LLM.
Debug Streamlit state
The app stores local state in:
data/app_state.json
If the interface restores an old project or a wrong corpus path, remove this file:
rm -f data/app_state.json
Then restart Streamlit.
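Before deleting the file, it can help to see which stored paths are stale. This page does not describe the structure of `app_state.json`, so the sketch assumes nothing about its keys. It collects every string value and reports those that start with `data/` (an assumed convention for stored paths) but do not exist on disk. `iter_strings` and `stale_paths` are hypothetical helpers.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def iter_strings(value: Any) -> Iterator[str]:
    """Yield every string found anywhere in a parsed JSON value."""
    if isinstance(value, str):
        yield value
    elif isinstance(value, dict):
        for item in value.values():
            yield from iter_strings(item)
    elif isinstance(value, list):
        for item in value:
            yield from iter_strings(item)


def stale_paths(state: Any, root: Path = Path(".")) -> list[str]:
    """Return strings that look like data/ paths but do not exist under root."""
    # Heuristic (assumed): only relative paths starting with "data/" are checked.
    return [
        s for s in iter_strings(state)
        if s.startswith("data/") and not (root / s).exists()
    ]


state_file = Path("data/app_state.json")
if state_file.exists():
    state = json.loads(state_file.read_text(encoding="utf-8"))
    print(stale_paths(state) or "no stale data/ paths found")
```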
Clear generated project corpora
rm -rf data/projects/
Then rebuild the project corpus from the Streamlit interface or from the command line.
Check for old imports after refactoring
grep -R "docask.retrieval.answering\|docask.retrieval.prompting\|docask.retrieval.extractive_answerer\|docask.retrieval.mmore_format\|docask.retrieval.mmore_indexer" -n src scripts app docs
This should return nothing after moving files into rag/ and indexing/.
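grep also matches the old names in comments and prose. A stricter alternative uses the standard-library `ast` module and reports only real import statements that refer to one of the old modules. The module list is copied from the grep pattern above. `is_old` and `old_imports` are hypothetical helpers. This only covers Python files, so `docs` still needs the grep.

```python
import ast
from pathlib import Path

# Old module paths, taken from the grep pattern above.
OLD_MODULES = {
    "docask.retrieval.answering",
    "docask.retrieval.prompting",
    "docask.retrieval.extractive_answerer",
    "docask.retrieval.mmore_format",
    "docask.retrieval.mmore_indexer",
}


def is_old(name: str) -> bool:
    """True if name is an old module or something inside one."""
    return any(name == m or name.startswith(m + ".") for m in OLD_MODULES)


def old_imports(source: str) -> list[tuple[int, str]]:
    """Return sorted (line number, imported name) pairs that refer to old modules."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            candidates = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # "from docask.retrieval import answering" also pulls in an old module.
            candidates = [node.module] + [f"{node.module}.{alias.name}" for alias in node.names]
        else:
            continue  # relative imports and non-import nodes are ignored
        for name in candidates:
            if is_old(name):
                hits.append((node.lineno, name))
                break  # one report per import statement
    return sorted(hits)


for folder in ("src", "scripts", "app"):
    if Path(folder).is_dir():
        for path in sorted(Path(folder).rglob("*.py")):
            for lineno, name in old_imports(path.read_text(encoding="utf-8")):
                print(f"{path}:{lineno}: {name}")
```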
Remove generated cache files
find . -type d -name "__pycache__" -prune -exec rm -rf {} +
rm -rf src/docask.egg-info
Avoid grepping inside the virtual environment
If the virtual environment is inside the repository, a recursive grep from the repository root also searches the installed dependencies and returns unrelated matches.
Prefer:
grep -R "test_prompting\|test_retrieval" -n README.md scripts src docs tests
instead of:
grep -R "test_prompting\|test_retrieval" -n .
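When searching from Python instead, the same problem can be avoided by pruning the virtual environment before descending into it. The sketch below removes excluded names from `dirnames` in place, which is how `os.walk` is told to skip a subtree. The set of excluded directory names is an assumption; adjust it to wherever the environment actually lives. `search` is a hypothetical helper.

```python
import os
import re
from pathlib import Path
from typing import Iterator

# Directory names skipped during the walk (assumed; adjust to the repo layout).
EXCLUDED_DIRS = {".venv", "venv", ".git", "__pycache__", "node_modules"}


def search(root: Path, pattern: str) -> Iterator[tuple[Path, int, str]]:
    """Yield (path, line number, line) for regex matches, skipping EXCLUDED_DIRS."""
    regex = re.compile(pattern)
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into them.
        dirnames[:] = [d for d in dirnames if d not in EXCLUDED_DIRS]
        for filename in filenames:
            path = Path(dirpath) / filename
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # skip binary or unreadable files
            for lineno, line in enumerate(text.splitlines(), start=1):
                if regex.search(line):
                    yield path, lineno, line
```

For example, `search(Path("."), r"test_prompting|test_retrieval")` yields matches from the whole repository while never entering the excluded directories.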
Debug source extraction
Build the full corpus:
PYTHONPATH=src python scripts/build_corpus.py
Preview specific source types:
PYTHONPATH=src python scripts/preview_corpus.py --source-type python_function --limit 3
PYTHONPATH=src python scripts/preview_corpus.py --source-type example_config --limit 3
PYTHONPATH=src python scripts/preview_corpus.py --source-type repo_structure --limit 1
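To see at a glance which source types the build produced, and to spot one that is missing entirely, a sketch can tally records per type. It assumes each JSONL record is an object that stores its type under a `source_type` key, mirroring the `--source-type` flag. That key name is not confirmed by this page. `count_source_types` is a hypothetical helper.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Iterable

# Assumed field name, mirroring the --source-type flag of preview_corpus.py.
TYPE_KEY = "source_type"


def count_source_types(lines: Iterable[str]) -> Counter:
    """Count JSONL records per source type; records without the key count as '<missing>'."""
    counts: Counter = Counter()
    for line in lines:
        if line.strip():
            record = json.loads(line)  # assumes each line is a JSON object
            counts[record.get(TYPE_KEY, "<missing>")] += 1
    return counts


corpus = Path("data/processed/corpus.jsonl")
if corpus.exists():
    with corpus.open(encoding="utf-8") as handle:
        for source_type, n in count_source_types(handle).most_common():
            print(f"{source_type:20} {n}")
```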
Debug retrieval quality
PYTHONPATH=src python scripts/debug_retrieval.py "How do I configure indexing?"
If results are poor, check:
whether the expected source exists in the corpus;
whether the source type is correct;
whether titles and metadata are informative;
whether the selected backend reads the expected corpus or index;
whether the query is too vague;
whether a project profile should expand, filter, or rerank results for the query's intent.
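The first check, whether the expected source exists in the corpus, can be scripted. This sketch scans a JSONL corpus for records in which any top-level string value contains a keyword, case-insensitively. It then prints each match's type and title. The `source_type` and `title` keys are assumptions about the record schema. `find_records` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Iterable, Iterator


def find_records(lines: Iterable[str], keyword: str) -> Iterator[dict]:
    """Yield records in which any top-level string value contains keyword (case-insensitive)."""
    needle = keyword.lower()
    for line in lines:
        if not line.strip():
            continue
        record = json.loads(line)  # assumes each line is a JSON object
        if any(isinstance(v, str) and needle in v.lower() for v in record.values()):
            yield record


corpus = Path("data/processed/corpus.jsonl")
if corpus.exists():
    with corpus.open(encoding="utf-8") as handle:
        for record in find_records(handle, "indexing"):
            # "source_type" and "title" are assumed field names.
            print(record.get("source_type"), "|", record.get("title"))
```

If nothing relevant comes back, the problem is in extraction rather than retrieval.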
Backend mismatch
A common issue is using:
backend mmore
after building a new corpus with Streamlit.
Building a corpus does not update the MMORE index. For a newly built project corpus, use:
backend simple
unless the MMORE export and index have also been rebuilt.
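The rule above can be encoded as a small guard: suggest `mmore` only when an MMORE index exists and is at least as recent as the corpus. This page does not say where the MMORE index lives, so `index_path` is a placeholder the caller must supply. Comparing modification times is an assumed heuristic, not how the project itself decides. `suggest_backend` is a hypothetical helper.

```python
from pathlib import Path


def suggest_backend(corpus_path: Path, index_path: Path) -> str:
    """Return "mmore" if the index exists and is not older than the corpus, else "simple".

    Heuristic (assumed): an index older than the corpus was built from a
    previous corpus. For a directory index, st_mtime only changes when entries
    are added or removed, so the check is coarse. Raises FileNotFoundError if
    the corpus itself does not exist.
    """
    corpus_mtime = corpus_path.stat().st_mtime
    if not index_path.exists():
        return "simple"
    if index_path.stat().st_mtime < corpus_mtime:
        return "simple"
    return "mmore"
```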