# Debugging

This page lists useful checks during development.

## Run all tests

```bash
PYTHONPATH=src pytest -q
```

This is the main check before committing changes.

## Compile all Python files

```bash
python -m compileall src scripts app
```

This catches syntax errors. It only compiles the files without running them, so a missing module or broken import will not show up here.

## Test default corpus build

```bash
PYTHONPATH=src python scripts/build_corpus.py
```

This should write:

```text
data/processed/corpus.jsonl
```

## Test dynamic corpus build

```bash
PYTHONPATH=src python scripts/build_corpus.py \
  --config configs/project_config.yaml \
  --output-path data/projects/mmore/corpus.jsonl
```

This should write:

```text
data/projects/mmore/corpus.jsonl
```

## Test answering on a project-specific corpus

```bash
PYTHONPATH=src python scripts/answer_question.py \
  "Which Milvus parameters are used in the ColPali config?" \
  --llm \
  --backend simple \
  --corpus-path data/projects/mmore/corpus.jsonl \
  --config-path configs/app_config.yaml
```

For MMORE, some structured questions may be answered directly by the project profile without loading the LLM.

## Debug Streamlit state

The app stores local state in:

```text
data/app_state.json
```

If the interface restores an old project or the wrong corpus path, remove this file:

```bash
rm -f data/app_state.json
```

Then restart Streamlit.

## Clear generated project corpora

```bash
rm -rf data/projects/
```

Then rebuild the project corpus from the Streamlit interface or from the command line.

## Check for old imports after refactoring

```bash
grep -R "docask.retrieval.answering\|docask.retrieval.prompting\|docask.retrieval.extractive_answerer\|docask.retrieval.mmore_format\|docask.retrieval.mmore_indexer" -n src scripts app docs
```

This should return nothing after moving files into `rag/` and `indexing/`.

## Remove generated cache files

```bash
find . -type d -name "__pycache__" -prune -exec rm -rf {} +
rm -rf src/docask.egg-info
```

## Avoid grepping inside the virtual environment

If the virtual environment is inside the repo, grep can return unrelated files from dependencies. Prefer:

```bash
grep -R "test_prompting\|test_retrieval" -n README.md scripts src docs tests
```

instead of:

```bash
grep -R "test_prompting\|test_retrieval" -n .
```

## Debug source extraction

Build the full corpus:

```bash
PYTHONPATH=src python scripts/build_corpus.py
```

Preview specific source types:

```bash
PYTHONPATH=src python scripts/preview_corpus.py --source-type python_function --limit 3
PYTHONPATH=src python scripts/preview_corpus.py --source-type example_config --limit 3
PYTHONPATH=src python scripts/preview_corpus.py --source-type repo_structure --limit 1
```

## Debug retrieval quality

```bash
PYTHONPATH=src python scripts/debug_retrieval.py "How do I configure indexing?"
```

If results are poor, check:

- whether the expected source exists in the corpus;
- whether the source type is correct;
- whether titles and metadata are informative;
- whether the selected backend reads the expected corpus or index;
- whether the query is too vague;
- whether a project profile should expand, filter, or rerank that intent.

## Backend mismatch

A common issue is using:

```text
backend mmore
```

after building a new corpus with Streamlit. Building a corpus does not update the MMORE index. For a newly built project corpus, use:

```text
backend simple
```

unless the MMORE export and index have also been rebuilt.
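
Before switching backends, it can help to confirm what a newly built corpus actually contains. The sketch below counts records per source type in a JSONL corpus file, which also covers the first two retrieval checks (whether the expected source exists and whether its source type is correct). It is a minimal sketch, not part of the project: the helper name `count_source_types` is hypothetical, and the `source_type` field name is an assumption about the corpus schema that this page does not confirm.

```python
import json
from collections import Counter
from pathlib import Path


def count_source_types(corpus_path, field="source_type"):
    """Count corpus records per source type in a JSONL file.

    Hypothetical helper. The default field name "source_type" is an
    assumption; pass the real key if the corpus schema differs.
    """
    counts = Counter()
    with Path(corpus_path).open(encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines between records
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                # Report the exact line so a truncated build is easy to spot.
                raise ValueError(
                    f"{corpus_path}:{line_number}: invalid JSON: {exc}"
                ) from exc
            if not isinstance(record, dict):
                raise ValueError(
                    f"{corpus_path}:{line_number}: expected a JSON object"
                )
            # Records without the field are grouped under "<missing>".
            counts[record.get(field, "<missing>")] += 1
    return counts
```

For example, calling `count_source_types("data/projects/mmore/corpus.jsonl")` and printing the result shows at a glance whether a source type is absent from the corpus, which points to an extraction problem and not a backend or retrieval problem.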