Commands

This page lists the main commands used during development.

Run all commands from the root of the docask repository.

Run the Streamlit app

streamlit run app/streamlit_app.py

The app lets a user:

  • select a local target project;

  • build a project-specific corpus;

  • ask questions;

  • inspect retrieved sources;

  • switch retrieval backend;

  • enable or disable LLM generation.

Build the default corpus

PYTHONPATH=src python scripts/build_corpus.py

What it does:

  • reads configs/project_config.yaml;

  • loads Markdown and reStructuredText documentation;

  • extracts Python docstrings and signatures;

  • loads YAML configuration files if enabled;

  • generates a repository structure document if enabled;

  • writes data/processed/corpus.jsonl.
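
As a quick sanity check after a build, the output file can be summarized with a few lines of Python. This is a minimal sketch, assuming each line of corpus.jsonl is a JSON object with a "source_type" field. That field name is inferred from the --source-type filter of preview_corpus.py and is not confirmed by this page, and count_source_types is a hypothetical helper, not part of docask.

```python
import json
from collections import Counter
from pathlib import Path


def count_source_types(corpus_path):
    """Count corpus records per source type.

    Assumes one JSON object per line with a "source_type" key
    (an assumed field name; adjust it to the real schema).
    """
    counts = Counter()
    with Path(corpus_path).open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            counts[record.get("source_type", "unknown")] += 1
    return counts


if __name__ == "__main__":
    path = Path("data/processed/corpus.jsonl")
    if path.exists():
        for source_type, n in count_source_types(path).most_common():
            print(f"{source_type}: {n}")
```

Records without the assumed field are counted under "unknown", so a schema mismatch shows up in the summary instead of raising an error.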

Build a project-specific corpus

PYTHONPATH=src python scripts/build_corpus.py \
  --config data/projects/mmore/project_config.yaml \
  --output-path data/projects/mmore/corpus.jsonl

This is the command used internally by the Streamlit project setup flow.
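
This page does not show how the setup flow launches the command. One plausible way to do it from Python is sketched below, under the assumption that the script is started as a subprocess from the repository root. The function names build_corpus_command and run_build are hypothetical and are not taken from the docask code.

```python
import os
import subprocess
import sys


def build_corpus_command(project_dir):
    """Return the argv list for a project-specific corpus build."""
    return [
        sys.executable,
        "scripts/build_corpus.py",
        "--config", f"{project_dir}/project_config.yaml",
        "--output-path", f"{project_dir}/corpus.jsonl",
    ]


def run_build(project_dir, repo_root="."):
    """Run the build from the repository root with PYTHONPATH=src.

    Note: this replaces any PYTHONPATH already set in the environment.
    Raises subprocess.CalledProcessError if the script exits non-zero.
    """
    env = dict(os.environ, PYTHONPATH="src")
    return subprocess.run(
        build_corpus_command(project_dir),
        cwd=repo_root,
        env=env,
        check=True,
    )
```

Calling run_build("data/projects/mmore") would then be equivalent to the shell command above.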

Preview a corpus

Default corpus:

PYTHONPATH=src python scripts/preview_corpus.py --limit 2

Project-specific corpus:

PYTHONPATH=src python scripts/preview_corpus.py \
  --corpus-path data/projects/mmore/corpus.jsonl \
  --limit 2

Useful filters:

PYTHONPATH=src python scripts/preview_corpus.py --source-type markdown_section --limit 3
PYTHONPATH=src python scripts/preview_corpus.py --source-type python_function --limit 3
PYTHONPATH=src python scripts/preview_corpus.py --source-type python_class --limit 3
PYTHONPATH=src python scripts/preview_corpus.py --source-type python_method --limit 3
PYTHONPATH=src python scripts/preview_corpus.py --source-type example_config --limit 3
PYTHONPATH=src python scripts/preview_corpus.py --source-type production_config --limit 3
PYTHONPATH=src python scripts/preview_corpus.py --source-type repo_structure --limit 1

Debug local retrieval

PYTHONPATH=src python scripts/debug_retrieval.py "How do I configure indexing?"

This directly tests the simple local retriever.

For a project-specific corpus:

PYTHONPATH=src python scripts/debug_retrieval.py \
  "How do I configure indexing?" \
  --corpus-path data/projects/mmore/corpus.jsonl
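
This page does not describe how the simple retriever scores documents. For intuition only, the sketch below shows a toy keyword-overlap retriever. It is not docask's implementation: the "text" field name, the tokenizer, and the scoring rule are all assumptions made for illustration.

```python
import re


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def retrieve(question, records, top_k=3):
    """Rank records by the number of tokens shared with the question.

    Each record is a dict with a "text" key (an assumed field name).
    Records sharing no token with the question are dropped.
    """
    query_tokens = tokenize(question)
    scored = []
    for record in records:
        overlap = len(query_tokens & tokenize(record.get("text", "")))
        if overlap > 0:
            scored.append((overlap, record))
    # list.sort is stable, so ties keep their original corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored[:top_k]]
```

A debug script for such a retriever would typically print the top-ranked records with their scores, so that a poor answer can be traced back to poor retrieval.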

Debug prompt construction

PYTHONPATH=src python scripts/debug_prompting.py \
  "How do I configure indexing?" \
  --backend simple \
  --corpus-path data/projects/mmore/corpus.jsonl \
  --config-path configs/app_config.yaml

What it does:

  • retrieves sources;

  • applies the configured project profile;

  • builds the source-grounded prompt;

  • prints the prompt without calling an LLM.
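
The prompt-building step can be pictured roughly as follows. This is a simplified sketch, assuming each retrieved source is a dict with "path" and "text" keys. The key names, the wording of the instructions, and the build_prompt function are all hypothetical, and the project-profile step is left out.

```python
def build_prompt(question, sources):
    """Assemble a prompt that restricts the answer to numbered sources.

    Each source is a dict with "path" and "text" keys (assumed names).
    """
    blocks = [
        f"[{i}] {source['path']}\n{source['text']}"
        for i, source in enumerate(sources, start=1)
    ]
    parts = [
        "Answer the question using only the sources below. "
        "Cite sources by their number.",
        "Sources:",
        "\n\n".join(blocks),
        f"Question: {question}",
        "Answer:",
    ]
    return "\n\n".join(parts)


if __name__ == "__main__":
    demo_sources = [
        {"path": "docs/indexing.md", "text": "Indexing is configured in YAML."},
    ]
    print(build_prompt("How do I configure indexing?", demo_sources))
```

Printing the prompt instead of sending it to a model makes it easy to check which sources were included and how they were numbered.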

Prepare an answer prompt

Simple backend:

PYTHONPATH=src python scripts/prepare_answer.py \
  "How do I configure indexing?" \
  --backend simple \
  --corpus-path data/projects/mmore/corpus.jsonl \
  --config-path configs/app_config.yaml

MMORE backend:

PYTHONPATH=src python scripts/prepare_answer.py \
  "How do I configure indexing?" \
  --backend mmore \
  --config-path configs/app_config.yaml

The MMORE backend requires an MMORE index to be built first.

Generate an answer

Simple backend with LLM:

PYTHONPATH=src python scripts/answer_question.py \
  "How do I configure indexing?" \
  --llm \
  --backend simple \
  --corpus-path data/projects/mmore/corpus.jsonl \
  --config-path configs/app_config.yaml

Simple backend without LLM:

PYTHONPATH=src python scripts/answer_question.py \
  "How do I configure indexing?" \
  --backend simple \
  --corpus-path data/projects/mmore/corpus.jsonl

Some project profiles can answer structured questions directly without loading the LLM.

Extract Python documentation only

PYTHONPATH=src python scripts/extract_code_docs.py

If the script accepts project-specific arguments:

PYTHONPATH=src python scripts/extract_code_docs.py \
  --config data/projects/mmore/project_config.yaml \
  --output-path data/projects/mmore/code_docs.jsonl

Export a corpus for MMORE

Default corpus:

PYTHONPATH=src python scripts/export_mmore_corpus.py

Project-specific corpus:

PYTHONPATH=src python scripts/export_mmore_corpus.py \
  --corpus-path data/projects/mmore/corpus.jsonl \
  --output-path data/projects/mmore/mmore_corpus.jsonl

Build the MMORE index

Default MMORE corpus:

PYTHONPATH=src python scripts/build_index.py

Project-specific MMORE corpus:

PYTHONPATH=src python scripts/build_index.py \
  --documents-path data/projects/mmore/mmore_corpus.jsonl \
  --collection-name mmore_docs

Run tests

PYTHONPATH=src pytest -q

The GitHub Actions workflow also runs these tests on pushes and pull requests.