Commands

This page lists the main commands used during development.

Run all commands from the root of the GitHelp repository.

Install the package in editable mode first:

python -m pip install -e .

Run the Streamlit app

streamlit run app/streamlit_app.py

The app lets a user:

select a local target project;
clone a public GitHub repository into a local GitHelp-managed folder;
build a project-specific corpus;
ask questions;
inspect retrieved sources;
switch retrieval backend;
enable or disable LLM generation.

Build the default corpus

python scripts/build_corpus.py

What it does:

reads configs/project_config.yaml;
loads Markdown and reStructuredText documentation;
extracts Python docstrings and signatures;
loads YAML configuration files if enabled;
generates a repository structure document if enabled;
writes data/processed/corpus.jsonl.
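The docstring and signature extraction step can be illustrated with the standard library's `ast` module. This is a minimal sketch of the general technique, not GitHelp's actual extractor: the function name `extract_function_docs` and the record keys are hypothetical, and `ast.unparse` needs Python 3.9 or later.

```python
import ast


def extract_function_docs(source: str) -> list:
    """Collect name, signature, and docstring for each function in `source`."""
    docs = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            docs.append({
                "name": node.name,
                # ast.unparse renders the argument list back to source text.
                "signature": f"{node.name}({ast.unparse(node.args)})",
                "docstring": ast.get_docstring(node) or "",
            })
    return docs


# Hypothetical input module, used only for illustration.
example = '''
def build_index(path: str, top_k: int = 5):
    """Build an index from the documents at `path`."""
    return None
'''

for doc in extract_function_docs(example):
    print(doc["signature"], "->", doc["docstring"])
```

Parsing with `ast` avoids importing the target project's code, which matters when its dependencies are not installed.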

Build a project-specific corpus

python scripts/build_corpus.py \
  --config data/projects/mmore/project_config.yaml \
  --output-path data/projects/mmore/corpus.jsonl

This is the command used internally by the Streamlit project setup flow.

Load a public GitHub repository

python scripts/load_github_repository.py \
  https://github.com/swiss-ai/mmore

This clones the repository into:

data/repositories/swiss-ai-mmore/

The printed local path can then be used as the project path in Streamlit or in project-specific corpus-building commands.
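The folder name in this example looks like the repository owner and name joined by a hyphen. A minimal sketch of that mapping, assuming this convention holds in general (the helper `local_repo_dir` is hypothetical and not taken from the GitHelp code):

```python
from pathlib import Path
from urllib.parse import urlparse


def local_repo_dir(url: str, base: str = "data/repositories") -> Path:
    """Map a GitHub URL to an '<owner>-<repo>' folder under `base` (assumed convention)."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2:
        raise ValueError(f"Expected https://github.com/<owner>/<repo>, got: {url}")
    owner, repo = parts[0], parts[1]
    if repo.endswith(".git"):
        repo = repo[:-4]  # tolerate clone-style URLs ending in .git
    return Path(base) / f"{owner}-{repo}"


print(local_repo_dir("https://github.com/swiss-ai/mmore").as_posix())
# prints: data/repositories/swiss-ai-mmore
```

When in doubt, rely on the path printed by the script rather than on a derived name.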

Prepare a GitHub repository with the simple backend

python scripts/prepare_github_project.py \
  https://github.com/swiss-ai/mmore

This clones or reuses the repository, generates a project config, builds the GitHelp JSONL corpus, and prepares the project for the simple retrieval backend.

Preview a corpus

Default corpus:

python scripts/preview_corpus.py --limit 2

Project-specific corpus:

python scripts/preview_corpus.py \
  --corpus-path data/projects/mmore/corpus.jsonl \
  --limit 2

Useful filters:

python scripts/preview_corpus.py --source-type markdown_section --limit 3
python scripts/preview_corpus.py --source-type python_function --limit 3
python scripts/preview_corpus.py --source-type python_class --limit 3
python scripts/preview_corpus.py --source-type python_method --limit 3
python scripts/preview_corpus.py --source-type example_config --limit 3
python scripts/preview_corpus.py --source-type production_config --limit 3
python scripts/preview_corpus.py --source-type repo_structure --limit 1
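The same kind of filtering can be done from Python when exploring a corpus interactively. The sketch below assumes each JSONL record carries a `source_type` field, a name inferred from the `--source-type` flag rather than confirmed from the corpus schema. The `text` field and the `preview` helper are likewise hypothetical.

```python
import json
from itertools import islice
from typing import Iterable, Iterator, Optional


def iter_records(lines: Iterable[str]) -> Iterator[dict]:
    """Parse JSONL text lines, skipping blank lines."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def preview(lines: Iterable[str], source_type: Optional[str] = None, limit: int = 2) -> list:
    """Return up to `limit` records, optionally restricted to one source type."""
    records = iter_records(lines)
    if source_type is not None:
        # "source_type" is an assumed field name, inferred from the CLI flag.
        records = (r for r in records if r.get("source_type") == source_type)
    return list(islice(records, limit))


# Hypothetical in-memory corpus; with a real file, pass an open file handle instead.
sample = [
    '{"source_type": "markdown_section", "text": "Install"}',
    '{"source_type": "python_function", "text": "def build(): ..."}',
    '{"source_type": "markdown_section", "text": "Usage"}',
]
for record in preview(sample, source_type="markdown_section", limit=3):
    print(record["text"])
```

Because the records are streamed, only the first `limit` matches are parsed into the result, which keeps the preview cheap on a large corpus.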

Debug local retrieval

python scripts/debug_retrieval.py "How do I configure indexing?"

This directly tests the simple local retriever.

For a project-specific corpus:

python scripts/debug_retrieval.py \
  "How do I configure indexing?" \
  --corpus-path data/projects/mmore/corpus.jsonl
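To make the idea of ranking corpus records against a question concrete, here is a deliberately naive token-overlap retriever. It is an illustration only and does not reproduce how GitHelp's simple retriever scores documents. The `path` and `text` record fields are assumptions.

```python
import re
from collections import Counter


def tokenize(text: str) -> list:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9_]+", text.lower())


def retrieve(question: str, records: list, top_k: int = 5) -> list:
    """Rank records by how often the question's tokens occur in their text."""
    query = set(tokenize(question))
    scored = []
    for record in records:
        counts = Counter(tokenize(record["text"]))
        score = sum(counts[token] for token in query)
        if score > 0:
            scored.append((score, record))
    # Sort by score only; the sort is stable, so ties keep corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored[:top_k]]


# Hypothetical records, for illustration only.
records = [
    {"path": "docs/indexing.md", "text": "Configure indexing in the indexing section of the config."},
    {"path": "docs/install.md", "text": "Install the package with pip."},
]
for hit in retrieve("How do I configure indexing?", records, top_k=1):
    print(hit["path"])
# prints: docs/indexing.md
```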

Evaluate retrieval on a question set

python scripts/evaluate_retrieval.py \
  --questions-path githelp_eval_questions.txt \
  --corpus-path data/projects/mmore/corpus.jsonl \
  --backend simple \
  --top-k 5

This prints the top retrieved sources for each question without calling an LLM. Use it to check that retrieval surfaces useful evidence before tuning prompts or answer generation.

To include expected-source checks:

python scripts/evaluate_retrieval.py \
  --questions-path githelp_eval_questions.txt \
  --expected-sources-path githelp_eval_expected_sources.example.json \
  --corpus-path data/projects/mmore/corpus.jsonl \
  --backend simple \
  --top-k 5
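An expected-source check boils down to asking whether any expected source appears among the top-k results for a question. The sketch below shows one common way to compute such a hit rate. The data layout (question mapped to ranked source paths, and question mapped to expected path fragments) is an assumption and may not match the format of the example JSON file.

```python
def hit_at_k(retrieved: list, expected: list, k: int = 5) -> bool:
    """True if any expected fragment occurs in one of the top-k retrieved sources."""
    top = retrieved[:k]
    return any(exp in src for exp in expected for src in top)


def hit_rate(results: dict, expected_sources: dict, k: int = 5) -> float:
    """Fraction of questions with expectations that have a hit in the top k."""
    checked = [q for q in results if q in expected_sources]
    if not checked:
        return 0.0
    hits = sum(hit_at_k(results[q], expected_sources[q], k) for q in checked)
    return hits / len(checked)


# Hypothetical evaluation data, for illustration only.
results = {
    "How do I configure indexing?": ["docs/indexing.md", "README.md"],
    "How do I run tests?": ["README.md", "docs/install.md"],
}
expected_sources = {
    "How do I configure indexing?": ["docs/indexing.md"],
    "How do I run tests?": ["docs/testing.md"],
}
print(hit_rate(results, expected_sources, k=5))
# prints: 0.5
```

Questions without an entry in the expected-sources mapping are left out of the rate instead of being counted as misses.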

Debug prompt construction

python scripts/debug_prompting.py \
  "How do I configure indexing?" \
  --backend simple \
  --corpus-path data/projects/mmore/corpus.jsonl \
  --config-path configs/app_config.yaml

What it does:

retrieves sources;
applies the configured project profile;
builds the source-grounded prompt;
prints the prompt without calling an LLM.
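A source-grounded prompt typically numbers the retrieved sources and instructs the model to answer only from them. The sketch below shows that general pattern. It leaves out the project-profile step, and the wording, the `build_prompt` name, and the `path`/`text` fields are all hypothetical, not GitHelp's actual template.

```python
def build_prompt(question: str, sources: list, max_sources: int = 5) -> str:
    """Assemble a prompt that asks the model to answer only from numbered sources."""
    blocks = []
    for i, src in enumerate(sources[:max_sources], start=1):
        blocks.append(f"[{i}] {src['path']}\n{src['text']}")
    context = "\n\n".join(blocks) if blocks else "(no sources retrieved)"
    return (
        "Answer the question using only the sources below. "
        "Cite sources by their bracketed number. "
        "If the sources do not contain the answer, say so.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical retrieved source, for illustration only.
sources = [{"path": "docs/indexing.md", "text": "Set the index options in the config file."}]
print(build_prompt("How do I configure indexing?", sources))
```

Printing the prompt without calling a model, as the debug script does, makes it easy to see whether a weak answer comes from retrieval or from generation.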

Prepare an answer prompt

Simple backend:

python scripts/prepare_answer.py \
  "How do I configure indexing?" \
  --backend simple \
  --corpus-path data/projects/mmore/corpus.jsonl \
  --config-path configs/app_config.yaml

MMORE backend:

python scripts/prepare_answer.py \
  "How do I configure indexing?" \
  --backend mmore \
  --config-path configs/app_config.yaml

The MMORE backend requires a built MMORE index; see "Build the MMORE index" on this page.

Generate an answer

Simple backend with LLM:

python scripts/answer_question.py \
  "How do I configure indexing?" \
  --llm \
  --backend simple \
  --corpus-path data/projects/mmore/corpus.jsonl \
  --config-path configs/app_config.yaml

Simple backend without LLM:

python scripts/answer_question.py \
  "How do I configure indexing?" \
  --backend simple \
  --corpus-path data/projects/mmore/corpus.jsonl

Some project profiles can answer structured questions directly without loading the LLM.

Extract Python documentation only

python scripts/extract_code_docs.py

To run the script against a project-specific config and output path:

python scripts/extract_code_docs.py \
  --config data/projects/mmore/project_config.yaml \
  --output-path data/projects/mmore/code_docs.jsonl

Export a corpus for MMORE

Default corpus:

python scripts/export_mmore_corpus.py

Project-specific corpus:

python scripts/export_mmore_corpus.py \
  --corpus-path data/projects/mmore/corpus.jsonl \
  --output-path data/projects/mmore/mmore_corpus.jsonl

Build the MMORE index

Default MMORE corpus:

python scripts/build_index.py

Project-specific MMORE corpus:

python scripts/build_index.py \
  --documents-path data/projects/mmore/mmore_corpus.jsonl \
  --collection-name mmore_docs
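The project-specific commands on this page chain naturally: build the corpus, export it for MMORE, then build the index. The sketch below assembles those three commands for a given project. It assumes other projects follow the same `data/projects/<name>/` layout as the `mmore` example. The helper `mmore_pipeline` is hypothetical.

```python
import shlex


def mmore_pipeline(project: str, collection: str) -> list:
    """Return the three project-specific commands as argument lists, in run order."""
    base = f"data/projects/{project}"  # assumed layout, taken from the mmore example
    return [
        ["python", "scripts/build_corpus.py",
         "--config", f"{base}/project_config.yaml",
         "--output-path", f"{base}/corpus.jsonl"],
        ["python", "scripts/export_mmore_corpus.py",
         "--corpus-path", f"{base}/corpus.jsonl",
         "--output-path", f"{base}/mmore_corpus.jsonl"],
        ["python", "scripts/build_index.py",
         "--documents-path", f"{base}/mmore_corpus.jsonl",
         "--collection-name", collection],
    ]


for command in mmore_pipeline("mmore", "mmore_docs"):
    # To execute instead of print: subprocess.run(command, check=True)
    print(shlex.join(command))
```

Keeping each command as an argument list avoids shell quoting problems if the commands are later passed to `subprocess.run`.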

Run tests

pytest -q

The GitHub Actions workflow also runs these tests on pushes and pull requests.