# Commands

This page lists the main commands used during development.

Run all commands from the root of the `GitHelp` repository.

Install the package in editable mode first:

```bash
python -m pip install -e .
```

## Run the Streamlit app

```bash
streamlit run app/streamlit_app.py
```

The app lets a user:

- select a local target project;
- clone a public GitHub repository into a local GitHelp-managed folder;
- build a project-specific corpus;
- ask questions;
- inspect retrieved sources;
- switch retrieval backend;
- enable or disable LLM generation.

## Build the default corpus

```bash
python scripts/build_corpus.py
```

What it does:

- reads `configs/project_config.yaml`;
- loads Markdown and reStructuredText documentation;
- extracts Python docstrings and signatures;
- loads YAML configuration files if enabled;
- generates a repository structure document if enabled;
- writes `data/processed/corpus.jsonl`.

## Build a project-specific corpus

```bash
python scripts/build_corpus.py \
  --config data/projects/mmore/project_config.yaml \
  --output-path data/projects/mmore/corpus.jsonl
```

This is the command used internally by the Streamlit project setup flow.

## Load a public GitHub repository

```bash
python scripts/load_github_repository.py \
  https://github.com/swiss-ai/mmore
```

This clones the repository into:

```text
data/repositories/swiss-ai-mmore/
```

The printed local path can then be used as the project path in Streamlit or in
project-specific corpus-building commands.

## Prepare a GitHub repository with the simple backend

```bash
python scripts/prepare_github_project.py \
  https://github.com/swiss-ai/mmore
```

This clones or reuses the repository, generates a project config, builds the
GitHelp JSONL corpus, and prepares the project for the `simple` retrieval
backend.

## Preview a corpus

Default corpus:

```bash
python scripts/preview_corpus.py --limit 2
```

Project-specific corpus:

```bash
python scripts/preview_corpus.py \
  --corpus-path data/projects/mmore/corpus.jsonl \
  --limit 2
```

Useful filters:

```bash
python scripts/preview_corpus.py --source-type markdown_section --limit 3
python scripts/preview_corpus.py --source-type python_function --limit 3
python scripts/preview_corpus.py --source-type python_class --limit 3
python scripts/preview_corpus.py --source-type python_method --limit 3
python scripts/preview_corpus.py --source-type example_config --limit 3
python scripts/preview_corpus.py --source-type production_config --limit 3
python scripts/preview_corpus.py --source-type repo_structure --limit 1
```

## Debug local retrieval

```bash
python scripts/debug_retrieval.py "How do I configure indexing?"
```

This directly tests the simple local retriever.

For a project-specific corpus:

```bash
python scripts/debug_retrieval.py \
  "How do I configure indexing?" \
  --corpus-path data/projects/mmore/corpus.jsonl
```

## Evaluate retrieval on a question set

```bash
python scripts/evaluate_retrieval.py \
  --questions-path githelp_eval_questions.txt \
  --corpus-path data/projects/mmore/corpus.jsonl \
  --backend simple \
  --top-k 5
```

This prints the top retrieved sources for each question without calling an LLM.
Use it to inspect whether retrieval is finding useful evidence before tuning
prompts or answer generation.

To include expected-source checks:

```bash
python scripts/evaluate_retrieval.py \
  --questions-path githelp_eval_questions.txt \
  --expected-sources-path githelp_eval_expected_sources.example.json \
  --corpus-path data/projects/mmore/corpus.jsonl \
  --backend simple \
  --top-k 5
```

## Debug prompt construction

```bash
python scripts/debug_prompting.py \
  "How do I configure indexing?" \
  --backend simple \
  --corpus-path data/projects/mmore/corpus.jsonl \
  --config-path configs/app_config.yaml
```

What it does:

- retrieves sources;
- applies the configured project profile;
- builds the source-grounded prompt;
- prints the prompt without calling an LLM.

## Prepare an answer prompt

Simple backend:

```bash
python scripts/prepare_answer.py \
  "How do I configure indexing?" \
  --backend simple \
  --corpus-path data/projects/mmore/corpus.jsonl \
  --config-path configs/app_config.yaml
```

MMORE backend:

```bash
python scripts/prepare_answer.py \
  "How do I configure indexing?" \
  --backend mmore \
  --config-path configs/app_config.yaml
```

The MMORE backend requires an MMORE index to be built first.

## Generate an answer

Simple backend with LLM:

```bash
python scripts/answer_question.py \
  "How do I configure indexing?" \
  --llm \
  --backend simple \
  --corpus-path data/projects/mmore/corpus.jsonl \
  --config-path configs/app_config.yaml
```

Simple backend without LLM:

```bash
python scripts/answer_question.py \
  "How do I configure indexing?" \
  --backend simple \
  --corpus-path data/projects/mmore/corpus.jsonl
```

Some project profiles can answer structured questions directly without loading the LLM.

## Extract Python documentation only

```bash
python scripts/extract_code_docs.py
```

If the script is run dynamically:

```bash
python scripts/extract_code_docs.py \
  --config data/projects/mmore/project_config.yaml \
  --output-path data/projects/mmore/code_docs.jsonl
```

## Export a corpus for MMORE

Default corpus:

```bash
python scripts/export_mmore_corpus.py
```

Project-specific corpus:

```bash
python scripts/export_mmore_corpus.py \
  --corpus-path data/projects/mmore/corpus.jsonl \
  --output-path data/projects/mmore/mmore_corpus.jsonl
```

## Build the MMORE index

Default MMORE corpus:

```bash
python scripts/build_index.py
```

Project-specific MMORE corpus:

```bash
python scripts/build_index.py \
  --documents-path data/projects/mmore/mmore_corpus.jsonl \
  --collection-name mmore_docs
```

## Run tests

```bash
pytest -q
```

The GitHub Actions workflow also runs these tests on push and pull request.