# Indexing

GitHelp has its own corpus format, but MMORE expects a different JSONL format. The indexing layer bridges the two.

## Relevant files

```text
src/githelp/indexing/mmore_format.py
src/githelp/indexing/mmore_indexer.py
scripts/export_mmore_corpus.py
scripts/build_index.py
```

## Step 1: export to MMORE format

Default command:

```bash
python scripts/export_mmore_corpus.py
```

Default input:

```text
data/processed/corpus.jsonl
```

Default output:

```text
data/processed/mmore_corpus.jsonl
```

Project-specific command:

```bash
python scripts/export_mmore_corpus.py \
  --corpus-path data/projects/mmore/corpus.jsonl \
  --output-path data/projects/mmore/mmore_corpus.jsonl
```

MMORE-compatible records look like:

```json
{
  "text": "...",
  "modalities": [],
  "metadata": {}
}
```

GitHelp adds a short source header inside the text field before indexing. This makes it possible to reconstruct source information after MMORE retrieval.

## Step 2: build the MMORE index

Default command:

```bash
python scripts/build_index.py
```

Project-specific command:

```bash
python scripts/build_index.py \
  --documents-path data/projects/mmore/mmore_corpus.jsonl \
  --collection-name mmore_docs
```

This uses:

```text
configs/mmore_index_config.yaml
```

and stores the index under:

```text
data/indexes/mmore/
```

If answering with backend `mmore` fails with `KeyError: 'model_name'`, restart Streamlit after rebuilding the index. GitHelp can recover from missing Milvus model metadata by reading model names from `configs/mmore_index_config.yaml`, but an already-running Streamlit process may still be using older adapter code.

In local environments where native MMORE/Milvus retrieval crashes, GitHelp runs native retrieval in an isolated child process. If that process fails, the `mmore` backend falls back to the exported `mmore_corpus.jsonl` so Streamlit can still answer from the MMORE-formatted corpus.

## Why keep indexing separate?

The corpus can be built and inspected before MMORE is involved.
This makes debugging easier:

1. build `corpus.jsonl`;
2. preview the records;
3. test simple retrieval;
4. only then export and index with MMORE.

## Important distinction

Building a corpus does not automatically rebuild the MMORE index.

For a newly selected project in Streamlit:

```text
Build corpus → backend simple
```

For MMORE retrieval:

```text
Build corpus → export MMORE corpus → build MMORE index → backend mmore
```
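
Because the index is not rebuilt automatically, it can help to chain the export and index steps for a project in one place. The following is a minimal sketch, not part of GitHelp: `mmore_commands` and `refresh_mmore_index` are hypothetical helper names, and the `data/projects/<project>/` layout is an assumption generalized from the `mmore` example paths above. Only the script names and flags shown earlier in this page are used.

```python
import subprocess
import sys
from pathlib import Path


def mmore_commands(project: str, collection_name: str) -> list[list[str]]:
    """Return the export and index commands for one project, in run order."""
    # Assumption: per-project files live under data/projects/<project>/,
    # as in the project-specific examples for the "mmore" project.
    project_dir = Path("data/projects") / project
    corpus = project_dir / "corpus.jsonl"
    mmore_corpus = project_dir / "mmore_corpus.jsonl"
    return [
        # Step 1: convert the GitHelp corpus to MMORE-format JSONL.
        [
            sys.executable, "scripts/export_mmore_corpus.py",
            "--corpus-path", corpus.as_posix(),
            "--output-path", mmore_corpus.as_posix(),
        ],
        # Step 2: build the MMORE index from the exported file.
        [
            sys.executable, "scripts/build_index.py",
            "--documents-path", mmore_corpus.as_posix(),
            "--collection-name", collection_name,
        ],
    ]


def refresh_mmore_index(project: str, collection_name: str) -> None:
    """Run export, then index; check=True stops at the first failing step."""
    for command in mmore_commands(project, collection_name):
        subprocess.run(command, check=True)
```

Calling `refresh_mmore_index("mmore", "mmore_docs")` from the repository root, after the corpus has been built, would run the same two commands as the project-specific examples. A running Streamlit process may still need a restart afterwards, as noted in Step 2.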