Data flow

This page explains how information moves through DocAsk.

Step 1: project selection

DocAsk starts from a target project repository.

The Streamlit interface lets a user select a local project path. For example, the MMORE repository may be located at:

/path/to/mmore

DocAsk then creates a project-specific working folder:

data/projects/<project_name>/

For MMORE:

data/projects/mmore/
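
The mapping from a selected path to a working folder can be sketched as follows. This is a minimal illustration, not DocAsk's actual code: the function name project_workdir is hypothetical, and it assumes the project name is simply the final directory name of the selected path.

```python
from pathlib import Path


def project_workdir(project_path: str, data_root: str = "data/projects") -> Path:
    """Return the working folder for a selected project path.

    Assumption: the project name is the last component of the path.
    DocAsk may derive the name differently.
    """
    # resolve() normalizes trailing slashes and relative paths such as ".".
    name = Path(project_path).expanduser().resolve().name
    return Path(data_root) / name


print(project_workdir("/path/to/mmore").as_posix())
```

On a POSIX system, the path /path/to/mmore maps to data/projects/mmore under this assumption.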

Step 2: project configuration

DocAsk generates or reads a project configuration file.

For a Streamlit-selected project, this is usually:

data/projects/<project_name>/project_config.yaml

For older command-line workflows, the default config remains:

configs/project_config.yaml

The project config points to:

docs_path          -> Markdown and RST documentation
code_path          -> Python source code
yaml_config_paths  -> YAML example or production configs
repo_path          -> repository structure
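
As a rough sketch, the four entries above can be checked once the config has been loaded into a dictionary. The helper name missing_config_keys, the placeholder paths, and the assumption that all four keys are required (with yaml_config_paths holding a list) are illustrative only and are not taken from DocAsk.

```python
# Keys named in the project config description above.
REQUIRED_KEYS = ("docs_path", "code_path", "yaml_config_paths", "repo_path")


def missing_config_keys(config: dict) -> list:
    """Return the expected keys that are absent from a loaded config."""
    return [key for key in REQUIRED_KEYS if key not in config]


# Placeholder values; real paths depend on the target repository layout.
example_config = {
    "docs_path": "/path/to/mmore/docs",
    "code_path": "/path/to/mmore/src",
    "yaml_config_paths": ["/path/to/mmore/examples"],  # assumed to be a list
    "repo_path": "/path/to/mmore",
}

print(missing_config_keys(example_config))
```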

Step 3: normalized records

Each source is converted into a DocumentRecord.

A DocumentRecord contains:

  • a unique doc_id;

  • textual content;

  • a source_type;

  • optional source metadata such as path, title, module, symbol, and signature.

Example source types:

markdown_section
python_module
python_class
python_function
python_method
example_config
production_config
repo_structure
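
The record shape described above could be modeled with a dataclass along these lines. The field name content and the choice of None as the default for optional metadata are assumptions; the actual DocumentRecord definition may differ.

```python
from dataclasses import asdict, dataclass
from typing import Optional

# Source types listed in this page.
SOURCE_TYPES = {
    "markdown_section", "python_module", "python_class", "python_function",
    "python_method", "example_config", "production_config", "repo_structure",
}


@dataclass
class DocumentRecord:
    doc_id: str        # unique identifier
    content: str       # textual content (field name is an assumption)
    source_type: str   # one of SOURCE_TYPES
    # Optional source metadata; None when it does not apply.
    path: Optional[str] = None
    title: Optional[str] = None
    module: Optional[str] = None
    symbol: Optional[str] = None
    signature: Optional[str] = None


record = DocumentRecord(
    doc_id="example-1",
    content="def run(): ...",
    source_type="python_function",
    symbol="run",
)
print(asdict(record))
```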

Step 4: corpus file

All records are saved to JSONL.

Default command-line corpus:

data/processed/corpus.jsonl

Project-specific corpus:

data/projects/<project_name>/corpus.jsonl

Each line is one serialized DocumentRecord.
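
A minimal sketch of writing and reading such a file with the standard library is shown below. The function names write_corpus and read_corpus are hypothetical, and records are represented as plain dictionaries for brevity.

```python
import json
from pathlib import Path


def write_corpus(records, corpus_path):
    """Write one JSON object per line (JSONL)."""
    path = Path(corpus_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_corpus(corpus_path):
    """Read a JSONL corpus back into a list of dictionaries."""
    with Path(corpus_path).open(encoding="utf-8") as handle:
        # Skip blank lines so a trailing newline does not raise an error.
        return [json.loads(line) for line in handle if line.strip()]
```

json.dumps escapes newlines inside string values, so each record stays on a single line even when its content spans several lines.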

Step 5: retrieval

There are two retrieval paths.

Simple retrieval

The simple retriever reads the selected corpus.jsonl directly.

It is useful for:

  • debugging corpus quality;

  • testing queries quickly;

  • using newly built project corpora;

  • avoiding MMORE indexing during development.
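
To illustrate what reading corpus.jsonl directly can look like, here is a small keyword-overlap retriever. It is a stand-in technique chosen for brevity, not DocAsk's actual scoring; the function names and the content field name are assumptions.

```python
import json
import re


def tokenize(text):
    """Lowercase the text and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def simple_retrieve(query, corpus_lines, top_k=3):
    """Rank JSONL records by how many query tokens appear in their content."""
    query_tokens = tokenize(query)
    scored = []
    for line in corpus_lines:
        if not line.strip():
            continue
        record = json.loads(line)
        overlap = len(query_tokens & tokenize(record.get("content", "")))
        if overlap:
            scored.append((overlap, record))
    # sort() is stable, so records with equal scores keep corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [record for _, record in scored[:top_k]]
```

Because corpus_lines can be any iterable of strings, an open file handle on corpus.jsonl can be passed in directly.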

MMORE retrieval

For the MMORE backend, the corpus is first converted to:

data/projects/<project_name>/mmore_corpus.jsonl

or, in the default workflow:

data/processed/mmore_corpus.jsonl

MMORE then indexes this file and retrieves from the generated index.

The MMORE backend does not automatically read a newly built corpus.jsonl. The index must be rebuilt when the target corpus changes.
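
One way to notice a stale index is to compare file modification times. The sketch below is purely illustrative: the helper index_is_stale and the idea of a marker file touched after each successful indexing run are assumptions, not part of DocAsk or MMORE.

```python
from pathlib import Path


def index_is_stale(corpus_path, index_marker_path):
    """Return True when the corpus is newer than the last index build.

    Assumption: index_marker_path is a file that the indexing step
    touches after every successful run.
    """
    corpus = Path(corpus_path)
    marker = Path(index_marker_path)
    if not marker.exists():
        # No record of a previous build, so indexing is needed.
        return True
    return corpus.stat().st_mtime > marker.stat().st_mtime
```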

Step 6: project profile

After retrieval, DocAsk can apply a project profile.

A project profile can:

  • expand queries with project-specific terms;

  • filter low-value or irrelevant results;

  • rerank results for known intents;

  • answer some structured questions directly without calling the LLM.

The MMORE profile, for example, can answer some Milvus parameter questions deterministically, which keeps unrelated configuration fields out of the answer.
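
The four capabilities listed above can be sketched as one small class. Everything here is hypothetical: the class layout, the attribute names, and the substring matching are assumptions made for illustration, and the example terms and answers are placeholders rather than real MMORE or Milvus facts.

```python
from dataclasses import dataclass, field


@dataclass
class ProjectProfile:
    expansions: dict = field(default_factory=dict)             # term -> extra terms
    blocked_source_types: set = field(default_factory=set)     # results to drop
    preferred_source_types: list = field(default_factory=list) # results to boost
    direct_answers: dict = field(default_factory=dict)         # trigger -> answer

    def expand_query(self, query):
        """Append project-specific terms when a known term is in the query."""
        extra = []
        for term, additions in self.expansions.items():
            if term in query.lower():
                extra.extend(additions)
        return " ".join([query, *extra])

    def filter_results(self, results):
        """Drop results whose source type is considered low value."""
        return [r for r in results
                if r.get("source_type") not in self.blocked_source_types]

    def rerank(self, results):
        """Move preferred source types first; the sort is stable otherwise."""
        return sorted(
            results,
            key=lambda r: r.get("source_type") not in self.preferred_source_types,
        )

    def direct_answer(self, query):
        """Return a fixed answer for a known question, or None."""
        for trigger, answer in self.direct_answers.items():
            if trigger in query.lower():
                return answer
        return None
```

When direct_answer returns None, the pipeline would fall through to the prompt step described next.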

Step 7: prompt and answer

If the project profile does not return a direct answer, the retrieved sources are formatted into a prompt.

The LLM is instructed to:

  • answer only from the provided sources;

  • cite every factual statement;

  • avoid inventing commands, paths, APIs, or configuration keys;

  • say clearly when the available sources are insufficient.
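
The instructions above could be assembled into a prompt roughly as follows. This is a sketch under assumptions: the function name build_prompt, the numbered [n] citation format, and the exact wording of the rules are illustrative and do not reproduce DocAsk's real prompt.

```python
def build_prompt(question, sources):
    """Format retrieved sources and grounding rules into one prompt string."""
    rules = [
        "Answer only from the provided sources.",
        "Cite every factual statement with its source number, e.g. [1].",
        "Do not invent commands, paths, APIs, or configuration keys.",
        "Say clearly when the sources are insufficient.",
    ]
    blocks = []
    for number, source in enumerate(sources, start=1):
        # Prefer the file path as a label and fall back to the record id.
        label = source.get("path") or source.get("doc_id", "unknown")
        blocks.append(f"[{number}] {label}\n{source.get('content', '')}")
    return "\n\n".join([
        "Rules:\n" + "\n".join(f"- {rule}" for rule in rules),
        "Sources:\n" + "\n\n".join(blocks),
        f"Question: {question}",
    ])
```

Numbering the sources in the prompt gives the model a short, unambiguous handle for each citation.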

The final output is an answer with cited sources.