Data flow

This page explains how information moves through DocAsk.

Step 1: project selection

DocAsk starts from a target project repository.

The Streamlit interface lets a user select a local project path. For example, the MMORE repository may be located at:

/path/to/mmore

DocAsk then creates a project-specific working folder:

data/projects/<project_name>/

For MMORE:

data/projects/mmore/
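
The mapping from a selected path to its working folder can be sketched as follows. This is a minimal illustration, not DocAsk's actual code: the helper name `project_workdir` is hypothetical, and it assumes the project name is the lowercased last component of the selected path.

```python
from pathlib import Path

# Root of the per-project working folders, as described above.
PROJECTS_ROOT = Path("data/projects")


def project_workdir(repo_path: str, root: Path = PROJECTS_ROOT) -> Path:
    """Return the working folder for a selected repository path.

    Assumption: the project name is the lowercased final path component.
    """
    name = Path(repo_path).name.lower()
    if not name:
        raise ValueError(f"cannot derive a project name from {repo_path!r}")
    return root / name


if __name__ == "__main__":
    print(project_workdir("/path/to/mmore").as_posix())
```

Deriving the folder from the path alone keeps the function free of side effects; creating the directory can be left to the caller.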

Step 2: project configuration

DocAsk generates or reads a project configuration file.

For a Streamlit-selected project, this is usually:

data/projects/<project_name>/project_config.yaml

For older command-line workflows, the default config remains:

configs/project_config.yaml

The project config points to:

docs_path          -> Markdown and RST documentation
code_path          -> Python source code
yaml_config_paths  -> YAML example or production configs
repo_path          -> repository structure
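
As a rough sketch, the two config locations and the four keys could be modelled like this. The fallback order (project-specific file first, then the default) is an assumption, and `resolve_config_path` and `ProjectConfig` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import List, Optional

# Default config used by the older command-line workflows.
DEFAULT_CONFIG = Path("configs/project_config.yaml")


def resolve_config_path(project_name: Optional[str] = None,
                        base: Path = Path(".")) -> Path:
    """Pick a config file (assumed order: project-specific, then default)."""
    if project_name:
        candidate = base / "data" / "projects" / project_name / "project_config.yaml"
        if candidate.exists():
            return candidate
    return base / DEFAULT_CONFIG


@dataclass
class ProjectConfig:
    """The four keys listed above; the types are assumptions."""
    docs_path: str   # Markdown and RST documentation
    code_path: str   # Python source code
    repo_path: str   # repository structure
    yaml_config_paths: List[str] = field(default_factory=list)  # YAML configs
```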

Step 3: normalized records

Each source is converted into a DocumentRecord.

A DocumentRecord contains:

a unique doc_id;
textual content;
a source_type;
optional source metadata such as path, title, module, symbol, and signature.

Example source types:

markdown_section
python_module
python_class
python_function
python_method
example_config
production_config
repo_structure
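
A record with these fields might look like the dataclass below. Only `doc_id`, `source_type`, and the metadata names come from this page; the attribute names `content` and `metadata`, and the example values, are assumptions made for the sketch.

```python
from dataclasses import asdict, dataclass, field
from typing import Any, Dict


@dataclass
class DocumentRecord:
    doc_id: str        # unique identifier
    content: str       # textual content (attribute name assumed)
    source_type: str   # e.g. "markdown_section", "python_function"
    # Optional metadata: path, title, module, symbol, signature.
    metadata: Dict[str, Any] = field(default_factory=dict)


if __name__ == "__main__":
    # Hypothetical record, for illustration only.
    record = DocumentRecord(
        doc_id="example-0001",
        content="def run(config): ...",
        source_type="python_function",
        metadata={"path": "pkg/run.py", "symbol": "run", "signature": "run(config)"},
    )
    print(asdict(record))
```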

Step 4: corpus file

All records are saved to a single JSONL file.

Default command-line corpus:

data/processed/corpus.jsonl

Project-specific corpus:

data/projects/<project_name>/corpus.jsonl

Each line is one serialized DocumentRecord.
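
The one-record-per-line layout can be written and read with the standard library alone. The sketch below uses plain dictionaries in place of DocumentRecord objects; the helper names `write_corpus` and `read_corpus` are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterable, Iterator


def write_corpus(records: Iterable[Dict[str, Any]], path: Path) -> None:
    """Write one JSON object per line."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_corpus(path: Path) -> Iterator[Dict[str, Any]]:
    """Yield one record per non-empty line."""
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Because each line is an independent JSON document, the corpus can be streamed without loading the whole file into memory.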

Step 5: retrieval

There are two retrieval paths.

Simple retrieval

The simple retriever reads the selected corpus.jsonl directly.

It is useful for:

debugging corpus quality;
testing queries quickly;
using newly built project corpora;
avoiding MMORE indexing during development.
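
To make the idea concrete, here is a minimal retriever that scans a corpus.jsonl file directly. The scoring rule (count of shared lowercase tokens) is an assumption chosen for brevity; this page does not state how DocAsk's simple retriever ranks records. The `content` key and the function name `simple_search` are likewise hypothetical.

```python
import json
import re
from pathlib import Path
from typing import Any, Dict, List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase alphanumeric/underscore tokens."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def simple_search(corpus_path: Path, query: str,
                  top_k: int = 5) -> List[Tuple[int, Dict[str, Any]]]:
    """Rank records by the number of query tokens found in their content."""
    query_tokens = tokenize(query)
    scored: List[Tuple[int, Dict[str, Any]]] = []
    with Path(corpus_path).open(encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():
                continue
            record = json.loads(line)
            score = len(query_tokens & tokenize(record.get("content", "")))
            if score:
                scored.append((score, record))
    # Stable sort: ties keep their order of appearance in the corpus.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Since the file is read on every call, a rebuilt corpus is picked up immediately, which is what makes this path convenient during development.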

MMORE retrieval

For the MMORE backend, the corpus is first converted to a separate file:

data/projects/<project_name>/mmore_corpus.jsonl

or, in the default workflow:

data/processed/mmore_corpus.jsonl

Then MMORE indexes it and retrieves from the generated index.

The MMORE backend does not automatically read a newly built corpus.jsonl. The index must be rebuilt when the target corpus changes.
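
One way to notice that a rebuild is due is to store a fingerprint of the corpus next to the index and compare it before querying. The sketch below is not part of DocAsk or MMORE; the stamp file and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def corpus_fingerprint(corpus_path: Path) -> str:
    """SHA-256 of the corpus file contents."""
    return hashlib.sha256(corpus_path.read_bytes()).hexdigest()


def index_is_stale(corpus_path: Path, stamp_path: Path) -> bool:
    """True if no stamp exists or the corpus changed since the last stamp."""
    if not stamp_path.exists():
        return True
    recorded = stamp_path.read_text(encoding="utf-8").strip()
    return recorded != corpus_fingerprint(corpus_path)


def mark_indexed(corpus_path: Path, stamp_path: Path) -> None:
    """Record the fingerprint of the corpus that was just indexed."""
    stamp_path.write_text(corpus_fingerprint(corpus_path), encoding="utf-8")
```

A caller would check `index_is_stale(...)` before retrieval and, if it returns true, rerun the conversion and indexing steps before calling `mark_indexed(...)`.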

Step 6: project profile

After retrieval, DocAsk can apply a project profile.

A project profile can:

expand queries with project-specific terms;
filter low-value or irrelevant results;
rerank results for known intents;
answer some structured questions directly without calling the LLM.

The MMORE profile, for example, answers some Milvus parameter questions deterministically, so that unrelated configuration fields do not end up in the answer.
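
The four capabilities can be pictured as four hooks on a profile object. The class below is a hypothetical sketch: the method names, the constructor arguments, and the simple keyword rules are assumptions, not the interface of DocAsk's real profiles.

```python
from typing import Dict, Iterable, List, Optional

Result = Dict[str, str]


class ProjectProfile:
    """Illustrative profile with one hook per capability listed above."""

    def __init__(self,
                 extra_terms: Optional[Dict[str, List[str]]] = None,
                 blocked_types: Optional[Iterable[str]] = None,
                 intent_types: Optional[Dict[str, List[str]]] = None,
                 direct_answers: Optional[Dict[str, str]] = None) -> None:
        self.extra_terms = extra_terms or {}          # word -> terms to add
        self.blocked_types = set(blocked_types or [])  # source types to drop
        self.intent_types = intent_types or {}        # word -> preferred types
        self.direct_answers = direct_answers or {}    # question -> fixed answer

    def expand_query(self, query: str) -> str:
        """Append project-specific terms for words present in the query."""
        extras = [term for word in query.lower().split()
                  for term in self.extra_terms.get(word, [])]
        return " ".join([query] + extras)

    def filter_results(self, results: List[Result]) -> List[Result]:
        """Drop results whose source type is considered low-value."""
        return [r for r in results if r["source_type"] not in self.blocked_types]

    def rerank(self, query: str, results: List[Result]) -> List[Result]:
        """Move preferred source types first, keeping relative order otherwise."""
        preferred = {t for word in query.lower().split()
                     for t in self.intent_types.get(word, [])}
        return sorted(results, key=lambda r: r["source_type"] not in preferred)

    def direct_answer(self, query: str) -> Optional[str]:
        """Return a fixed answer for a known question, else None."""
        return self.direct_answers.get(query.strip().lower())
```

Returning `None` from `direct_answer` signals that the pipeline should continue to the prompt step.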

Step 7: prompt and answer

If no direct answer is returned by the project profile, retrieved sources are formatted into a prompt.

The LLM is instructed to:

answer only from the provided sources;
cite every factual statement;
avoid inventing commands, paths, APIs, or configuration keys;
say clearly when the available sources are insufficient.
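
A prompt that carries these instructions and numbers each source for citation could be assembled as below. The exact wording, the `[n]` citation style, and the record keys (`content`, `source_type`, `path`) are assumptions made for this sketch, not DocAsk's actual template.

```python
from typing import Dict, List

# Paraphrase of the four instructions listed above.
INSTRUCTIONS = (
    "Answer only from the provided sources.\n"
    "Cite every factual statement with its source number, e.g. [1].\n"
    "Do not invent commands, paths, APIs, or configuration keys.\n"
    "If the sources are insufficient, say so clearly."
)


def build_prompt(question: str, sources: List[Dict[str, str]]) -> str:
    """Format retrieved sources as numbered blocks followed by the question."""
    blocks = []
    for number, source in enumerate(sources, start=1):
        header = f"[{number}] ({source['source_type']}, {source.get('path', 'unknown')})"
        blocks.append(f"{header}\n{source['content']}")
    return (
        f"{INSTRUCTIONS}\n\nSources:\n"
        + "\n\n".join(blocks)
        + f"\n\nQuestion: {question}\nAnswer:"
    )
```

Numbering the sources in the prompt gives the model a stable handle to cite, and lets the caller map each citation back to a record afterwards.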

The final output is an answer with cited sources.