Data flow
This page explains how information moves through DocAsk.
Step 1: project selection
DocAsk starts from a target project repository.
The Streamlit interface lets a user select a local project path. For example, the MMORE repository may be located at:
/path/to/mmore
DocAsk then creates a project-specific working folder:
data/projects/<project_name>/
For MMORE:
data/projects/mmore/
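The mapping from a selected repository to its working folder can be sketched as follows. This is a minimal illustration, not DocAsk's actual code: the function name `project_workdir` is hypothetical, and it assumes the project name is simply the last component of the selected path.

```python
from pathlib import Path


def project_workdir(repo_path: str, base_dir: str = "data/projects") -> Path:
    """Map a selected repository path to its project-specific working folder."""
    # Assumption: the project name is the final component of the repository path.
    project_name = Path(repo_path).name
    return Path(base_dir) / project_name


print(project_workdir("/path/to/mmore").as_posix())  # data/projects/mmore
```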
Step 2: project configuration
DocAsk generates or reads a project configuration file.
For a Streamlit-selected project, this is usually:
data/projects/<project_name>/project_config.yaml
For older command-line workflows, the default config remains:
configs/project_config.yaml
The project config points to:
docs_path -> Markdown and RST documentation
code_path -> Python source code
yaml_config_paths -> YAML example or production configs
repo_path -> repository structure
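As a rough illustration, a loaded project config could look like the dictionary below, with a small check for missing keys. The four key names come from the list above. Everything else is an assumption made for the example: the helper `missing_config_keys`, the sub-paths, the choice to treat all four keys as required, and the choice to model `yaml_config_paths` as a list.

```python
REQUIRED_KEYS = ("docs_path", "code_path", "yaml_config_paths", "repo_path")


def missing_config_keys(config: dict) -> list:
    """Return the expected keys that are absent from a loaded project config."""
    return [key for key in REQUIRED_KEYS if key not in config]


# Hypothetical values: the sub-paths are placeholders, not real MMORE paths.
example_config = {
    "docs_path": "/path/to/mmore/docs",
    "code_path": "/path/to/mmore/src",
    "yaml_config_paths": ["/path/to/mmore/examples"],
    "repo_path": "/path/to/mmore",
}

print(missing_config_keys(example_config))                   # []
print(missing_config_keys({"repo_path": "/path/to/mmore"}))  # the three other keys
```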
Step 3: normalized records
Each source is converted into a DocumentRecord.
A DocumentRecord contains:
a unique doc_id;
textual content;
a source_type;
optional source metadata such as path, title, module, symbol, and signature.
Example source types:
markdown_section
python_module
python_class
python_function
python_method
example_config
production_config
repo_structure
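One way to picture such a record is a small dataclass. The field names follow the description above, but the types, the defaults, and the example values are assumptions. The real DocumentRecord may be defined differently.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DocumentRecord:
    # Required fields named in the description above.
    doc_id: str
    content: str
    source_type: str
    # Optional source metadata; None means "not applicable to this source".
    path: Optional[str] = None
    title: Optional[str] = None
    module: Optional[str] = None
    symbol: Optional[str] = None
    signature: Optional[str] = None


# Hypothetical record for a Python function; all values are placeholders.
record = DocumentRecord(
    doc_id="example-0001",
    content="def load(path): ...",
    source_type="python_function",
    path="src/example.py",
    symbol="load",
    signature="load(path)",
)
print(asdict(record)["source_type"])  # python_function
```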
Step 4: corpus file
All records are saved to JSONL.
Default command-line corpus:
data/processed/corpus.jsonl
Project-specific corpus:
data/projects/<project_name>/corpus.jsonl
Each line is one serialized DocumentRecord.
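The one-record-per-line layout can be written and read back with the standard library alone. The sketch below uses plain dictionaries as stand-ins for serialized records. The helpers `write_corpus` and `read_corpus` are hypothetical names, not DocAsk functions.

```python
import json
import tempfile
from pathlib import Path


def write_corpus(records, corpus_path):
    """Write one JSON object per line (JSONL)."""
    path = Path(corpus_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as handle:
        for record in records:
            handle.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_corpus(corpus_path):
    """Read a JSONL corpus back into a list of dictionaries, skipping blank lines."""
    with Path(corpus_path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Round trip in a temporary directory so the example leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    corpus_file = Path(tmp) / "projects" / "demo" / "corpus.jsonl"
    write_corpus(
        [{"doc_id": "a", "content": "first", "source_type": "markdown_section"}],
        corpus_file,
    )
    loaded = read_corpus(corpus_file)

print(len(loaded))  # 1
```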
Step 5: retrieval
There are two retrieval paths.
Simple retrieval
The simple retriever reads the selected corpus.jsonl directly.
It is useful for:
debugging corpus quality;
testing queries quickly;
using newly built project corpora;
avoiding MMORE indexing during development.
MMORE retrieval
For the MMORE retrieval backend, the corpus is first converted to:
data/projects/<project_name>/mmore_corpus.jsonl
or, in the default workflow:
data/processed/mmore_corpus.jsonl
Then MMORE indexes it and retrieves from the generated index.
The MMORE backend does not automatically read a newly built corpus.jsonl. The index must be rebuilt when the target corpus changes.
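One way to notice that the index is out of date is to store a fingerprint of the corpus at indexing time and compare it later. The sketch below is a suggestion built on that idea, not an existing DocAsk or MMORE feature: the helper names and the fingerprint file are hypothetical, and the MMORE indexing step itself is deliberately left out.

```python
import hashlib
from pathlib import Path


def corpus_fingerprint(corpus_path) -> str:
    """Return a SHA-256 digest of the corpus file contents."""
    return hashlib.sha256(Path(corpus_path).read_bytes()).hexdigest()


def record_index_build(corpus_path, fingerprint_path) -> None:
    """Call after a successful index build to remember which corpus was indexed."""
    Path(fingerprint_path).write_text(corpus_fingerprint(corpus_path), encoding="utf-8")


def index_is_stale(corpus_path, fingerprint_path) -> bool:
    """True if no build was recorded or the corpus changed since the last build."""
    marker = Path(fingerprint_path)
    if not marker.exists():
        return True
    return marker.read_text(encoding="utf-8").strip() != corpus_fingerprint(corpus_path)
```

A caller would check `index_is_stale(...)` before querying the MMORE backend, then rebuild the index and call `record_index_build(...)` when it returns `True`.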
Step 6: project profile
After retrieval, DocAsk can apply a project profile.
A project profile can:
expand queries with project-specific terms;
filter low-value or irrelevant results;
rerank results for known intents;
answer some structured questions directly without calling the LLM.
The MMORE profile, for example, can answer some Milvus parameter questions deterministically, which keeps unrelated configuration fields out of the answer.
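The four capabilities listed above can be pictured as four small hooks on a profile object. The class below is a hypothetical sketch: its name, fields, and matching rules are assumptions, the reranking uses a simple source-type preference instead of real intent detection, and the example values are placeholders, not the actual MMORE profile.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ProjectProfile:
    synonyms: dict = field(default_factory=dict)               # phrase -> extra query terms
    blocked_source_types: set = field(default_factory=set)     # results to drop
    preferred_source_types: list = field(default_factory=list) # earlier entries rank first
    direct_answers: dict = field(default_factory=dict)         # trigger phrase -> fixed answer

    def expand_query(self, query: str) -> str:
        """Append project-specific terms for every phrase found in the query."""
        lowered = query.lower()
        extra = [t for phrase, terms in self.synonyms.items() if phrase in lowered for t in terms]
        return " ".join([query, *extra])

    def filter_results(self, results: list) -> list:
        """Drop results whose source type the profile marks as low-value."""
        return [r for r in results if r["source_type"] not in self.blocked_source_types]

    def rerank(self, results: list) -> list:
        """Move preferred source types to the front; other results keep their order."""
        def rank(result):
            source_type = result["source_type"]
            if source_type in self.preferred_source_types:
                return self.preferred_source_types.index(source_type)
            return len(self.preferred_source_types)
        return sorted(results, key=rank)

    def direct_answer(self, query: str) -> Optional[str]:
        """Return a fixed answer if a trigger phrase matches, else None."""
        lowered = query.lower()
        for trigger, answer in self.direct_answers.items():
            if trigger in lowered:
                return answer
        return None


# Placeholder profile; none of these values come from the real MMORE profile.
profile = ProjectProfile(
    synonyms={"vector store": ["milvus"]},
    blocked_source_types={"repo_structure"},
    preferred_source_types=["production_config", "example_config"],
    direct_answers={"placeholder trigger": "placeholder answer"},
)
print(profile.expand_query("Which vector store is used?"))
```

When `direct_answer` returns `None`, the flow falls through to the prompt step described next.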
Step 7: prompt and answer
If no direct answer is returned by the project profile, retrieved sources are formatted into a prompt.
The LLM is instructed to:
answer only from the provided sources;
cite every factual statement;
avoid inventing commands, paths, APIs, or configuration keys;
say clearly when the available sources are insufficient.
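A prompt that carries these four instructions and numbers each source for citation might be assembled as follows. The wording, the layout, and the function name `build_prompt` are assumptions for illustration. DocAsk's actual prompt template is not shown on this page.

```python
RULES = (
    "Answer only from the provided sources.",
    "Cite every factual statement using the bracketed source numbers.",
    "Do not invent commands, paths, APIs, or configuration keys.",
    "Say clearly when the available sources are insufficient.",
)


def build_prompt(question: str, sources: list) -> str:
    """Format the rules, numbered sources, and the question into one prompt string."""
    rule_lines = [f"- {rule}" for rule in RULES]
    source_blocks = []
    for number, source in enumerate(sources, start=1):
        # Prefer the file path as a label; fall back to the record id.
        label = source.get("path") or source["doc_id"]
        source_blocks.append(
            f"[{number}] {label} ({source['source_type']})\n{source['content']}"
        )
    return "\n\n".join([
        "Instructions:\n" + "\n".join(rule_lines),
        "Sources:\n" + "\n\n".join(source_blocks),
        f"Question: {question}",
    ])


# Placeholder source record and question.
sources = [{
    "doc_id": "d1",
    "source_type": "markdown_section",
    "path": "docs/index.md",
    "content": "DocAsk builds a corpus.",
}]
print(build_prompt("What does DocAsk build?", sources))
```

Numbering the sources in the prompt gives the model a stable handle such as `[1]` to cite, so each citation can be traced back to a specific record.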
The final output is an answer with cited sources.