Repository structure¶

DocAsk uses a src/ layout.

docask/
├── README.md
├── pyproject.toml
├── .env.example
├── .gitignore
├── configs/
├── data/
├── docs/
├── scripts/
├── app/
├── src/
│   └── docask/
└── tests/

Top-level folders¶

Folder	Role
`app/`	Streamlit application entry point.
`configs/`	YAML configuration files for app settings, indexed project defaults, retrieval, and indexing.
`data/`	Local generated data: corpora, app state, extracted docs, and indexes.
`docs/`	Project documentation, optionally built with Sphinx/MyST.
`scripts/`	Command-line utilities for building, debugging, indexing, and running DocAsk.
`src/docask/`	Main Python package.
`tests/`	Automated tests.

Generated data layout¶

DocAsk can still use the default corpus path:

data/processed/corpus.jsonl

The Streamlit project setup flow creates project-specific data:

data/projects/
└── <project_name>/
    ├── project_config.yaml
    ├── corpus.jsonl
    └── mmore_corpus.jsonl       # optional, if exported for MMORE

The Streamlit app stores local UI state in:

data/app_state.json

This file is machine-specific and should normally not be committed.

Python package structure¶

src/docask/
├── config.py
├── data_models.py
├── loaders/
├── extractors/
├── corpus/
├── indexing/
├── retrieval/
├── rag/
├── project_profiles/
├── projects/
└── utils/

Component responsibilities¶

loaders/           -> load Markdown, YAML, and repository structure documents
extractors/        -> extract documentation from Python code
corpus/            -> merge all sources into corpus.jsonl
indexing/          -> convert and index the corpus with MMORE
retrieval/         -> retrieve relevant documents
rag/               -> build prompts and produce answers
project_profiles/ -> project-specific query expansion, filtering, reranking, and direct answers
projects/          -> project setup, generated configs, corpus paths, app state
utils/             -> shared paths and helpers

The goal is that a reader can understand the repository without opening every file.