Repository structure
DocAsk uses a src/ layout: the importable package lives in src/docask/ rather than at the repository root.
docask/
├── README.md
├── pyproject.toml
├── .env.example
├── .gitignore
├── configs/
├── data/
├── docs/
├── scripts/
├── app/
├── src/
│   └── docask/
└── tests/
Top-level folders
| Folder | Role |
|---|---|
| app/ | Streamlit application entry point. |
| configs/ | YAML configuration files for app settings, indexed project defaults, retrieval, and indexing. |
| data/ | Local generated data: corpora, app state, extracted docs, and indexes. |
| docs/ | Project documentation, optionally built with Sphinx/MyST. |
| scripts/ | Command-line utilities for building, debugging, indexing, and running DocAsk. |
| src/docask/ | Main Python package. |
| tests/ | Automated tests. |
Generated data layout
DocAsk can still use the default corpus path:
data/processed/corpus.jsonl
The Streamlit project setup flow creates project-specific data:
data/projects/
└── <project_name>/
    ├── project_config.yaml
    ├── corpus.jsonl
    └── mmore_corpus.jsonl   # optional, if exported for MMORE
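The two corpus locations above can be resolved with a small helper. This is a minimal sketch, not DocAsk's actual code: the function name `resolve_corpus_path` and the rule of falling back to the default path when no project name is given are assumptions made for illustration. Only the paths themselves come from the layout described above.

```python
from pathlib import Path
from typing import Optional

# Paths taken from the layout above, relative to the repository root.
DEFAULT_CORPUS = Path("data/processed/corpus.jsonl")
PROJECTS_DIR = Path("data/projects")


def resolve_corpus_path(project_name: Optional[str] = None,
                        root: Path = Path(".")) -> Path:
    """Return the corpus path for a project, or the default corpus path.

    Hypothetical helper: both the name and the fallback rule are assumptions.
    """
    if project_name:
        # Project-specific corpus created by the Streamlit setup flow.
        return root / PROJECTS_DIR / project_name / "corpus.jsonl"
    # No project selected: use the default corpus location.
    return root / DEFAULT_CORPUS
```

Under these assumptions, `resolve_corpus_path("my_project")` points at `data/projects/my_project/corpus.jsonl`, and calling it without arguments points at `data/processed/corpus.jsonl`.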
The Streamlit app stores local UI state in:
data/app_state.json
This file is machine-specific and should normally not be committed.
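Because the state file is local and may be absent on a fresh checkout, code that reads it should probably tolerate a missing or unreadable file. The sketch below shows one way to do that. The function names `load_app_state` and `save_app_state` are hypothetical, and nothing is assumed about the keys DocAsk actually stores: the state is treated as an opaque dictionary.

```python
import json
from pathlib import Path
from typing import Any, Dict

APP_STATE_PATH = Path("data/app_state.json")  # path from the section above


def load_app_state(path: Path = APP_STATE_PATH) -> Dict[str, Any]:
    """Load UI state; return an empty dict if the file is missing or invalid."""
    try:
        with path.open(encoding="utf-8") as fh:
            state = json.load(fh)
    except (FileNotFoundError, json.JSONDecodeError):
        return {}
    # Guard against a valid JSON file that is not an object.
    return state if isinstance(state, dict) else {}


def save_app_state(state: Dict[str, Any], path: Path = APP_STATE_PATH) -> None:
    """Write UI state, creating the parent directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as fh:
        json.dump(state, fh, indent=2)
```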
Python package structure
src/docask/
├── config.py
├── data_models.py
├── loaders/
├── extractors/
├── corpus/
├── indexing/
├── retrieval/
├── rag/
├── project_profiles/
├── projects/
└── utils/
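A documented layout like this one can drift from the real tree over time. As a hedged illustration, a check along the following lines could live in tests/ to flag drift. The function name `missing_entries` is hypothetical and this is not part of DocAsk itself; the expected names are copied from the tree above.

```python
from pathlib import Path
from typing import List

# Names copied from the package tree above.
EXPECTED_MODULES = ["config.py", "data_models.py"]
EXPECTED_SUBPACKAGES = [
    "loaders", "extractors", "corpus", "indexing", "retrieval",
    "rag", "project_profiles", "projects", "utils",
]


def missing_entries(package_dir: Path) -> List[str]:
    """Return documented modules and subpackages absent from package_dir."""
    missing = [name for name in EXPECTED_MODULES
               if not (package_dir / name).is_file()]
    # Subpackages are reported with a trailing slash, as in the tree.
    missing += [name + "/" for name in EXPECTED_SUBPACKAGES
                if not (package_dir / name).is_dir()]
    return missing
```

Run from the repository root, `missing_entries(Path("src/docask"))` returns an empty list when the package matches the documented structure.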
Component responsibilities
loaders/ -> load Markdown, YAML, and repository structure documents
extractors/ -> extract documentation from Python code
corpus/ -> merge all sources into corpus.jsonl
indexing/ -> convert and index the corpus with MMORE
retrieval/ -> retrieve relevant documents
rag/ -> build prompts and produce answers
project_profiles/ -> project-specific query expansion, filtering, reranking, and direct answers
projects/ -> project setup, generated configs, corpus paths, app state
utils/ -> shared paths and helpers
The goal is that a reader can understand the repository without opening every file.