Loaders and extractors
DocAsk uses loaders and extractors to convert project sources into DocumentRecord objects.
Markdown loader
File:
src/docask/loaders/markdown_loader.py
Role:
reads .md and .rst files;
splits Markdown files by headings;
creates one record per documentation section;
stores metadata such as relative path, page title, section title, and heading level.
Main source type:
markdown_section
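The heading-based split described above can be sketched roughly as follows. This is a minimal illustration, not the code in markdown_loader.py: the function name split_markdown_sections and the dictionary keys are hypothetical stand-ins for whatever fields DocumentRecord actually has, and the sketch assumes ATX-style headings ("#" through "######") only.

```python
import re

# Matches ATX headings such as "# Title" or "### Title".
# Assumption: Setext (underlined) headings are not handled in this sketch.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$")
FENCE = "`" * 3  # fenced code blocks may contain "#" lines that are not headings


def split_markdown_sections(text, relative_path):
    """Return one dict per heading-delimited section (hypothetical record shape)."""
    records = []
    page_title = None
    current = None
    in_fence = False
    for line in text.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING_RE.match(line)
        if match:
            level = len(match.group(1))
            title = match.group(2)
            # Assumption: the first level-1 heading is treated as the page title.
            if page_title is None and level == 1:
                page_title = title
            current = {
                "source_type": "markdown_section",
                "relative_path": relative_path,
                "page_title": page_title,
                "section_title": title,
                "heading_level": level,
                "lines": [],
            }
            records.append(current)
        elif current is not None:
            # Text before the first heading is ignored in this sketch.
            current["lines"].append(line)
    for record in records:
        record["text"] = "\n".join(record.pop("lines")).strip()
    return records
```

Calling split_markdown_sections(text, "docs/guide.md") on a page with one top-level heading and two subsections would yield three dictionaries, each tagged markdown_section and carrying its own section title and heading level.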
Python documentation extractor
File:
src/docask/extractors/python_doc_extractor.py
Role:
parses Python files with the built-in ast module;
extracts module docstrings;
extracts class docstrings;
extracts function and method docstrings;
reconstructs readable signatures.
Source types:
python_module
python_class
python_function
python_method
Current limitations:
it does not index full raw code;
it does not build a dependency graph;
it ignores symbols without docstrings.
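As a rough illustration of the extraction steps and limitations listed above, the sketch below walks a parsed module with ast, keeps only symbols that have docstrings, and rebuilds a readable signature with ast.unparse (available in Python 3.9 and later). The function name extract_docstring_records and the dictionary keys are hypothetical; the real extractor may structure its records and signatures differently.

```python
import ast


def extract_docstring_records(source, relative_path):
    """Return dicts for documented modules, classes, functions, and methods."""
    tree = ast.parse(source)
    records = []

    def add(source_type, name, signature, docstring):
        # Hypothetical record shape; not the actual DocumentRecord fields.
        records.append({
            "source_type": source_type,
            "relative_path": relative_path,
            "name": name,
            "signature": signature,
            "docstring": docstring,
        })

    module_doc = ast.get_docstring(tree)
    if module_doc:
        add("python_module", relative_path, None, module_doc)

    def visit(body, prefix, inside_class):
        for node in body:
            if isinstance(node, ast.ClassDef):
                doc = ast.get_docstring(node)
                if doc:
                    add("python_class", prefix + node.name, f"class {node.name}", doc)
                # Methods are visited even when the class itself has no docstring.
                visit(node.body, prefix + node.name + ".", True)
            elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                doc = ast.get_docstring(node)
                if not doc:
                    continue  # symbols without docstrings are skipped
                keyword = "async def" if isinstance(node, ast.AsyncFunctionDef) else "def"
                signature = f"{keyword} {node.name}({ast.unparse(node.args)})"
                if node.returns is not None:
                    signature += f" -> {ast.unparse(node.returns)}"
                source_type = "python_method" if inside_class else "python_function"
                add(source_type, prefix + node.name, signature, doc)

    visit(tree.body, "", False)
    return records
```

Only top-level and class-level definitions are visited here; nested functions and definitions inside conditional blocks are left out to keep the sketch short.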
YAML config loader
File:
src/docask/loaders/yaml_config_loader.py
Role:
scans configured folders for .yaml and .yml files;
converts config files into searchable documents;
adds hints for indexing, RAG, and retriever-related configs.
Source types:
example_config
production_config
yaml_config
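The scan-and-tag behaviour above might look something like the sketch below. How the real loader chooses between example_config, production_config, and yaml_config, and how it derives its hints, is not stated here, so both the path-based classification and the keyword patterns are assumptions made purely for illustration. The function names and dictionary keys are likewise hypothetical, and the files are read as plain text rather than parsed.

```python
import re
from pathlib import Path

YAML_SUFFIXES = {".yaml", ".yml"}

# Assumed keyword patterns for the indexing, RAG, and retriever hints.
HINT_PATTERNS = {
    "indexing": re.compile(r"(?<![a-z])index"),
    "rag": re.compile(r"(?<![a-z])rag(?![a-z])"),
    "retriever": re.compile(r"(?<![a-z])retriev"),
}


def classify_config(relative_path):
    """Naive, assumed heuristic: pick a source type from the file's path."""
    lowered = relative_path.lower()
    if "example" in lowered:
        return "example_config"
    if "prod" in lowered:
        return "production_config"
    return "yaml_config"


def load_yaml_configs(root, folders):
    """Collect .yaml/.yml files under the given folders as searchable dicts."""
    root = Path(root)
    records = []
    for folder in folders:
        base = root / folder
        if not base.is_dir():
            continue  # silently skip folders that do not exist
        for path in sorted(base.rglob("*")):
            if not path.is_file() or path.suffix.lower() not in YAML_SUFFIXES:
                continue
            text = path.read_text(encoding="utf-8")
            relative = path.relative_to(root).as_posix()
            lowered = text.lower()
            hints = [name for name, pattern in HINT_PATTERNS.items()
                     if pattern.search(lowered)]
            records.append({
                "source_type": classify_config(relative),
                "relative_path": relative,
                "hints": hints,
                "text": text,
            })
    return records
```

Keeping the raw text in the record keeps the sketch free of third-party dependencies; a loader that needs structured values would parse the files with a YAML library instead.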
Repository structure loader
File:
src/docask/loaders/repo_structure_loader.py
Role:
creates a synthetic tree view of the repository;
excludes noisy folders such as .git, __pycache__, .venv, dist, and build;
includes useful files such as .py, .md, .rst, .yaml, .yml, .toml, .json, and .txt;
helps answer navigation questions.
The maximum tree depth is controlled by:
repo_structure_max_depth: 6
in the project configuration.
Source type:
repo_structure
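A depth-limited tree view with the exclusions and file types listed above could be built along these lines. This is a sketch under assumptions, not the loader's actual code: the function name build_tree_lines, the indentation format, and the convention that entries directly under the root count as depth 1 are all hypothetical.

```python
from pathlib import Path

EXCLUDED_DIRS = {".git", "__pycache__", ".venv", "dist", "build"}
INCLUDED_SUFFIXES = {".py", ".md", ".rst", ".yaml", ".yml", ".toml", ".json", ".txt"}


def build_tree_lines(root, max_depth=6):
    """Return an indented listing of the repository, one entry per line."""
    root = Path(root)
    lines = [f"{root.name}/"]

    def walk(directory, depth):
        # Assumption: entries directly under the root are at depth 1.
        if depth > max_depth:
            return
        # Directories first, then files, each group sorted by name.
        entries = sorted(directory.iterdir(),
                         key=lambda p: (p.is_file(), p.name.lower()))
        indent = "  " * depth
        for entry in entries:
            if entry.is_dir():
                if entry.name in EXCLUDED_DIRS:
                    continue
                lines.append(f"{indent}{entry.name}/")
                walk(entry, depth + 1)
            elif entry.suffix.lower() in INCLUDED_SUFFIXES:
                lines.append(f"{indent}{entry.name}")

    walk(root, 1)
    return lines
```

Joining the returned lines with newlines gives a single text document that could be stored under the repo_structure source type; lowering max_depth trims the deeper levels of the listing.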