# Loaders and extractors DocAsk uses loaders and extractors to convert project sources into `DocumentRecord` objects. ## Markdown loader File: ```text src/docask/loaders/markdown_loader.py ``` Role: - reads `.md` and `.rst` files; - splits Markdown files by headings; - creates one record per documentation section; - stores metadata such as relative path, page title, section title, and heading level. Main source type: ```text markdown_section ``` ## Python documentation extractor File: ```text src/docask/extractors/python_doc_extractor.py ``` Role: - parses Python files with the built-in `ast` module; - extracts module docstrings; - extracts class docstrings; - extracts function and method docstrings; - reconstructs readable signatures. Source types: ```text python_module python_class python_function python_method ``` Current limitation: - it does not index full raw code; - it does not build a dependency graph; - it ignores symbols without docstrings. ## YAML config loader File: ```text src/docask/loaders/yaml_config_loader.py ``` Role: - scans configured folders for `.yaml` and `.yml` files; - converts config files into searchable documents; - adds hints for indexing, RAG, and retriever-related configs. Source types: ```text example_config production_config yaml_config ``` ## Repository structure loader File: ```text src/docask/loaders/repo_structure_loader.py ``` Role: - creates a synthetic tree view of the repository; - excludes noisy folders such as `.git`, `__pycache__`, `.venv`, `dist`, and `build`; - includes useful files such as `.py`, `.md`, `.rst`, `.yaml`, `.yml`, `.toml`, `.json`, and `.txt`; - helps answer navigation questions. The maximum tree depth is controlled by: ```yaml repo_structure_max_depth: 6 ``` in the project configuration. Source type: ```text repo_structure ```