mmirage.core.loader: Data Loaders
Base classes
Base classes and registry for data loaders in MMIRAGE.
- class mmirage.core.loader.base.BaseDataLoaderConfig(type, output_dir, image_base_path=None)
Bases: object
Base configuration class for data loaders.
All data loader configurations must inherit from this class and specify a type identifier.
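The inheritance requirement can be illustrated with a minimal sketch. It assumes the configuration classes behave like dataclasses, which the signature suggests but this page does not confirm. `BaseDataLoaderConfig` below is a stand-in that mirrors the documented signature, and `CSVDataConfig` with its `delimiter` field is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BaseDataLoaderConfig:
    """Stand-in mirroring the documented signature; not the mmirage source."""

    type: str  # loader type identifier, e.g. "jsonl"
    output_dir: str
    image_base_path: Optional[str] = None


@dataclass
class CSVDataConfig(BaseDataLoaderConfig):
    """Hypothetical subclass that adds loader-specific fields."""

    path: str = ""
    delimiter: str = ","


# Every configuration carries the type identifier inherited from the base.
cfg = CSVDataConfig(type="csv", output_dir="out", path="data/train.csv")
print(cfg)
```

Fields added by a subclass need defaults here because the base class already ends with a defaulted field.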
- class mmirage.core.loader.base.BaseDataLoader
Abstract base class for data loaders.
Data loaders are responsible for loading datasets from various sources (JSONL files, Hugging Face datasets, etc.) and returning them as Hugging Face Dataset objects.
- Type Parameters:
C: The configuration class type for this loader.
- abstractmethod from_config(ds_config)
Load a dataset from the given configuration.
- Parameters:
ds_config (C) – Configuration object for loading the dataset.
- Returns:
A Hugging Face Dataset or DatasetDict, or None if loading fails.
- Raises:
NotImplementedError – If not implemented by subclass.
- Return type:
Dataset | DatasetDict | None
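A subclass might implement this contract roughly as follows. This is a self-contained sketch, not the mmirage source: `BaseDataLoader` is re-declared as a stand-in, `InMemoryConfig` and `InMemoryLoader` are hypothetical, and plain lists of dicts stand in for Hugging Face `Dataset` objects so the example runs without extra dependencies.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Generic, List, Optional, TypeVar

C = TypeVar("C")  # configuration class type for a loader


class BaseDataLoader(ABC, Generic[C]):
    """Stand-in for the documented abstract base class."""

    @abstractmethod
    def from_config(self, ds_config: C) -> Optional[List[dict]]:
        """Return the loaded records, or None if loading fails."""
        raise NotImplementedError


@dataclass
class InMemoryConfig:
    """Hypothetical config that carries its rows directly."""

    rows: Optional[List[dict]] = None


class InMemoryLoader(BaseDataLoader[InMemoryConfig]):
    """Hypothetical loader returning plain lists instead of Dataset objects."""

    def from_config(self, ds_config: InMemoryConfig) -> Optional[List[dict]]:
        if ds_config.rows is None:
            return None  # signal a failed load, as the Returns field describes
        return list(ds_config.rows)


loader = InMemoryLoader()
loaded = loader.from_config(InMemoryConfig(rows=[{"text": "hello"}]))
failed = loader.from_config(InMemoryConfig())
```

Because `from_config` is abstract, instantiating `BaseDataLoader` directly raises `TypeError`; only concrete subclasses can be created.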
- class mmirage.core.loader.base.DataLoaderRegistry
Bases: object
Registry for managing and accessing available data loaders.
Provides a centralized registry for data loader classes and their associated configuration classes, allowing dynamic loader instantiation based on type names.
- _registry
Mapping from loader name to registered loader class.
- _config_registry
Mapping from loader name to its configuration class.
- classmethod register(name, config_cls)
Register a data loader class.
- Parameters:
name (str) – String identifier for the loader.
config_cls (Type[BaseDataLoaderConfig]) – Configuration class associated with this loader.
- Returns:
Decorator function to register the loader class.
- Return type:
- classmethod get_processor(name)
Get a registered loader class by name.
- Parameters:
name (str) – String identifier of the loader.
- Returns:
The registered loader class.
- Raises:
ValueError – If no loader is registered under the given name.
- Return type:
- classmethod get_config_cls(name)
Get a registered configuration class by loader name.
- Parameters:
name (str) – String identifier of the loader.
- Returns:
The registered configuration class.
- Raises:
ValueError – If no loader is registered under the given name.
- Return type:
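The registry pattern described above (a decorator that records a loader class together with its configuration class, plus lookups that raise `ValueError` for unknown names) could look roughly like this. It is a sketch written from the documented method names only; the actual mmirage implementation may differ, and `DemoConfig` and `DemoLoader` are hypothetical.

```python
from typing import Callable, Dict


class DataLoaderRegistry:
    """Sketch of a decorator-based registry; not the mmirage source."""

    _registry: Dict[str, type] = {}         # loader name -> loader class
    _config_registry: Dict[str, type] = {}  # loader name -> config class

    @classmethod
    def register(cls, name: str, config_cls: type) -> Callable[[type], type]:
        def decorator(loader_cls: type) -> type:
            cls._registry[name] = loader_cls
            cls._config_registry[name] = config_cls
            return loader_cls  # hand the class back unchanged

        return decorator

    @classmethod
    def get_processor(cls, name: str) -> type:
        if name not in cls._registry:
            raise ValueError(f"No data loader registered under {name!r}")
        return cls._registry[name]

    @classmethod
    def get_config_cls(cls, name: str) -> type:
        if name not in cls._config_registry:
            raise ValueError(f"No data loader registered under {name!r}")
        return cls._config_registry[name]


class DemoConfig:
    """Hypothetical configuration class."""


@DataLoaderRegistry.register("demo", DemoConfig)
class DemoLoader:
    """Hypothetical loader class."""


loader_cls = DataLoaderRegistry.get_processor("demo")
config_cls = DataLoaderRegistry.get_config_cls("demo")
```

Returning the class unchanged from the decorator keeps `DemoLoader` usable under its own name while also making it reachable by the string `"demo"`.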
- class mmirage.core.loader.base.AutoDataLoader
Bases: object
Factory class for instantiating data loaders by name.
- classmethod from_name(name)
Retrieve a data loader class by its registered name.
- Parameters:
name (str) – The registry name of the data loader.
- Returns:
The registered data loader class.
- Raises:
ValueError – If no data loader is registered under the given name.
- Return type:
JSONL loader
JSONL data loader implementation.
- class mmirage.core.loader.jsonl.JSONLDataConfig(type, output_dir, image_base_path=None, path='')
Bases: BaseDataLoaderConfig
Configuration for loading JSONL datasets.
Hugging Face local loader
Local Hugging Face dataset loader implementation.
- class mmirage.core.loader.local_hf.LocalHFConfig(type, output_dir, image_base_path=None, path='')
Bases: BaseDataLoaderConfig
Configuration for loading local Hugging Face datasets.
Loader utilities
Utility functions for loading datasets and handling images.
- mmirage.core.loader.utils.load_datasets_from_configs(configs)
Load multiple datasets from configurations.
Attempts to load datasets using the specified loader configurations. Failed loads are logged as warnings and skipped.
- Parameters:
configs (List[BaseDataLoaderConfig]) – List of dataset configuration objects.
- Returns:
List of Hugging Face Datasets/DatasetDicts.
- Raises:
RuntimeError – If no datasets could be loaded successfully.
- Return type:
List[Dataset | DatasetDict]
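The skip-and-warn behaviour can be sketched as below. This is an illustration of the documented contract, not the real function: `load_all`, the `LOADERS` table, and the dict-based configs are all hypothetical stand-ins, and lists of dicts replace Hugging Face datasets.

```python
import logging
from typing import Callable, Dict, List, Optional

logger = logging.getLogger(__name__)

# Hypothetical loaders keyed by config type; each returns rows, or None on failure.
LOADERS: Dict[str, Callable[[dict], Optional[List[dict]]]] = {
    "inline": lambda cfg: cfg.get("rows"),
}


def load_all(configs: List[dict]) -> List[List[dict]]:
    """Load every config that works; warn about and skip the rest."""
    loaded: List[List[dict]] = []
    for cfg in configs:
        try:
            dataset = LOADERS[cfg["type"]](cfg)
        except Exception as exc:  # unknown type or a loader error
            logger.warning("Skipping config %r: %s", cfg.get("type"), exc)
            continue
        if dataset is None:
            logger.warning("Loader for %r returned nothing", cfg.get("type"))
            continue
        loaded.append(dataset)
    if not loaded:
        raise RuntimeError("No datasets could be loaded successfully")
    return loaded


datasets = load_all([
    {"type": "inline", "rows": [{"a": 1}]},
    {"type": "missing"},  # no such loader: logged and skipped
])
```

Raising only when the result list is empty lets a run proceed with partial data while still failing loudly if nothing loaded at all.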
- mmirage.core.loader.utils.resolve_image_input(value, image_base_path=None)
Resolve image input to a format SGLang can use.
Handles multiple image input formats:
- PIL Image objects: passed through directly
- URLs (http/https): passed through as-is
- Absolute file paths: validated and passed through
- Relative file paths: resolved using image_base_path
- Parameters:
- Returns:
Resolved image value suitable for SGLang processing.
- Raises:
FileNotFoundError – If a relative path cannot be resolved.
RuntimeError – If an absolute path exists but is not a file.
- Return type:
PIL.Image.Image | str
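The resolution rules can be approximated with the standard library alone. The function below, `resolve_image`, is a hypothetical sketch and not the mmirage implementation. Two behaviours are assumptions that this page does not specify: a missing absolute path raises `FileNotFoundError`, and a relative path given without `image_base_path` also raises `FileNotFoundError`. Any non-string value is treated as an image object and passed through.

```python
import tempfile
from pathlib import Path
from typing import Any, Optional


def resolve_image(value: Any, image_base_path: Optional[str] = None) -> Any:
    """Hypothetical sketch of the documented resolution rules."""
    if not isinstance(value, str):
        return value  # e.g. a PIL image: passed through directly
    if value.startswith(("http://", "https://")):
        return value  # URLs are passed through as-is
    path = Path(value)
    if path.is_absolute():
        if not path.exists():
            # Assumption: the page does not say what happens in this case.
            raise FileNotFoundError(f"Image not found: {path}")
        if not path.is_file():
            raise RuntimeError(f"Path exists but is not a file: {path}")
        return str(path)
    if image_base_path is None:
        # Assumption: relative paths are unresolvable without a base path.
        raise FileNotFoundError(f"Relative path {value!r} needs image_base_path")
    candidate = Path(image_base_path) / path
    if not candidate.is_file():
        raise FileNotFoundError(f"Image not found: {candidate}")
    return str(candidate)


# Usage: resolve a relative path against a temporary base directory.
with tempfile.TemporaryDirectory() as base:
    (Path(base) / "cat.png").write_bytes(b"")
    resolved = resolve_image("cat.png", image_base_path=base)
    expected = str(Path(base) / "cat.png")
```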