mmirage.core.process - Processors
Variables
Variable system for MMIRAGE pipeline with multimodal support.
- class mmirage.core.process.variables.BaseVar(name='')
Bases: ABC
Base class for variables in the MMIRAGE pipeline.
- Parameters:
name (str)
- class mmirage.core.process.variables.InputVar(name='', key='', type='text')
Bases: BaseVar
Input variable extracted from source datasets.
- type
Variable type: "text" or "image".
- Type:
Literal['text', 'image']
- class mmirage.core.process.variables.OutputVar(name='', type='')
Bases: BaseVar
Output variable generated by processors.
Output variables are created by processors (e.g., LLMs) and can depend on input variables and previously computed output variables.
- class mmirage.core.process.variables.VariableEnvironment(var_env, image_vars=None)
Bases: object
Environment for storing and accessing variables during processing.
- with_variable(key, value, is_image=False)
Create a new environment with an additional variable.
- Parameters:
- Returns:
New environment with the added variable.
- Return type:
VariableEnvironment
- to_dict()
Get an immutable view of the variable dictionary.
- Returns:
MappingProxyType providing read-only access to variables.
- Return type:
MappingProxyType
- get_image_vars()
Get all image variable names.
- Returns:
Copy of the set containing names of all image variables.
- Return type:
Set[str]
- has_images()
Check if the environment contains any image variables.
- Returns:
True if at least one image variable is present, False otherwise.
- Return type:
bool
- static from_input_variables(sample, input_vars, image_base_path=None)
Create a variable environment from a single sample.
- Parameters:
- Returns:
Environment populated with extracted variables.
- Return type:
VariableEnvironment
- Raises:
ValueError – If a required input variable is not found in the sample.
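The copy-on-write behaviour described for VariableEnvironment can be illustrated with a minimal sketch. SketchEnvironment and SketchInputVar below are hypothetical stand-ins written for this illustration, not the MMIRAGE classes; they assume that each input variable is looked up in the sample by its key, and they leave out image_base_path handling because the source does not say how paths are resolved.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Any, Dict, List, Mapping, Optional, Set


@dataclass(frozen=True)
class SketchInputVar:
    """Hypothetical stand-in mirroring InputVar(name, key, type)."""
    name: str
    key: str
    type: str = "text"  # "text" or "image"


class SketchEnvironment:
    """Hypothetical stand-in for VariableEnvironment (assumed behaviour)."""

    def __init__(self, var_env: Dict[str, Any], image_vars: Optional[Set[str]] = None):
        self._vars = dict(var_env)
        self._image_vars = set(image_vars or ())

    def with_variable(self, key: str, value: Any, is_image: bool = False) -> "SketchEnvironment":
        # Build a new environment instead of mutating this one.
        new_images = self._image_vars | {key} if is_image else self._image_vars
        return SketchEnvironment({**self._vars, key: value}, new_images)

    def to_dict(self) -> Mapping[str, Any]:
        # Read-only view: item assignment on the result raises TypeError.
        return MappingProxyType(self._vars)

    def get_image_vars(self) -> Set[str]:
        # Return a copy so callers cannot alter internal state.
        return set(self._image_vars)

    def has_images(self) -> bool:
        return bool(self._image_vars)

    @staticmethod
    def from_input_variables(sample: Mapping[str, Any],
                             input_vars: List[SketchInputVar]) -> "SketchEnvironment":
        env = SketchEnvironment({})
        for var in input_vars:
            if var.key not in sample:
                raise ValueError(f"Input variable {var.name!r}: key {var.key!r} not in sample")
            env = env.with_variable(var.name, sample[var.key], is_image=(var.type == "image"))
        return env


sample = {"question": "What is shown?", "img": "cat.png"}
input_vars = [SketchInputVar("q", "question"), SketchInputVar("picture", "img", "image")]
env = SketchEnvironment.from_input_variables(sample, input_vars)
env2 = env.with_variable("answer", "A cat.")  # env itself is left unchanged
```

Returning a new environment from with_variable means a processor can add its output without affecting other references to the earlier environment.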
Base processor
Base classes and registry for processors in MMIRAGE.
- class mmirage.core.process.base.BaseProcessorConfig(type='')
Bases: object
Base configuration class for processors.
All processor configurations must inherit from this class.
- Parameters:
type (str)
- class mmirage.core.process.base.BaseProcessor(config)
Abstract base class for data processors.
Processors are responsible for transforming data by generating new output variables from existing variables.
- Type Parameters:
C: The output variable type this processor works with.
- Parameters:
config (BaseProcessorConfig)
- config
Configuration object for this processor.
- __init__(config)
Initialize the processor with configuration.
- Parameters:
config (BaseProcessorConfig) – Configuration object for this processor.
- Return type:
None
- abstractmethod batch_process_sample(batch, output_var)
Process a batch of variable environments.
- Parameters:
batch (List[VariableEnvironment]) – List of variable environments to process.
output_var (C) – Output variable definition to generate.
- Returns:
List of updated variable environments with the new output variable.
- Raises:
NotImplementedError – If not implemented by subclass.
- Return type:
List[VariableEnvironment]
- class mmirage.core.process.base.ProcessorRegistry
Bases: object
Registry for managing and accessing available processors.
Provides a centralized registry for processor classes, their configuration classes, and their output variable classes.
- _registry
Mapping from processor name to registered processor class.
- _config_registry
Mapping from processor name to its configuration class.
- _output_var_registry
Mapping from processor name to its output variable class.
- classmethod register_types(name, config_cls, output_var_cls)
Register config/output-var types without importing processor implementations.
- Parameters:
name (str)
config_cls (Type[BaseProcessorConfig])
output_var_cls (Type[OutputVar])
- Return type:
None
- classmethod register(name, config_cls, output_var_cls)
Register a processor class with its associated classes.
- Parameters:
name (str) – String identifier for the processor.
config_cls (Type[BaseProcessorConfig]) – Configuration class associated with this processor.
output_var_cls (Type[OutputVar]) – Output variable class associated with this processor.
- Returns:
Decorator function to register the processor class.
- Return type:
- classmethod get_processor(name)
Get a registered processor class by name.
- Parameters:
name (str) – String identifier of the processor.
- Returns:
The registered processor class.
- Raises:
ValueError – If no processor is registered under the given name.
- Return type:
Type[BaseProcessor]
- classmethod get_config_cls(name)
Get a registered configuration class by processor name.
- Parameters:
name (str) – String identifier of the processor.
- Returns:
The registered configuration class.
- Raises:
ValueError – If no processor is registered under the given name.
- Return type:
Type[BaseProcessorConfig]
- classmethod get_output_var_cls(name)
Get a registered output variable class by processor name.
- Parameters:
name (str) – String identifier of the processor.
- Returns:
The registered output variable class.
- Raises:
ValueError – If no processor is registered under the given name.
- Return type:
Type[OutputVar]
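The decorator-based registration and name lookup described above might look roughly like the following sketch. All classes here (SketchConfig, SketchOutputVar, SketchRegistry, UpperProcessor) are hypothetical stand-ins, and plain dicts stand in for variable environments; only the method names and the ValueError on unknown names follow the reference text.

```python
from typing import Callable, Dict, List


class SketchConfig:
    """Hypothetical stand-in for BaseProcessorConfig."""
    def __init__(self, type: str = ""):
        self.type = type


class SketchOutputVar:
    """Hypothetical stand-in for OutputVar."""
    def __init__(self, name: str = "", type: str = ""):
        self.name = name
        self.type = type


class SketchRegistry:
    """Hypothetical registry keyed by processor name."""
    _registry: Dict[str, type] = {}
    _config_registry: Dict[str, type] = {}
    _output_var_registry: Dict[str, type] = {}

    @classmethod
    def register(cls, name: str, config_cls: type, output_var_cls: type) -> Callable[[type], type]:
        def decorator(processor_cls: type) -> type:
            # Record the processor together with its config and output-var classes.
            cls._registry[name] = processor_cls
            cls._config_registry[name] = config_cls
            cls._output_var_registry[name] = output_var_cls
            return processor_cls
        return decorator

    @classmethod
    def get_processor(cls, name: str) -> type:
        if name not in cls._registry:
            raise ValueError(f"No processor registered under {name!r}")
        return cls._registry[name]


@SketchRegistry.register("upper", SketchConfig, SketchOutputVar)
class UpperProcessor:
    """Toy processor: upper-cases the 'text' variable of each environment."""
    def __init__(self, config: SketchConfig):
        self.config = config

    def batch_process_sample(self, batch: List[dict], output_var: SketchOutputVar) -> List[dict]:
        # Return new dicts so the input environments stay untouched.
        return [{**env, output_var.name: env["text"].upper()} for env in batch]


processor_cls = SketchRegistry.get_processor("upper")
processor = processor_cls(SketchConfig(type="upper"))
result = processor.batch_process_sample([{"text": "hello"}], SketchOutputVar("shout", "upper"))
```

Keeping the config and output-variable classes in separate mappings is what would let them be looked up by name independently of the processor class itself.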
- class mmirage.core.process.base.AutoProcessor
Bases: object
Factory class for instantiating processors by name.
- classmethod from_name(name)
Retrieve a processor class by its registered name.
- Parameters:
name (str) – The registry name of the processor.
- Returns:
The registered processor class.
- Raises:
ValueError – If no processor is registered under the given name.
- Return type:
Type[BaseProcessor]
Mapper
Mapper for orchestrating variable transformations.
- class mmirage.core.process.mapper.MMIRAGEMapper(processor_configs, input_vars, output_vars)
Bases: object
Mapper for orchestrating variable transformations in the MMIRAGE pipeline.
Manages processors, validates variable dependencies, and applies transformations to batches of data. Supports multimodal inputs.
- Parameters:
- processors
Dictionary mapping processor types to processor instances.
- output_vars
List of output variables to generate.
- input_vars
List of input variables to extract.
- validate_vars()
Validate that all output variables are computable.
Checks that each output variable can be computed given the available variables (inputs and previously computed outputs).
- Returns:
True if all variables are computable, False otherwise.
- Return type:
bool
- rewrite_batch(batch, image_base_path=None)
Transform a batch of samples by computing output variables.
- Parameters:
- Returns:
List of VariableEnvironments with all output variables computed.
- Raises:
RuntimeError – If an output variable type has no registered processor.
- Return type:
List[VariableEnvironment]
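The computability check described for validate_vars can be sketched as a single pass over the output variables in order. The sketch assumes each output variable lists its dependencies explicitly in a hypothetical deps field; how MMIRAGE actually derives dependencies is not stated in this reference, and SketchOutput and sketch_validate_vars are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class SketchOutput:
    """Hypothetical output variable with explicitly listed dependencies."""
    name: str
    deps: List[str] = field(default_factory=list)


def sketch_validate_vars(input_names: List[str], outputs: List[SketchOutput]) -> bool:
    """Return True if every output depends only on variables available before it."""
    available: Set[str] = set(input_names)
    for out in outputs:
        if not set(out.deps) <= available:
            return False  # a dependency is neither an input nor an earlier output
        available.add(out.name)  # later outputs may build on this one
    return True


ok = sketch_validate_vars(
    ["question", "image"],
    [SketchOutput("caption", ["image"]), SketchOutput("answer", ["question", "caption"])],
)
bad = sketch_validate_vars(
    ["question"],
    [SketchOutput("answer", ["caption"]), SketchOutput("caption", ["question"])],
)
```

In the second call the outputs are listed in the wrong order: "answer" needs "caption" before it has been computed, so the check fails.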
LLM processor
Configuration
Configuration for LLM processor in MMIRAGE.
- class mmirage.core.process.processors.llm.config.SGLangServerArgs(model_path='none', tp_size=<factory>, trust_remote_code=True, disable_custom_all_reduce=False)
Bases: object
Server arguments for SGLang engine.
- Parameters:
- class mmirage.core.process.processors.llm.config.SGLangLLMConfig(type='', server_args=<factory>, default_sampling_params=<factory>, chat_template='')
Bases: BaseProcessorConfig
Configuration for LLM processor using SGLang.
Supports both text-only and multimodal (vision-language) models.
- Parameters:
- server_args: SGLangServerArgs
SGLang server arguments, including the model path and tensor-parallel (TP) size.
- class mmirage.core.process.processors.llm.config.LLMOutputVar(name='', type='', prompt='', output_schema=<factory>, output_type='')
Bases: OutputVar
Output variable generated by LLM processor.
Uses Jinja2 templating for prompts and supports both plain text and structured JSON outputs.
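As an illustration of Jinja2 prompt templating, the sketch below renders a prompt from a dictionary of variables and parses a JSON reply. It uses only the public Jinja2 API and the standard json module; the template text, the variable names, and the hard-coded raw_reply string are made up for the example, and no model is called. How MMIRAGE itself wires prompts and output_schema together is not shown in this reference.

```python
import json

from jinja2 import Environment, StrictUndefined, UndefinedError

# StrictUndefined makes rendering fail loudly when a variable is missing,
# instead of silently inserting an empty string.
jinja_env = Environment(undefined=StrictUndefined)
template = jinja_env.from_string("Answer the question about {{ topic }}: {{ question }}")

variables = {"topic": "the image", "question": "What animal is shown?"}
prompt = template.render(variables)

# Rendering with a missing variable raises UndefinedError.
try:
    template.render({"topic": "the image"})
    missing_error = ""
except UndefinedError as exc:
    missing_error = str(exc)

# Hypothetical structured reply, hard-coded here in place of a real model output.
raw_reply = '{"answer": "a cat"}'
parsed = json.loads(raw_reply)
```

Strict undefined handling is a design choice for the sketch: it surfaces a prompt that references a variable not present in the environment at render time.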
Implementation
LLM processor implementation using SGLang with multimodal support.