multimeditron.model.modalities package¶

Submodules¶

multimeditron.model.modalities.base module¶

multimeditron.model.modalities.image_modality module¶

class multimeditron.model.modalities.image_modality.ImageConfig(hidden_size: int = 4096, clip_name: str = 'openai/clip-vit-large-patch14', projection_type: str = 'mlp', use_2d_position_ids: bool = False, **kwargs)¶

Bases: BaseModalityConfig

Configuration class for the Image Modality. Extends the BaseModalityConfig.

hidden_size¶

Dimension of the hidden layer for the projection network.

Type:: int

clip_name¶

Name of the CLIP model to use as the feature extractor.

Type:: str

projection_type¶

Type of projection network (e.g., “mlp”).

Type:: str

use_2d_position_ids¶

Whether to use the 2D positional-embedding adaptation, which lets a 1D LLM be used without retraining.

Type:: bool

Example

>>> config = ImageConfig(hidden_size=512, clip_name="openai/clip-vit-base-patch32")
>>> print(config.clip_name)
openai/clip-vit-base-patch32

model_type: str = 'meditron_clip'¶

class multimeditron.model.modalities.image_modality.ImageModality(config: ImageConfig)¶

Bases: BaseModality

config_class¶: alias of ImageConfig

forward(inputs) → FloatTensor¶

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

freeze_modality_embedder()¶: Freeze the parameters of the modality

Danger

This function should and will keep the modality in “eval” mode even if you call torch.nn.Module.train() on the model! To take the modality out of eval mode, call multimeditron.model.modalities.base.BaseModality.unfreeze_modality_embedder().
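The warning above can be illustrated with a minimal sketch. It assumes that the freeze is implemented by overriding train(); the class FrozenEncoderModule, its layers, and the _encoder_frozen flag are hypothetical stand-ins and not the library's actual code.

```python
from torch import nn


class FrozenEncoderModule(nn.Module):
    """Hypothetical stand-in for a modality: an encoder plus a projection."""

    def __init__(self) -> None:
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(8, 8), nn.Dropout(0.5))
        self.projection = nn.Linear(8, 4)
        self._encoder_frozen = False  # assumption: a simple flag tracks the freeze

    def freeze_modality_embedder(self) -> None:
        # Stop gradient updates and pin the encoder to eval mode.
        for param in self.encoder.parameters():
            param.requires_grad = False
        self._encoder_frozen = True
        self.encoder.eval()

    def unfreeze_modality_embedder(self) -> None:
        # Re-enable gradients and let the encoder follow the parent's mode again.
        for param in self.encoder.parameters():
            param.requires_grad = True
        self._encoder_frozen = False
        self.encoder.train(self.training)

    def train(self, mode: bool = True):
        # nn.Module.train() switches every child; undo that for a frozen encoder.
        super().train(mode)
        if self._encoder_frozen:
            self.encoder.eval()
        return self


module = FrozenEncoderModule()
module.freeze_modality_embedder()
module.train()
print(module.training, module.encoder.training)  # True False

module.unfreeze_modality_embedder()
module.train()
print(module.training, module.encoder.training)  # True True
```

Keeping the frozen encoder in eval mode means layers such as Dropout and BatchNorm inside it behave deterministically while the rest of the model trains.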

preprocessor_class¶: alias of ImageProcessor

unfreeze_modality_embedder()¶: Unfreeze the parameters of the modality

unfreeze_projection()¶: Unfreeze the parameters of the projection

class multimeditron.model.modalities.image_modality.ImageProcessor(config)¶

Bases: BaseModalityProcessor

A processor for handling image data. It uses a pretrained CLIP model for processing image inputs into tensors.

image_processor¶

An instance of a pretrained image processor.

Type:: AutoImageProcessor

_num_patches_per_entry¶

The number of patches per image entry, based on image and patch size.

Type:: int
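As a rough illustration of how a patch count follows from image and patch size, the sketch below assumes square images split into non-overlapping square patches. The helper num_patches is hypothetical, and whether the library also counts a class token is not stated here.

```python
def num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    per_side = image_size // patch_size
    return per_side * per_side


# Assumption: both CLIP checkpoints below take 224x224 inputs.
print(num_patches(224, 14))  # 256, e.g. openai/clip-vit-large-patch14
print(num_patches(224, 32))  # 49, e.g. openai/clip-vit-base-patch32
```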

process(modality: Dict[str, Any]) → Dict[str, Any]¶

Processes the input image modality into a tensor suitable for model consumption.

Parameters:: modality (Dict[str, Any]) – The input image data, where “value” is the key for image data.
Returns:: The sample with the image processed into its tensor representation.
Return type:: Dict[str, Any]

multimeditron.model.modalities.image_modality_moe module¶

class multimeditron.model.modalities.image_modality_moe.MOEImageConfig(hidden_size: int = 1024, use_bias_proj: bool = True, expert_clip_names: List[str] = [], image_processor: str = 'openai/clip-vit-large-patch14', gating_path: str = '', top_k_experts: int = 1, projection_type: str = 'mlp', generalist_idx: int = -1, fusion_method: str = 'weighted_average', cross_attn_heads: int = 8, **kwargs)¶

Bases: BaseModalityConfig

model_type: str = 'moe_meditron_clip'¶

class multimeditron.model.modalities.image_modality_moe.MOEImageModality(config: MOEImageConfig)¶

Bases: BaseModality

Mixture of Experts (MoE) Image Modality using CLIP models as experts. Combines multiple pretrained CLIP models as experts and uses a gating network to select and weight their outputs. During training, all experts are used and their outputs are weighted by the gating network. During evaluation, only the top-k experts are used (not implemented yet).
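The gating behaviour described above can be sketched in plain Python. This is only an illustration of softmax gating with a weighted average and optional top-k selection; the function names are hypothetical, and the actual modality operates on tensors and may differ in detail.

```python
import math
from typing import List, Optional


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of gating logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_weighted_average(
    expert_outputs: List[List[float]],
    gate_logits: List[float],
    top_k: Optional[int] = None,
) -> List[float]:
    """Combine one embedding per expert using gating weights.

    With top_k=None every expert contributes (the training behaviour
    described above). With top_k set, only the k highest-weighted experts
    are kept and their weights are renormalised to sum to 1.
    """
    weights = softmax(gate_logits)
    if top_k is not None:
        order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
        keep = set(order[:top_k])
        kept_total = sum(weights[i] for i in keep)
        weights = [w / kept_total if i in keep else 0.0 for i, w in enumerate(weights)]
    dim = len(expert_outputs[0])
    return [
        sum(w * out[d] for w, out in zip(weights, expert_outputs))
        for d in range(dim)
    ]


experts = [[1.0, 0.0], [0.0, 1.0]]
print(fuse_weighted_average(experts, [0.0, 0.0]))           # [0.5, 0.5]
print(fuse_weighted_average(experts, [2.0, 0.0], top_k=1))  # [1.0, 0.0]
```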

config_class¶: alias of MOEImageConfig

forward(inputs) → Tensor¶

Define the computation performed at every call.

Should be overridden by all subclasses.

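Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.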

freeze_modality_embedder()¶: Freeze the parameters of the modality

Danger

This function should and will keep the modality in “eval” mode even if you call torch.nn.Module.train() on the model! To take the modality out of eval mode, call multimeditron.model.modalities.base.BaseModality.unfreeze_modality_embedder().

preprocessor_class¶: alias of MOEImageProcessor

train(mode: bool = True)¶

Set the module in training mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behaviors in training/evaluation mode, i.e., whether they are affected, e.g. Dropout, BatchNorm, etc.

Parameters:: mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
Returns:: self
Return type:: Module

unfreeze_modality_embedder()¶: Unfreeze the parameters of the modality

unfreeze_projection()¶: Unfreeze the parameters of the projection

class multimeditron.model.modalities.image_modality_moe.MOEImageProcessor(config: MOEImageConfig)¶

Bases: BaseModalityProcessor

Processor for Mixture of Experts (MoE) Image Modality. Uses a pretrained image processor to convert raw images into pixel values. Prepares processing for the fusion method between experts’ outputs.

process(modality: Dict[str, Any])¶

Abstract method for processing modality.

Parameters:: modality (Dict[str, Any]) – Input data to be processed.
Returns:: The original sample with the processed modality
Return type:: Dict[str, Any]

Module contents¶

class multimeditron.model.modalities.AutoModality¶

Bases: object

A class for managing modality registration and retrieval.

The AutoModality class provides a centralized registry for modality subclasses. It handles the registration of modality classes, and allows users to retrieve pretrained models, processors, and configurations for specific modalities.

_registry¶

Internal dictionary storing registered modality classes, indexed by name.

Type:: dict

classmethod config_from_dict(config: dict, **kwargs) → BaseModalityConfig¶

classmethod from_pretrained(*args, **kwargs) → BaseModality¶

classmethod preprocessor_from_name(name: str, *args, **kwargs) → BaseModalityProcessor¶

classmethod register(name: str)¶
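The signature register(name: str) suggests a decorator factory, although that is an assumption. A minimal sketch of such a name-indexed registry follows; ModalityRegistry, lookup, and ToyImageModality are hypothetical names and not part of the library.

```python
from typing import Callable, Dict


class ModalityRegistry:
    """Hypothetical registry with a decorator-style register(name) method."""

    _registry: Dict[str, type] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[type], type]:
        def decorator(modality_cls: type) -> type:
            if name in cls._registry:
                raise ValueError(f"modality {name!r} is already registered")
            cls._registry[name] = modality_cls
            return modality_cls  # return the class unchanged so it stays usable

        return decorator

    @classmethod
    def lookup(cls, name: str) -> type:
        try:
            return cls._registry[name]
        except KeyError:
            known = sorted(cls._registry)
            raise KeyError(f"unknown modality {name!r}; known: {known}") from None


@ModalityRegistry.register("toy_image")
class ToyImageModality:
    pass


print(ModalityRegistry.lookup("toy_image") is ToyImageModality)  # True
```

Indexing classes by a string name is what allows a configuration dictionary or a saved checkpoint to select the right modality class at load time.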

class multimeditron.model.modalities.BaseModality(config: BaseModalityConfig, dtype: dtype = torch.bfloat16)¶

Bases: ABC, PreTrainedModel

Abstract base class for modality models.

This base class defines the common interface and attributes for all modality models. Subclasses must implement the abstract members embedding_size, freeze_modality_embedder, unfreeze_modality_embedder and unfreeze_projection.

config¶

Configuration object for the modality.

Type:: BaseModalityConfig

config_class¶

Class reference for the configuration.

Type:: type

tokenizer¶

Tokenizer associated with the model, if any.

Type:: Optional[Any]

_dtype¶

Data type for the model’s tensors.

Type:: torch.dtype

freeze_all()¶: Freeze all parameters in the model.

abstractmethod freeze_modality_embedder()¶: Freeze the parameters of the modality

Danger

This function should and will keep the modality in “eval” mode even if you call torch.nn.Module.train() on the model! To take the modality out of eval mode, call multimeditron.model.modalities.base.BaseModality.unfreeze_modality_embedder().

get_config() → BaseModalityConfig¶

Retrieve the configuration object associated with the modality.

Returns:: The configuration object.
Return type:: BaseModalityConfig

preprocessor_class: type = None¶

unfreeze_all()¶: Unfreeze all parameters in the model.

abstractmethod unfreeze_modality_embedder()¶: Unfreeze the parameters of the modality

abstractmethod unfreeze_projection()¶: Unfreeze the parameters of the projection

class multimeditron.model.modalities.BaseModalityConfig(hidden_size: int = 1024, modality_type: str | None = None, **kwargs)¶

Bases: PretrainedConfig

Configuration class for defining modality parameters.

This configuration is used as the base for all modality-specific configurations.

hidden_size¶

The size of the hidden layers’ representation.

Type:: int

modality_type¶

The type of modality (e.g., ‘ClipImage’, ‘ClipAudio’).

Type:: Optional[str]

class multimeditron.model.modalities.BaseModalityProcessor(config: BaseModalityConfig)¶

Bases: ABC, ProcessorMixin

Abstract base class for modality processors.

The BaseModalityProcessor defines a standard interface for processing inputs of a specific modality. Subclasses must implement the abstract process method.

config¶

Configuration object for the processor.

Type:: BaseModalityConfig

abstractmethod process(modality: Dict[str, Any]) → Dict[str, Any]¶

Abstract method for processing modality.

Parameters:: modality (Dict[str, Any]) – Input data to be processed.
Returns:: The original sample with the processed modality
Return type:: Dict[str, Any]
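The process contract (take a sample dictionary, return the sample with the modality processed) can be sketched without any library dependency. ToyProcessorBase and ToyScaleProcessor are hypothetical, and the “value” key is borrowed from the ImageProcessor description; other processors may use a different layout.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class ToyProcessorBase(ABC):
    """Hypothetical stand-in for an abstract modality processor."""

    @abstractmethod
    def process(self, modality: Dict[str, Any]) -> Dict[str, Any]:
        """Return the sample with its modality payload processed."""


class ToyScaleProcessor(ToyProcessorBase):
    """Scales 8-bit pixel values stored under "value" into [0, 1]."""

    def process(self, modality: Dict[str, Any]) -> Dict[str, Any]:
        processed = dict(modality)  # shallow copy so the caller's dict is untouched
        processed["value"] = [pixel / 255.0 for pixel in modality["value"]]
        return processed


sample = {"type": "image", "value": [0, 255]}
print(ToyScaleProcessor().process(sample))  # {'type': 'image', 'value': [0.0, 1.0]}
```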

class multimeditron.model.modalities.BioMedCLIPImageConfig(hidden_size: int = 4096, clip_name: str = '', trust_remote_code: bool = False, projection_type: str = 'mlp', **kwargs)¶

Bases: BaseModalityConfig

Image modality config for OpenCLIP-based models (e.g. BiomedCLIP)

model_type: str = 'meditron_biomedclip'¶

class multimeditron.model.modalities.BioMedCLIPImageModality(config: BioMedCLIPImageConfig)¶

Bases: BaseModality

Image modality backed by BiomedCLIP (OpenCLIP).

config_class¶: alias of BioMedCLIPImageConfig

forward(inputs) → FloatTensor¶: inputs is a list of image tensors (list[Tensor]), each of shape (3, 224, 224).

freeze_modality_embedder()¶: Freeze the parameters of the modality

Danger

This function should and will keep the modality in “eval” mode even if you call torch.nn.Module.train() on the model! To take the modality out of eval mode, call multimeditron.model.modalities.base.BaseModality.unfreeze_modality_embedder().

preprocessor_class¶: alias of BioMedCLIPImageProcessor

unfreeze_modality_embedder()¶: Unfreeze the parameters of the modality

unfreeze_projection()¶: Unfreeze the parameters of the projection

class multimeditron.model.modalities.BioMedCLIPImageProcessor(config: BioMedCLIPImageConfig)¶

Bases: BaseModalityProcessor

Image processor using OpenCLIP preprocessing (BiomedCLIP-compatible)

process(modality: Dict[str, Any]) → Dict[str, Any]¶

Abstract method for processing modality.

Parameters:: modality (Dict[str, Any]) – Input data to be processed.
Returns:: The original sample with the processed modality
Return type:: Dict[str, Any]

class multimeditron.model.modalities.ImageConfig(hidden_size: int = 4096, clip_name: str = 'openai/clip-vit-large-patch14', projection_type: str = 'mlp', use_2d_position_ids: bool = False, **kwargs)¶

Bases: BaseModalityConfig

Configuration class for the Image Modality. Extends the BaseModalityConfig.

hidden_size¶

Dimension of the hidden layer for the projection network.

Type:: int

clip_name¶

Name of the CLIP model to use as the feature extractor.

Type:: str

projection_type¶

Type of projection network (e.g., “mlp”).

Type:: str

use_2d_position_ids¶

Whether to use the 2D positional-embedding adaptation, which lets a 1D LLM be used without retraining.

Type:: bool

Example

>>> config = ImageConfig(hidden_size=512, clip_name="openai/clip-vit-base-patch32")
>>> print(config.clip_name)
openai/clip-vit-base-patch32

model_type: str = 'meditron_clip'¶

class multimeditron.model.modalities.ImageModality(config: ImageConfig)¶

Bases: BaseModality

config_class¶: alias of ImageConfig

forward(inputs) → FloatTensor¶

Define the computation performed at every call.

Should be overridden by all subclasses.

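Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.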

freeze_modality_embedder()¶: Freeze the parameters of the modality

Danger

This function should and will keep the modality in “eval” mode even if you call torch.nn.Module.train() on the model! To take the modality out of eval mode, call multimeditron.model.modalities.base.BaseModality.unfreeze_modality_embedder().

preprocessor_class¶: alias of ImageProcessor

unfreeze_modality_embedder()¶: Unfreeze the parameters of the modality

unfreeze_projection()¶: Unfreeze the parameters of the projection

class multimeditron.model.modalities.ImageProcessor(config)¶

Bases: BaseModalityProcessor

A processor for handling image data. It uses a pretrained CLIP model for processing image inputs into tensors.

image_processor¶

An instance of a pretrained image processor.

Type:: AutoImageProcessor

_num_patches_per_entry¶

The number of patches per image entry, based on image and patch size.

Type:: int

process(modality: Dict[str, Any]) → Dict[str, Any]¶

Processes the input image modality into a tensor suitable for model consumption.

Parameters:: modality (Dict[str, Any]) – The input image data, where “value” is the key for image data.
Returns:: The sample with the image processed into its tensor representation.
Return type:: Dict[str, Any]

class multimeditron.model.modalities.MOEImageConfig(hidden_size: int = 1024, use_bias_proj: bool = True, expert_clip_names: List[str] = [], image_processor: str = 'openai/clip-vit-large-patch14', gating_path: str = '', top_k_experts: int = 1, projection_type: str = 'mlp', generalist_idx: int = -1, fusion_method: str = 'weighted_average', cross_attn_heads: int = 8, **kwargs)¶

Bases: BaseModalityConfig

model_type: str = 'moe_meditron_clip'¶

class multimeditron.model.modalities.MOEImageConfigPEP(hidden_size: int = 4096, use_bias_proj: bool = True, expert_clip_names: List[str] = [], image_processor: str = 'openai/clip-vit-base-patch32', gating_path: str = '', top_k_experts: int = 5, projection_type: str = 'mlp', generalist_idx: int = -1, fusion_method: str = 'weighted_average', cross_attn_heads: int = 8, **kwargs)¶

Bases: BaseModalityConfig

model_type: str = 'moe_meditron_clip_pep'¶

class multimeditron.model.modalities.MOEImageModality(config: MOEImageConfig)¶

Bases: BaseModality

config_class¶: alias of MOEImageConfig

forward(inputs) → Tensor¶

Define the computation performed at every call.

Should be overridden by all subclasses.

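Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.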

freeze_modality_embedder()¶: Freeze the parameters of the modality

Danger

This function should and will keep the modality in “eval” mode even if you call torch.nn.Module.train() on the model! To take the modality out of eval mode, call multimeditron.model.modalities.base.BaseModality.unfreeze_modality_embedder().

preprocessor_class¶: alias of MOEImageProcessor

train(mode: bool = True)¶

Set the module in training mode.

Parameters:: mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
Returns:: self
Return type:: Module

unfreeze_modality_embedder()¶: Unfreeze the parameters of the modality

unfreeze_projection()¶: Unfreeze the parameters of the projection

class multimeditron.model.modalities.MOEImageModalityPEP(config: MOEImageConfigPEP)¶

Bases: BaseModality

Mixture of Experts (MoE) Image Modality using CLIP models as experts. Combines multiple pretrained CLIP models as experts and uses a gating network to select and weight their outputs. Uses Per Expert Projection (PEP) where each expert has its own projection layer. During training, all experts are used and their outputs are weighted by the gating network. During evaluation, only the top-k experts are used (not implemented yet).
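Per Expert Projection can be sketched as follows, assuming each expert emits an embedding of its own width and owns a linear map into the shared hidden size before the gated combination. The helper names and the bias-free linear maps are illustrative assumptions, not the library's implementation.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]  # shape: (hidden_size, expert_dim)


def project(weight: Matrix, x: Vector) -> Vector:
    """Apply a bias-free linear map: one output value per row of `weight`."""
    return [sum(w * v for w, v in zip(row, x)) for row in weight]


def fuse_per_expert_projection(
    expert_outputs: List[Vector],
    projections: List[Matrix],
    gate_weights: List[float],
) -> Vector:
    """Project each expert with its own matrix, then take the gated sum."""
    projected = [project(w, x) for w, x in zip(projections, expert_outputs)]
    hidden_size = len(projected[0])
    return [
        sum(g * p[d] for g, p in zip(gate_weights, projected))
        for d in range(hidden_size)
    ]


# Two experts with different embedding widths (2 and 3), shared hidden size 2.
outputs = [[1.0, 2.0], [1.0, 0.0, 1.0]]
projections = [
    [[1.0, 0.0], [0.0, 1.0]],            # expert 0: identity map
    [[1.0, 1.0, 1.0], [0.0, 0.0, 2.0]],  # expert 1: maps 3 dims to 2
]
print(fuse_per_expert_projection(outputs, projections, [0.25, 0.75]))  # [1.75, 2.0]
```

Giving each expert its own projection removes the need for all experts to share one embedding width, at the cost of one extra set of projection weights per expert.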

config_class¶: alias of MOEImageConfigPEP

property embedding_size: int¶

forward(inputs) → Tensor¶

Define the computation performed at every call.

Should be overridden by all subclasses.

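Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.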

freeze_modality_embedder()¶: Freeze the parameters of the modality

Danger

This function should and will keep the modality in “eval” mode even if you call torch.nn.Module.train() on the model! To take the modality out of eval mode, call multimeditron.model.modalities.base.BaseModality.unfreeze_modality_embedder().

preprocessor_class¶: alias of MOEImageProcessorPEP

train(mode: bool = True)¶

Set the module in training mode.

Parameters:: mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
Returns:: self
Return type:: Module

unfreeze_modality_embedder()¶: Unfreeze the parameters of the modality

unfreeze_projection()¶: Unfreeze the parameters of the projection

class multimeditron.model.modalities.MOEImageProcessor(config: MOEImageConfig)¶

Bases: BaseModalityProcessor

Processor for Mixture of Experts (MoE) Image Modality. Uses a pretrained image processor to convert raw images into pixel values. Prepares processing for the fusion method between experts’ outputs.

process(modality: Dict[str, Any])¶

Abstract method for processing modality.

Parameters:: modality (Dict[str, Any]) – Input data to be processed.
Returns:: The original sample with the processed modality
Return type:: Dict[str, Any]

class multimeditron.model.modalities.MOEImageProcessorPEP(config: MOEImageConfigPEP)¶

Bases: BaseModalityProcessor

Processor for Mixture of Experts (MoE) Image Modality. Per Expert Projection (PEP) version. Uses a pretrained image processor to convert raw images into pixel values.

process(modality: Dict[str, Any])¶

Abstract method for processing modality.

Parameters:: modality (Dict[str, Any]) – Input data to be processed.
Returns:: The original sample with the processed modality
Return type:: Dict[str, Any]