multimeditron.model.modalities package

Submodules

multimeditron.model.modalities.base module

multimeditron.model.modalities.image_modality module

class multimeditron.model.modalities.image_modality.ImageConfig(hidden_size: int = 4096, clip_name: str = 'openai/clip-vit-large-patch14', projection_type: str = 'mlp', use_2d_position_ids: bool = False, **kwargs)

Bases: BaseModalityConfig

Configuration class for the Image Modality. Extends the BaseModalityConfig.

hidden_size

Dimension of the hidden layer for the projection network.

Type:

int

clip_name

Name of the CLIP model to use as the feature extractor.

Type:

str

projection_type

Type of projection network (e.g., “mlp”).

Type:

str

use_2d_position_ids

Whether to apply the 2D positional-embedding adaptation to a 1D LLM without retraining.

Type:

bool

Example

>>> config = ImageConfig(hidden_size=512, clip_name="openai/clip-vit-base-patch32")
>>> print(config.clip_name)
openai/clip-vit-base-patch32

model_type: str = 'meditron_clip'

class multimeditron.model.modalities.image_modality.ImageModality(config: ImageConfig)

Bases: BaseModality

config_class

alias of ImageConfig

forward(inputs) → FloatTensor

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

freeze_modality_embedder()

Freeze the parameters of the modality

Danger

This function keeps the modality in “eval” mode, even if you call torch.nn.Module.train() on the model. To take the modality out of eval mode, call multimeditron.model.modalities.base.BaseModality.unfreeze_modality_embedder().
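The behaviour described in this warning can be illustrated with a minimal sketch. It assumes PyTorch is installed and is not the library's actual implementation: the class FrozenEncoderWrapper, its encoder and projection layers, and the _encoder_frozen flag are all hypothetical names chosen for illustration.

```python
import torch.nn as nn


class FrozenEncoderWrapper(nn.Module):
    """Hypothetical stand-in for a modality with a freezable encoder."""

    def __init__(self):
        super().__init__()
        # The encoder plays the role of the pretrained feature extractor.
        self.encoder = nn.Sequential(nn.Linear(4, 4), nn.Dropout(0.1))
        self.projection = nn.Linear(4, 2)
        self._encoder_frozen = False

    def freeze_modality_embedder(self):
        # Stop gradient updates and switch the encoder to eval mode.
        self._encoder_frozen = True
        for param in self.encoder.parameters():
            param.requires_grad = False
        self.encoder.eval()

    def unfreeze_modality_embedder(self):
        # Re-enable gradients and let the encoder follow the parent's mode.
        self._encoder_frozen = False
        for param in self.encoder.parameters():
            param.requires_grad = True
        self.encoder.train(self.training)

    def train(self, mode: bool = True):
        # nn.Module.train() propagates the mode to every child module,
        # so eval mode is re-applied to the encoder while it is frozen.
        super().train(mode)
        if self._encoder_frozen:
            self.encoder.eval()
        return self


model = FrozenEncoderWrapper()
model.freeze_modality_embedder()
model.train()
print(model.training, model.encoder.training)
```

With this pattern, calling train() on the wrapper switches the projection to training mode while the frozen encoder stays in eval mode until unfreeze_modality_embedder() is called.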

classmethod from_dict(config_args, **kwargs)

preprocessor_class

alias of ImageProcessor

unfreeze_modality_embedder()

Unfreeze the parameters of the modality

unfreeze_projection()

Unfreeze the parameters of the projection

class multimeditron.model.modalities.image_modality.ImageProcessor(config)

Bases: BaseModalityProcessor

A processor for handling image data. It uses a pretrained CLIP model for processing image inputs into tensors.

image_processor

An instance of a pretrained image processor.

Type:

AutoImageProcessor

_num_patches_per_entry

The number of patches per image entry, based on image and patch size.

Type:

int
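As a rough illustration of how a patch count follows from image and patch size, the sketch below computes it for a square image split into non-overlapping square patches. The function num_patches and its include_cls_token flag are hypothetical; whether the library counts an extra class token is not stated here.

```python
def num_patches(image_size: int, patch_size: int, include_cls_token: bool = False) -> int:
    """Count non-overlapping square patches in a square image."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be a multiple of patch_size")
    per_side = image_size // patch_size
    count = per_side * per_side
    # Some vision transformers prepend one class token to the patch sequence.
    return count + 1 if include_cls_token else count


# A 224x224 input with 14x14 patches gives a 16x16 grid.
print(num_patches(224, 14))  # 256
# A 224x224 input with 32x32 patches gives a 7x7 grid.
print(num_patches(224, 32))  # 49
```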

process(modality: Dict[str, Any]) → Dict[str, Any]

Processes the input image modality into a tensor suitable for model consumption.

Parameters:

modality (Dict[str, Any]) – The input image data, where “value” is the key for image data.

Returns:

The processed tensor representation of the image.

Return type:

torch.Tensor

multimeditron.model.modalities.image_modality_moe module

class multimeditron.model.modalities.image_modality_moe.MOEImageConfig(hidden_size: int = 1024, use_bias_proj: bool = True, expert_clip_names: List[str] = [], image_processor: str = 'openai/clip-vit-large-patch14', gating_path: str = '', top_k_experts: int = 1, projection_type: str = 'mlp', generalist_idx: int = -1, fusion_method: str = 'weighted_average', cross_attn_heads: int = 8, **kwargs)

Bases: BaseModalityConfig

model_type: str = 'moe_meditron_clip'

class multimeditron.model.modalities.image_modality_moe.MOEImageModality(config: MOEImageConfig)

Bases: BaseModality

Mixture of Experts (MoE) Image Modality using CLIP models as experts. Combines multiple pretrained CLIP models as experts and uses a gating network to select and weight their outputs. During training, all experts are used and their outputs are weighted by the gating network. During evaluation, only the top-k experts are intended to be used, but this is not implemented yet.
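The gated weighting described above can be sketched in plain Python. This is an assumption-laden illustration of a 'weighted_average' fusion, not the library's code: softmax gating, the fuse_experts function, and the optional top_k restriction with renormalised weights are all hypothetical choices.

```python
import math
from typing import List, Optional, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of gate logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_experts(
    expert_outputs: Sequence[Sequence[float]],
    gate_logits: Sequence[float],
    top_k: Optional[int] = None,
) -> List[float]:
    """Weighted average of expert feature vectors of equal length."""
    weights = softmax(gate_logits)
    if top_k is not None:
        # Keep the k largest gate weights and renormalise them to sum to 1.
        ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
        kept = set(ranked[:top_k])
        weights = [w if i in kept else 0.0 for i, w in enumerate(weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    dim = len(expert_outputs[0])
    return [sum(w * out[d] for w, out in zip(weights, expert_outputs)) for d in range(dim)]


# Two experts with equal gate logits contribute equally.
print(fuse_experts([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]))  # [0.5, 0.5]
```

Passing top_k=None corresponds to the training behaviour in which every expert contributes; the top_k branch shows one possible way the evaluation-time selection could work.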

config_class

alias of MOEImageConfig

forward(inputs) → Tensor

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

freeze_modality_embedder()

Freeze the parameters of the modality

Danger

This function keeps the modality in “eval” mode, even if you call torch.nn.Module.train() on the model. To take the modality out of eval mode, call multimeditron.model.modalities.base.BaseModality.unfreeze_modality_embedder().

preprocessor_class

alias of MOEImageProcessor

train(mode: bool = True)

Set the module in training mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behaviors in training/evaluation mode, i.e., whether they are affected, e.g. Dropout, BatchNorm, etc.

Parameters:

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns:

self

Return type:

Module

unfreeze_modality_embedder()

Unfreeze the parameters of the modality

unfreeze_projection()

Unfreeze the parameters of the projection

class multimeditron.model.modalities.image_modality_moe.MOEImageProcessor(config: MOEImageConfig)

Bases: BaseModalityProcessor

Processor for Mixture of Experts (MoE) Image Modality. Uses a pretrained image processor to convert raw images into pixel values. Prepares processing for the fusion method between experts’ outputs.

process(modality: Dict[str, Any])

Process the input modality.

Parameters:

modality (Dict[str, Any]) – Input data to be processed.

Returns:

The original sample with the processed modality

Return type:

Dict[str, Any]

Module contents

class multimeditron.model.modalities.AutoModality

Bases: object

A class for managing modality registration and retrieval.

The AutoModality class provides a centralized registry for modality subclasses. It handles the registration of modality classes, and allows users to retrieve pretrained models, processors, and configurations for specific modalities.

_registry

Internal dictionary storing registered modality classes, indexed by name.

Type:

dict
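A name-indexed registry of this kind is often built with a decorator factory. The sketch below is a dependency-free illustration under that assumption: ToyAutoModality, its resolve method, and ToyImageModality are hypothetical, and the real register may be used differently.

```python
from typing import Callable, Dict


class ToyAutoModality:
    """Hypothetical registry mapping names to modality classes."""

    _registry: Dict[str, type] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[type], type]:
        # Returns a decorator that stores the decorated class under `name`.
        def decorator(modality_cls: type) -> type:
            if name in cls._registry:
                raise ValueError(f"Modality {name!r} is already registered")
            cls._registry[name] = modality_cls
            return modality_cls

        return decorator

    @classmethod
    def resolve(cls, name: str) -> type:
        # Look up a previously registered class by name.
        if name not in cls._registry:
            raise KeyError(f"No modality registered under {name!r}")
        return cls._registry[name]


@ToyAutoModality.register("toy_image")
class ToyImageModality:
    pass


print(ToyAutoModality.resolve("toy_image").__name__)  # ToyImageModality
```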

classmethod config_from_dict(config: dict, **kwargs) → BaseModalityConfig

classmethod from_pretrained(*args, **kwargs) → BaseModality

classmethod preprocessor_from_name(name: str, *args, **kwargs) → BaseModalityProcessor

classmethod register(name: str)

class multimeditron.model.modalities.BaseModality(config: BaseModalityConfig, dtype: dtype = torch.bfloat16)

Bases: ABC, PreTrainedModel

Abstract base class for modality models.

This base class defines the common interface and attributes for all modality models. Subclasses must implement the abstract members embedding_size, freeze_modality_embedder, unfreeze_modality_embedder, and unfreeze_projection.
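The effect of the abstract interface can be shown with a dependency-free sketch built on Python's abc module. ToyBaseModality and its subclasses are hypothetical stand-ins that do not inherit from PreTrainedModel, and treating embedding_size as an abstract property is an assumption.

```python
from abc import ABC, abstractmethod


class ToyBaseModality(ABC):
    """Hypothetical stand-in listing the four abstract members."""

    @property
    @abstractmethod
    def embedding_size(self) -> int: ...

    @abstractmethod
    def freeze_modality_embedder(self) -> None: ...

    @abstractmethod
    def unfreeze_modality_embedder(self) -> None: ...

    @abstractmethod
    def unfreeze_projection(self) -> None: ...


class IncompleteModality(ToyBaseModality):
    # Only one abstract member is implemented, so the class stays abstract.
    @property
    def embedding_size(self) -> int:
        return 8


class CompleteModality(IncompleteModality):
    def freeze_modality_embedder(self) -> None:
        pass

    def unfreeze_modality_embedder(self) -> None:
        pass

    def unfreeze_projection(self) -> None:
        pass


try:
    IncompleteModality()
except TypeError as err:
    print("cannot instantiate:", type(err).__name__)

print(CompleteModality().embedding_size)  # 8
```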

config

Configuration object for the modality.

Type:

BaseModalityConfig

config_class

Class reference for the configuration.

Type:

type

tokenizer

Tokenizer associated with the model, if any.

Type:

Optional[Any]

_dtype

Data type for the model’s tensors.

Type:

torch.dtype

freeze_all()

Freeze all parameters in the model.

abstractmethod freeze_modality_embedder()

Freeze the parameters of the modality

Danger

This function keeps the modality in “eval” mode, even if you call torch.nn.Module.train() on the model. To take the modality out of eval mode, call multimeditron.model.modalities.base.BaseModality.unfreeze_modality_embedder().

get_config() → BaseModalityConfig

Retrieve the configuration object associated with the modality.

Returns:

The configuration object.

Return type:

BaseModalityConfig

preprocessor_class: type = None

unfreeze_all()

Unfreeze all parameters in the model.

abstractmethod unfreeze_modality_embedder()

Unfreeze the parameters of the modality

abstractmethod unfreeze_projection()

Unfreeze the parameters of the projection

class multimeditron.model.modalities.BaseModalityConfig(hidden_size: int = 1024, modality_type: str | None = None, **kwargs)

Bases: PretrainedConfig

Configuration class for defining modality parameters.

This configuration is used as the base for all modality-specific configurations.

hidden_size

The size of the hidden layers’ representation.

Type:

int

modality_type

The type of modality (e.g., ‘ClipImage’, ‘ClipAudio’).

Type:

Optional[str]

class multimeditron.model.modalities.BaseModalityProcessor(config: BaseModalityConfig)

Bases: ABC, ProcessorMixin

Abstract base class for modality processors.

The BaseModalityProcessor defines a standard interface for processing inputs of a specific modality. Subclasses must implement the abstract process method.

config

Configuration object for the processor.

Type:

BaseModalityConfig

abstractmethod process(modality: Dict[str, Any]) → Dict[str, Any]

Abstract method for processing modality.

Parameters:

modality (Dict[str, Any]) – Input data to be processed.

Returns:

The original sample with the processed modality

Return type:

Dict[str, Any]
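A subclass honouring this contract might look like the sketch below. It is a dependency-free illustration: ToyProcessor and UppercaseProcessor are hypothetical, the "type" key is invented for the example, and returning a copy instead of mutating the input is an assumption about the intended behaviour.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class ToyProcessor(ABC):
    """Hypothetical stand-in for an abstract modality processor."""

    @abstractmethod
    def process(self, modality: Dict[str, Any]) -> Dict[str, Any]: ...


class UppercaseProcessor(ToyProcessor):
    def process(self, modality: Dict[str, Any]) -> Dict[str, Any]:
        # Copy the sample so the caller's dictionary is left untouched,
        # then replace the raw "value" with its processed form.
        processed = dict(modality)
        processed["value"] = str(modality["value"]).upper()
        return processed


sample = {"type": "text", "value": "chest x-ray"}
print(UppercaseProcessor().process(sample))
```

The returned dictionary keeps every other key of the sample and only swaps in the processed value.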

class multimeditron.model.modalities.ImageConfig(hidden_size: int = 4096, clip_name: str = 'openai/clip-vit-large-patch14', projection_type: str = 'mlp', use_2d_position_ids: bool = False, **kwargs)

Bases: BaseModalityConfig

Configuration class for the Image Modality. Extends the BaseModalityConfig.

hidden_size

Dimension of the hidden layer for the projection network.

Type:

int

clip_name

Name of the CLIP model to use as the feature extractor.

Type:

str

projection_type

Type of projection network (e.g., “mlp”).

Type:

str

use_2d_position_ids

Whether to apply the 2D positional-embedding adaptation to a 1D LLM without retraining.

Type:

bool

Example

>>> config = ImageConfig(hidden_size=512, clip_name="openai/clip-vit-base-patch32")
>>> print(config.clip_name)
openai/clip-vit-base-patch32

model_type: str = 'meditron_clip'

class multimeditron.model.modalities.ImageModality(config: ImageConfig)

Bases: BaseModality

config_class

alias of ImageConfig

forward(inputs) → FloatTensor

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

freeze_modality_embedder()

Freeze the parameters of the modality

Danger

This function keeps the modality in “eval” mode, even if you call torch.nn.Module.train() on the model. To take the modality out of eval mode, call multimeditron.model.modalities.base.BaseModality.unfreeze_modality_embedder().

classmethod from_dict(config_args, **kwargs)

preprocessor_class

alias of ImageProcessor

unfreeze_modality_embedder()

Unfreeze the parameters of the modality

unfreeze_projection()

Unfreeze the parameters of the projection

class multimeditron.model.modalities.ImageProcessor(config)

Bases: BaseModalityProcessor

A processor for handling image data. It uses a pretrained CLIP model for processing image inputs into tensors.

image_processor

An instance of a pretrained image processor.

Type:

AutoImageProcessor

_num_patches_per_entry

The number of patches per image entry, based on image and patch size.

Type:

int

process(modality: Dict[str, Any]) → Dict[str, Any]

Processes the input image modality into a tensor suitable for model consumption.

Parameters:

modality (Dict[str, Any]) – The input image data, where “value” is the key for image data.

Returns:

The processed tensor representation of the image.

Return type:

torch.Tensor

class multimeditron.model.modalities.MOEImageConfig(hidden_size: int = 1024, use_bias_proj: bool = True, expert_clip_names: List[str] = [], image_processor: str = 'openai/clip-vit-large-patch14', gating_path: str = '', top_k_experts: int = 1, projection_type: str = 'mlp', generalist_idx: int = -1, fusion_method: str = 'weighted_average', cross_attn_heads: int = 8, **kwargs)

Bases: BaseModalityConfig

model_type: str = 'moe_meditron_clip'

class multimeditron.model.modalities.MOEImageConfigPEP(hidden_size: int = 4096, use_bias_proj: bool = True, expert_clip_names: List[str] = [], image_processor: str = 'openai/clip-vit-base-patch32', gating_path: str = '', top_k_experts: int = 5, projection_type: str = 'mlp', generalist_idx: int = -1, fusion_method: str = 'weighted_average', cross_attn_heads: int = 8, **kwargs)

Bases: BaseModalityConfig

model_type: str = 'moe_meditron_clip_pep'

class multimeditron.model.modalities.MOEImageModality(config: MOEImageConfig)

Bases: BaseModality

Mixture of Experts (MoE) Image Modality using CLIP models as experts. Combines multiple pretrained CLIP models as experts and uses a gating network to select and weight their outputs. During training, all experts are used and their outputs are weighted by the gating network. During evaluation, only the top-k experts are intended to be used, but this is not implemented yet.

config_class

alias of MOEImageConfig

forward(inputs) → Tensor

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

freeze_modality_embedder()

Freeze the parameters of the modality

Danger

This function keeps the modality in “eval” mode, even if you call torch.nn.Module.train() on the model. To take the modality out of eval mode, call multimeditron.model.modalities.base.BaseModality.unfreeze_modality_embedder().

preprocessor_class

alias of MOEImageProcessor

train(mode: bool = True)

Set the module in training mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behaviors in training/evaluation mode, i.e., whether they are affected, e.g. Dropout, BatchNorm, etc.

Parameters:

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns:

self

Return type:

Module

unfreeze_modality_embedder()

Unfreeze the parameters of the modality

unfreeze_projection()

Unfreeze the parameters of the projection

class multimeditron.model.modalities.MOEImageModalityPEP(config: MOEImageConfigPEP)

Bases: BaseModality

Mixture of Experts (MoE) Image Modality using CLIP models as experts. Combines multiple pretrained CLIP models as experts and uses a gating network to select and weight their outputs. Uses Per Expert Projection (PEP), where each expert has its own projection layer. During training, all experts are used and their outputs are weighted by the gating network. During evaluation, only the top-k experts are intended to be used, but this is not implemented yet.
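One way to picture Per Expert Projection is that each expert's features pass through that expert's own linear map before the gated combination. The sketch below illustrates this in plain Python under several assumptions: the functions project and fuse_with_per_expert_projection are hypothetical, projections are bias-free matrices, and gate weights are taken as already normalised.

```python
from typing import List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


def project(matrix: Matrix, vector: Vector) -> List[float]:
    """Apply a linear map (one row per output dimension) to a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def fuse_with_per_expert_projection(
    expert_outputs: Sequence[Vector],
    projections: Sequence[Matrix],
    gate_weights: Sequence[float],
) -> List[float]:
    # Each expert gets its own projection into the shared hidden size,
    # so experts may produce feature vectors of different lengths.
    projected = [project(proj, out) for proj, out in zip(projections, expert_outputs)]
    dim = len(projected[0])
    return [sum(g * vec[d] for g, vec in zip(gate_weights, projected)) for d in range(dim)]


expert_outputs = [[1.0, 2.0, 3.0], [4.0, 5.0]]  # 3-dim and 2-dim experts
projections = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # maps 3 dims to 2
    [[0.0, 1.0], [1.0, 0.0]],            # swaps the two dims
]
print(fuse_with_per_expert_projection(expert_outputs, projections, [0.5, 0.5]))  # [3.0, 3.0]
```

Projecting before fusion lets experts with different output widths be combined in one common space, which is a plausible motivation for the per-expert design.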

config_class

alias of MOEImageConfigPEP

property embedding_size: int

forward(inputs) → Tensor

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

freeze_modality_embedder()

Freeze the parameters of the modality

Danger

This function keeps the modality in “eval” mode, even if you call torch.nn.Module.train() on the model. To take the modality out of eval mode, call multimeditron.model.modalities.base.BaseModality.unfreeze_modality_embedder().

preprocessor_class

alias of MOEImageProcessorPEP

train(mode: bool = True)

Set the module in training mode.

This has an effect only on certain modules. See the documentation of particular modules for details of their behaviors in training/evaluation mode, i.e., whether they are affected, e.g. Dropout, BatchNorm, etc.

Parameters:

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns:

self

Return type:

Module

unfreeze_modality_embedder()

Unfreeze the parameters of the modality

unfreeze_projection()

Unfreeze the parameters of the projection

class multimeditron.model.modalities.MOEImageProcessor(config: MOEImageConfig)

Bases: BaseModalityProcessor

Processor for Mixture of Experts (MoE) Image Modality. Uses a pretrained image processor to convert raw images into pixel values. Prepares processing for the fusion method between experts’ outputs.

process(modality: Dict[str, Any])

Process the input modality.

Parameters:

modality (Dict[str, Any]) – Input data to be processed.

Returns:

The original sample with the processed modality

Return type:

Dict[str, Any]

class multimeditron.model.modalities.MOEImageProcessorPEP(config: MOEImageConfigPEP)

Bases: BaseModalityProcessor

Processor for Mixture of Experts (MoE) Image Modality. Per Expert Projection (PEP) version. Uses a pretrained image processor to convert raw images into pixel values.

process(modality: Dict[str, Any])

Process the input modality.

Parameters:

modality (Dict[str, Any]) – Input data to be processed.

Returns:

The original sample with the processed modality

Return type:

Dict[str, Any]