multimeditron.model.modalities package¶
Submodules¶
multimeditron.model.modalities.base module¶
multimeditron.model.modalities.image_modality module¶
- class multimeditron.model.modalities.image_modality.ImageConfig(hidden_size: int = 4096, clip_name: str = 'openai/clip-vit-large-patch14', projection_type: str = 'mlp', use_2d_position_ids: bool = False, **kwargs)¶
Bases:
BaseModalityConfig
Configuration class for the Image Modality. Extends the BaseModalityConfig.
- hidden_size¶
Dimension of the hidden layer for the projection network.
- Type:
int
- clip_name¶
Name of the CLIP model to use as the feature extractor.
- Type:
str
- projection_type¶
Type of projection network (e.g., “mlp”).
- Type:
str
- use_2d_position_ids¶
Whether to use the 2D positional-embedding adaptation for a 1D LLM without retraining.
- Type:
bool
Example
>>> config = ImageConfig(hidden_size=512, clip_name="openai/clip-vit-base-patch32")
>>> print(config.clip_name)
openai/clip-vit-base-patch32
- model_type: str = 'meditron_clip'¶
- class multimeditron.model.modalities.image_modality.ImageModality(config: ImageConfig)¶
Bases:
BaseModality
- config_class¶
alias of
ImageConfig
- forward(inputs) FloatTensor¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- freeze_modality_embedder()¶
Freeze the parameters of the modality
Danger
This function should and will keep the modality in “eval” mode even if you call torch.nn.Module.train() on the model! To remove the eval mode on the modality, call multimeditron.model.modalities.base.BaseModality.unfreeze_modality_embedder().
- classmethod from_dict(config_args, **kwargs)¶
- preprocessor_class¶
alias of
ImageProcessor
- unfreeze_modality_embedder()¶
Unfreeze the parameters of the modality
- unfreeze_projection()¶
Unfreeze the parameters of the projection
- class multimeditron.model.modalities.image_modality.ImageProcessor(config)¶
Bases:
BaseModalityProcessor
A processor for handling image data. It uses a pretrained CLIP model for processing image inputs into tensors.
- image_processor¶
An instance of a pretrained image processor.
- Type:
AutoImageProcessor
- _num_patches_per_entry¶
The number of patches per image entry, based on image and patch size.
- Type:
int
- process(modality: Dict[str, Any]) Dict[str, Any]¶
Processes the input image modality into a tensor suitable for model consumption.
- Parameters:
modality (Dict[str, Any]) – The input image data, where “value” is the key for image data.
- Returns:
The processed tensor representation of the image.
- Return type:
torch.Tensor
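The patch count described for _num_patches_per_entry can be illustrated with a small sketch. It assumes a square image split into non-overlapping square patches, which is how ViT-style CLIP encoders tokenize images; the helper name num_patches is hypothetical, and whether the real attribute also counts a class token is not stated here.

```python
def num_patches(image_size: int, patch_size: int) -> int:
    """Number of non-overlapping square patches in a square image.

    Hypothetical helper for illustration; not part of multimeditron.
    """
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    per_side = image_size // patch_size  # patches along one edge
    return per_side * per_side


# 224x224 input with 14x14 patches (e.g. a ViT-L/14 style encoder)
print(num_patches(224, 14))  # 256
# 224x224 input with 32x32 patches (e.g. a ViT-B/32 style encoder)
print(num_patches(224, 32))  # 49
```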
multimeditron.model.modalities.image_modality_moe module¶
- class multimeditron.model.modalities.image_modality_moe.MOEImageConfig(hidden_size: int = 1024, use_bias_proj: bool = True, expert_clip_names: List[str] = [], image_processor: str = 'openai/clip-vit-large-patch14', gating_path: str = '', top_k_experts: int = 1, projection_type: str = 'mlp', generalist_idx: int = -1, fusion_method: str = 'weighted_average', cross_attn_heads: int = 8, **kwargs)¶
Bases:
BaseModalityConfig
- model_type: str = 'moe_meditron_clip'¶
- class multimeditron.model.modalities.image_modality_moe.MOEImageModality(config: MOEImageConfig)¶
Bases:
BaseModality
Mixture of Experts (MoE) Image Modality using CLIP models as experts. Combines multiple pretrained CLIP models as experts and uses a gating network to select and weight their outputs. During training, all experts are used and their outputs are weighted by the gating network. During evaluation, only the top-k experts are used (not implemented yet).
- config_class¶
alias of
MOEImageConfig
- forward(inputs) Tensor¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- freeze_modality_embedder()¶
Freeze the parameters of the modality
Danger
This function should and will keep the modality in “eval” mode even if you call torch.nn.Module.train() on the model! To remove the eval mode on the modality, call multimeditron.model.modalities.base.BaseModality.unfreeze_modality_embedder().
- preprocessor_class¶
alias of
MOEImageProcessor
- train(mode: bool = True)¶
Set the module in training mode.
This has an effect only on certain modules. See the documentation of particular modules for details of their behaviors in training/evaluation mode, i.e., whether they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters:
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns:
self
- Return type:
Module
- unfreeze_modality_embedder()¶
Unfreeze the parameters of the modality
- unfreeze_projection()¶
Unfreeze the parameters of the projection
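The training-time behaviour described for MOEImageModality, where every expert contributes and the gating network weights the outputs, can be sketched in plain Python. This is a minimal illustration, not the library's code: it assumes the gate produces one logit per expert, that the logits are normalized with a softmax, and that the default fusion_method 'weighted_average' means a convex combination of expert embeddings. The function names are hypothetical.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of gate logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_average(expert_outputs: List[List[float]],
                     gate_logits: List[float]) -> List[float]:
    """Fuse one embedding per expert using softmax-normalized gate weights.

    Assumption: all experts emit embeddings of the same dimension.
    """
    if len(expert_outputs) != len(gate_logits):
        raise ValueError("need exactly one gate logit per expert")
    weights = softmax(gate_logits)
    dim = len(expert_outputs[0])
    return [
        sum(w * out[i] for w, out in zip(weights, expert_outputs))
        for i in range(dim)
    ]


# Two experts with equal gate logits: the fusion reduces to a plain mean.
fused = weighted_average([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
print(fused)  # [0.5, 0.5]
```

A top-k variant would zero out all but the k largest weights before renormalizing; the description above notes that this evaluation path is not implemented yet, so it is left out of the sketch.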
- class multimeditron.model.modalities.image_modality_moe.MOEImageProcessor(config: MOEImageConfig)¶
Bases:
BaseModalityProcessor
Processor for Mixture of Experts (MoE) Image Modality. Uses a pretrained image processor to convert raw images into pixel values. Prepares processing for the fusion method between experts’ outputs.
- process(modality: Dict[str, Any])¶
Abstract method for processing modality.
- Parameters:
modality (Dict[str, Any]) – Input data to be processed.
- Returns:
The original sample with the processed modality
- Return type:
Dict[str, Any]
Module contents¶
- class multimeditron.model.modalities.AutoModality¶
Bases:
object
A class for managing modality registration and retrieval.
The AutoModality class provides a centralized registry for modality subclasses. It handles the registration of modality classes, and allows users to retrieve pretrained models, processors, and configurations for specific modalities.
- _registry¶
Internal dictionary storing registered modality classes, indexed by name.
- Type:
dict
- classmethod config_from_dict(config: dict, **kwargs) BaseModalityConfig¶
- classmethod from_pretrained(*args, **kwargs) BaseModality¶
- classmethod preprocessor_from_name(name: str, *args, **kwargs) BaseModalityProcessor¶
- classmethod register(name: str)¶
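The registry pattern described for AutoModality can be sketched as follows. This is a minimal stand-in, not the library's implementation: it assumes that register(name) is used as a class decorator that stores the decorated class in _registry, and the RegistrySketch class, its get method, and the "toy_image" name are all hypothetical.

```python
from typing import Callable, Dict


class RegistrySketch:
    """Hypothetical stand-in for a name-to-class modality registry."""

    _registry: Dict[str, type] = {}

    @classmethod
    def register(cls, name: str) -> Callable[[type], type]:
        """Return a class decorator that records the class under `name`."""
        def decorator(modality_cls: type) -> type:
            if name in cls._registry:
                raise ValueError(f"modality {name!r} is already registered")
            cls._registry[name] = modality_cls
            return modality_cls  # leave the decorated class unchanged
        return decorator

    @classmethod
    def get(cls, name: str) -> type:
        """Look up a registered class, with a readable error if missing."""
        try:
            return cls._registry[name]
        except KeyError:
            raise KeyError(f"unknown modality {name!r}") from None


@RegistrySketch.register("toy_image")
class ToyImageModality:
    pass


print(RegistrySketch.get("toy_image") is ToyImageModality)  # True
```

Keeping the lookup table on the class rather than on instances lets every module that imports the registry see the same set of names.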
- class multimeditron.model.modalities.BaseModality(config: BaseModalityConfig, dtype: dtype = torch.bfloat16)¶
Bases:
ABC, PreTrainedModel
Abstract base class for modality models.
This base class defines the common interface and attributes for all modality models. Subclasses must implement the abstract methods embedding_size, freeze_modality_embedder, unfreeze_modality_embedder and unfreeze_projection
- config¶
Configuration object for the modality.
- Type:
- config_class¶
Class reference for the configuration.
- Type:
type
- tokenizer¶
Tokenizer associated with the model, if any.
- Type:
Optional[Any]
- _dtype¶
Data type for the model’s tensors.
- Type:
torch.dtype
- freeze_all()¶
Freeze all parameters in the model.
- abstractmethod freeze_modality_embedder()¶
Freeze the parameters of the modality
Danger
This function should and will keep the modality in “eval” mode even if you call torch.nn.Module.train() on the model! To remove the eval mode on the modality, call multimeditron.model.modalities.base.BaseModality.unfreeze_modality_embedder().
- get_config() BaseModalityConfig¶
Retrieve the configuration object associated with the modality.
- Returns:
The configuration object.
- Return type:
BaseModalityConfig
- preprocessor_class: type = None¶
- unfreeze_all()¶
Unfreeze all parameters in the model.
- abstractmethod unfreeze_modality_embedder()¶
Unfreeze the parameters of the modality
- abstractmethod unfreeze_projection()¶
Unfreeze the parameters of the projection
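The contract in the Danger note on freeze_modality_embedder(), where a frozen embedder stays in eval mode even after train() is called, can be sketched with plain PyTorch. This is one possible way to get that behaviour, assuming train() is overridden to re-apply eval mode; the FrozenEncoderSketch class and its encoder, projection, and _embedder_frozen attributes are hypothetical and do not describe the multimeditron implementation.

```python
import torch.nn as nn


class FrozenEncoderSketch(nn.Module):
    """Hypothetical module; not the multimeditron implementation."""

    def __init__(self) -> None:
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(4, 4), nn.Dropout(0.1))
        self.projection = nn.Linear(4, 2)
        self._embedder_frozen = False

    def freeze_modality_embedder(self) -> None:
        self._embedder_frozen = True
        for param in self.encoder.parameters():
            param.requires_grad = False  # no gradient updates
        self.encoder.eval()  # e.g. disables dropout

    def unfreeze_modality_embedder(self) -> None:
        self._embedder_frozen = False
        for param in self.encoder.parameters():
            param.requires_grad = True
        self.encoder.train(self.training)  # follow the parent's mode again

    def train(self, mode: bool = True) -> "FrozenEncoderSketch":
        super().train(mode)  # switches every submodule to `mode`
        if self._embedder_frozen:
            self.encoder.eval()  # re-apply eval mode to the frozen part
        return self


model = FrozenEncoderSketch()
model.freeze_modality_embedder()
model.train()
print(model.training, model.encoder.training)  # True False
```

Calling unfreeze_modality_embedder() on the sketch restores normal behaviour, so the encoder follows later train() and eval() calls again.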
- class multimeditron.model.modalities.BaseModalityConfig(hidden_size: int = 1024, modality_type: str | None = None, **kwargs)¶
Bases:
PretrainedConfig
Configuration class for defining modality parameters.
This configuration is used as the base for all modality-specific configurations.
- hidden_size¶
The size of the hidden layers’ representation.
- Type:
int
- modality_type¶
The type of modality (e.g., ‘ClipImage’, ‘ClipAudio’).
- Type:
Optional[str]
- class multimeditron.model.modalities.BaseModalityProcessor(config: BaseModalityConfig)¶
Bases:
ABC, ProcessorMixin
Abstract base class for modality processors.
The BaseModalityProcessor defines a standard interface for processing inputs of a specific modality. Subclasses must implement the abstract process method.
- config¶
Configuration object for the processor.
- Type:
- abstractmethod process(modality: Dict[str, Any]) Dict[str, Any]¶
Abstract method for processing modality.
- Parameters:
modality (Dict[str, Any]) – Input data to be processed.
- Returns:
The original sample with the processed modality
- Return type:
Dict[str, Any]
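The process contract above, which takes a sample dictionary and returns the sample with the modality processed, can be sketched with a self-contained abstract base class. The classes below only mimic the interface and do not import multimeditron; the "value" key follows the ImageProcessor description, while the "type" key, the class names, and the uppercase transformation are hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class ProcessorSketch(ABC):
    """Hypothetical stand-in for an abstract modality processor."""

    @abstractmethod
    def process(self, modality: Dict[str, Any]) -> Dict[str, Any]:
        """Return the sample with its modality payload processed."""


class UppercaseTextProcessor(ProcessorSketch):
    """Toy subclass: 'processes' the payload by upper-casing it."""

    def process(self, modality: Dict[str, Any]) -> Dict[str, Any]:
        processed = dict(modality)  # shallow copy, input stays untouched
        processed["value"] = str(modality["value"]).upper()
        return processed


sample = {"type": "text", "value": "chest x-ray"}
result = UppercaseTextProcessor().process(sample)
print(result)  # {'type': 'text', 'value': 'CHEST X-RAY'}
```

Copying the dictionary before modifying it is a design choice of this sketch; whether the real processors mutate the sample in place is not specified here.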
- class multimeditron.model.modalities.ImageConfig(hidden_size: int = 4096, clip_name: str = 'openai/clip-vit-large-patch14', projection_type: str = 'mlp', use_2d_position_ids: bool = False, **kwargs)¶
Bases:
BaseModalityConfig
Configuration class for the Image Modality. Extends the BaseModalityConfig.
- hidden_size¶
Dimension of the hidden layer for the projection network.
- Type:
int
- clip_name¶
Name of the CLIP model to use as the feature extractor.
- Type:
str
- projection_type¶
Type of projection network (e.g., “mlp”).
- Type:
str
- use_2d_position_ids¶
Whether to use the 2D positional-embedding adaptation for a 1D LLM without retraining.
- Type:
bool
Example
>>> config = ImageConfig(hidden_size=512, clip_name="openai/clip-vit-base-patch32")
>>> print(config.clip_name)
openai/clip-vit-base-patch32
- model_type: str = 'meditron_clip'¶
- class multimeditron.model.modalities.ImageModality(config: ImageConfig)¶
Bases:
BaseModality
- config_class¶
alias of
ImageConfig
- forward(inputs) FloatTensor¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- freeze_modality_embedder()¶
Freeze the parameters of the modality
Danger
This function should and will keep the modality in “eval” mode even if you call torch.nn.Module.train() on the model! To remove the eval mode on the modality, call multimeditron.model.modalities.base.BaseModality.unfreeze_modality_embedder().
- classmethod from_dict(config_args, **kwargs)¶
- preprocessor_class¶
alias of
ImageProcessor
- unfreeze_modality_embedder()¶
Unfreeze the parameters of the modality
- unfreeze_projection()¶
Unfreeze the parameters of the projection
- class multimeditron.model.modalities.ImageProcessor(config)¶
Bases:
BaseModalityProcessor
A processor for handling image data. It uses a pretrained CLIP model for processing image inputs into tensors.
- image_processor¶
An instance of a pretrained image processor.
- Type:
AutoImageProcessor
- _num_patches_per_entry¶
The number of patches per image entry, based on image and patch size.
- Type:
int
- process(modality: Dict[str, Any]) Dict[str, Any]¶
Processes the input image modality into a tensor suitable for model consumption.
- Parameters:
modality (Dict[str, Any]) – The input image data, where “value” is the key for image data.
- Returns:
The processed tensor representation of the image.
- Return type:
torch.Tensor
- class multimeditron.model.modalities.MOEImageConfig(hidden_size: int = 1024, use_bias_proj: bool = True, expert_clip_names: List[str] = [], image_processor: str = 'openai/clip-vit-large-patch14', gating_path: str = '', top_k_experts: int = 1, projection_type: str = 'mlp', generalist_idx: int = -1, fusion_method: str = 'weighted_average', cross_attn_heads: int = 8, **kwargs)¶
Bases:
BaseModalityConfig
- model_type: str = 'moe_meditron_clip'¶
- class multimeditron.model.modalities.MOEImageConfigPEP(hidden_size: int = 4096, use_bias_proj: bool = True, expert_clip_names: List[str] = [], image_processor: str = 'openai/clip-vit-base-patch32', gating_path: str = '', top_k_experts: int = 5, projection_type: str = 'mlp', generalist_idx: int = -1, fusion_method: str = 'weighted_average', cross_attn_heads: int = 8, **kwargs)¶
Bases:
BaseModalityConfig
- model_type: str = 'moe_meditron_clip_pep'¶
- class multimeditron.model.modalities.MOEImageModality(config: MOEImageConfig)¶
Bases:
BaseModality
Mixture of Experts (MoE) Image Modality using CLIP models as experts. Combines multiple pretrained CLIP models as experts and uses a gating network to select and weight their outputs. During training, all experts are used and their outputs are weighted by the gating network. During evaluation, only the top-k experts are used (not implemented yet).
- config_class¶
alias of
MOEImageConfig
- forward(inputs) Tensor¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- freeze_modality_embedder()¶
Freeze the parameters of the modality
Danger
This function should and will keep the modality in “eval” mode even if you call torch.nn.Module.train() on the model! To remove the eval mode on the modality, call multimeditron.model.modalities.base.BaseModality.unfreeze_modality_embedder().
- preprocessor_class¶
alias of
MOEImageProcessor
- train(mode: bool = True)¶
Set the module in training mode.
This has an effect only on certain modules. See the documentation of particular modules for details of their behaviors in training/evaluation mode, i.e., whether they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters:
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns:
self
- Return type:
Module
- unfreeze_modality_embedder()¶
Unfreeze the parameters of the modality
- unfreeze_projection()¶
Unfreeze the parameters of the projection
- class multimeditron.model.modalities.MOEImageModalityPEP(config: MOEImageConfigPEP)¶
Bases:
BaseModality
Mixture of Experts (MoE) Image Modality using CLIP models as experts. Combines multiple pretrained CLIP models as experts and uses a gating network to select and weight their outputs. Uses Per Expert Projection (PEP) where each expert has its own projection layer. During training, all experts are used and their outputs are weighted by the gating network. During evaluation, only the top-k experts are used (not implemented yet).
- config_class¶
alias of
MOEImageConfigPEP
- property embedding_size: int¶
- forward(inputs) Tensor¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- freeze_modality_embedder()¶
Freeze the parameters of the modality
Danger
This function should and will keep the modality in “eval” mode even if you call torch.nn.Module.train() on the model! To remove the eval mode on the modality, call multimeditron.model.modalities.base.BaseModality.unfreeze_modality_embedder().
- preprocessor_class¶
alias of
MOEImageProcessorPEP
- train(mode: bool = True)¶
Set the module in training mode.
This has an effect only on certain modules. See the documentation of particular modules for details of their behaviors in training/evaluation mode, i.e., whether they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters:
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns:
self
- Return type:
Module
- unfreeze_modality_embedder()¶
Unfreeze the parameters of the modality
- unfreeze_projection()¶
Unfreeze the parameters of the projection
- class multimeditron.model.modalities.MOEImageProcessor(config: MOEImageConfig)¶
Bases:
BaseModalityProcessor
Processor for Mixture of Experts (MoE) Image Modality. Uses a pretrained image processor to convert raw images into pixel values. Prepares processing for the fusion method between experts’ outputs.
- process(modality: Dict[str, Any])¶
Abstract method for processing modality.
- Parameters:
modality (Dict[str, Any]) – Input data to be processed.
- Returns:
The original sample with the processed modality
- Return type:
Dict[str, Any]
- class multimeditron.model.modalities.MOEImageProcessorPEP(config: MOEImageConfigPEP)¶
Bases:
BaseModalityProcessor
Processor for Mixture of Experts (MoE) Image Modality. Per Expert Projection (PEP) version. Uses a pretrained image processor to convert raw images into pixel values.
- process(modality: Dict[str, Any])¶
Abstract method for processing modality.
- Parameters:
modality (Dict[str, Any]) – Input data to be processed.
- Returns:
The original sample with the processed modality
- Return type:
Dict[str, Any]