Quick Start

This page walks through the most common MMIRAGE workflows step by step.

Text-Only: Reformatting a Dataset

Suppose you have a JSONL dataset where each sample looks like:

{
    "conversations": [
        {"role": "user", "content": "Describe the image"},
        {"role": "assistant", "content": "This is a badly formmatted answer"}
    ],
    "modalities": ["<the images>"]
}
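For orientation: each line of a JSONL file is a standalone JSON object, so a quick sanity check in plain Python (independent of MMIRAGE, with a hypothetical sample) looks like:

```python
import json

# One JSONL line in the shape shown above (hypothetical sample).
line = ('{"conversations": ['
        '{"role": "user", "content": "Describe the image"}, '
        '{"role": "assistant", "content": "This is a badly formatted answer"}], '
        '"modalities": ["<the images>"]}')

sample = json.loads(line)
# Index 0 is the user turn, index 1 the assistant turn.
print(sample["conversations"][1]["content"])
# → This is a badly formatted answer
```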

You want to use an LLM to reformat the assistant answer as Markdown. Create a YAML config:

processors:
  - type: llm
    server_args:
      model_path: Qwen/Qwen3-8B
      tp_size: 4
      trust_remote_code: true
    default_sampling_params:
      temperature: 0.1
      top_p: 1.0
      max_new_tokens: 384

loading_params:
  state_dir: /path/to/state
  datasets:
    - path: /path/to/dataset.jsonl
      type: JSONL
      output_dir: /path/to/output/shards
  num_shards: 4
  shard_id: "$SLURM_ARRAY_TASK_ID"
  batch_size: 64

processing_params:
  inputs:
    - name: assistant_answer
      key: conversations[1].content
    - name: user_prompt
      key: conversations[0].content
    - name: modalities
      key: modalities

  outputs:
    - name: formatted_answer
      type: llm
      output_type: plain
      prompt: |
        Reformat the answer as Markdown without adding anything else:
        {{ assistant_answer }}

  remove_columns: false
  output_schema:
    conversations:
      - role: user
        content: "{{ user_prompt }}"
      - role: assistant
        content: "{{ formatted_answer }}"
    modalities: "{{ modalities }}"

execution_params:
  mode: local
  retry: false
  merge: false

Then run:

mmirage run --config configs/my_config.yaml
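To make the config concrete, here is a rough Python sketch of what the processing stage does per sample. It rests on two assumptions not spelled out here: that key paths like conversations[1].content use dotted/indexed syntax, and that {{ name }} placeholders are substituted Jinja-style. The call_llm helper is a hypothetical stand-in for the real LLM processor.

```python
import re

def resolve_key(sample, key):
    """Resolve a dotted path with list indices, e.g. 'conversations[1].content'."""
    value = sample
    for part in key.split("."):
        m = re.fullmatch(r"(\w+)\[(\d+)\]", part)
        value = value[m.group(1)][int(m.group(2))] if m else value[part]
    return value

def render(template, values):
    """Substitute {{ name }} placeholders (simplified; the real engine may differ)."""
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", lambda m: str(values[m.group(1)]), template)

def call_llm(prompt):
    """Hypothetical stand-in for the LLM processor; returns a canned answer."""
    return "**This** is a well-formatted answer."

sample = {
    "conversations": [
        {"role": "user", "content": "Describe the image"},
        {"role": "assistant", "content": "This is a badly formatted answer"},
    ],
    "modalities": ["<the images>"],
}

# 1. inputs: map each input name to a value via its key path.
values = {
    "assistant_answer": resolve_key(sample, "conversations[1].content"),
    "user_prompt": resolve_key(sample, "conversations[0].content"),
    "modalities": resolve_key(sample, "modalities"),
}

# 2. outputs: render the prompt template, then call the model.
prompt = render("Reformat the answer as Markdown without adding anything else:\n"
                "{{ assistant_answer }}", values)
values["formatted_answer"] = call_llm(prompt)

# 3. output_schema: rebuild the sample from the templates.
result = {
    "conversations": [
        {"role": "user", "content": render("{{ user_prompt }}", values)},
        {"role": "assistant", "content": render("{{ formatted_answer }}", values)},
    ],
    "modalities": values["modalities"],
}
print(result["conversations"][1]["content"])
# → **This** is a well-formatted answer.
```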

Multimodal: Processing Images with a VLM

For vision-language tasks, add image inputs and specify a chat_template:

processors:
  - type: llm
    server_args:
      model_path: Qwen/Qwen2-VL-7B-Instruct
      tp_size: 4
      trust_remote_code: true
    chat_template: qwen2-vl
    default_sampling_params:
      temperature: 0.1
      top_p: 0.95
      max_new_tokens: 768

loading_params:
  datasets:
    - path: /path/to/image/dataset
      type: loadable
      output_dir: /path/to/output/shards
      image_base_path: /path/to/images
  num_shards: 4
  shard_id: "$SLURM_ARRAY_TASK_ID"
  batch_size: 16

processing_params:
  inputs:
    - name: image
      key: image_path
      type: image
    - name: question
      key: question

  outputs:
    - name: answer
      type: llm
      output_type: plain
      prompt: |
        Answer this question about the image:
        {{ question }}

  output_schema:
    question: "{{ question }}"
    answer: "{{ answer }}"

execution_params:
  mode: local
  retry: true
  merge: true
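One detail worth noting in this config: image_base_path is presumably joined with each sample's image_path value (the key behind the image input). Assuming ordinary path joining, with a hypothetical per-sample value, the resolution would look like:

```python
import os

image_base_path = "/path/to/images"  # from loading_params above
image_path = "train/0001.jpg"        # hypothetical value of a sample's image_path field

# Resolve the relative per-sample path against the base directory.
full_path = os.path.join(image_base_path, image_path)
print(full_path)
# → /path/to/images/train/0001.jpg
```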

Running on SLURM

Set execution_params.mode: slurm and provide SLURM-specific parameters:

execution_params:
  mode: slurm
  account: my_account
  job_name: mmirage-job
  nodes: 1
  ntasks_per_node: 1
  gpus: 4
  cpus_per_task: 64
  time_limit: "11:59:59"
  retry: true
  merge: true
  max_retries: 3

Then submit:

mmirage run --config configs/slurm_config.yaml
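The shard_id: "$SLURM_ARRAY_TASK_ID" setting is what ties sharding to SLURM array jobs: each array task sees its own task index, so num_shards: 4 pairs with an array of tasks 0-3. Assuming the variable is expanded as a shell would, and a simple round-robin split over rows (MMIRAGE's actual assignment may differ), the per-task effect can be sketched as:

```python
import os

# In a real array job SLURM sets this; here we simulate the third task.
os.environ["SLURM_ARRAY_TASK_ID"] = "2"

num_shards = 4
shard_id = int(os.path.expandvars("$SLURM_ARRAY_TASK_ID"))

rows = list(range(10))  # stand-in for dataset rows
# This task keeps every num_shards-th row, offset by its shard_id.
my_rows = [r for i, r in enumerate(rows) if i % num_shards == shard_id]
print(my_rows)
# → [2, 6]
```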

See CLI Reference for all available subcommands and Configuration Reference for a full parameter guide.