πͺ Running mmore on WindowsΒΆ
OverviewΒΆ
mmore was developed and tested mainly on Linux. It runs on Windows too, but a few things behave differently. This page lists those differences and the fix for each one.
If you work on Linux or macOS, you can skip this page.
1. Install the prerequisitesΒΆ
Unlike most Linux distributions, Windows does not ship Python, Git, or FFmpeg. Install them first with winget:
winget install Python.Python.3.11
winget install Git.Git
winget install astral-sh.uv
winget install Gyan.FFmpeg
Then clone the repo and install mmore into a virtual environment:
git clone https://github.com/EPFLiGHT/mmore.git
cd mmore
uv venv
.venv\Scripts\activate
uv pip install -e ".[all,cu126]"
Use cu126 for an NVIDIA GPU, or cpu otherwise. See the
README for the full
list of extras.
2. milvus-lite is not available on WindowsΒΆ
Every example config whose db.uri is ./proc_demo.db relies on milvus-lite
(examples/index/config.yaml, examples/retriever_api/config.yaml,
examples/rag/config.yaml, examples/rag/config_api.yaml). There is no Windows
build of milvus-lite, so any of them fails with:
ModuleNotFoundError: No module named 'milvus_lite'
Fix: run Milvus in DockerΒΆ
This repo ships no Compose file, so download the official Milvus standalone one
matching your installed pymilvus version (see the
Milvus install docs):
# Download the Milvus docker compose file from GitHub
Invoke-WebRequest `
-Uri "https://github.com/milvus-io/milvus/releases/download/v2.6.6/milvus-standalone-docker-compose.yml" `
-OutFile "milvus-docker-compose.yml"
# Start Milvus containers
docker compose -f milvus-docker-compose.yml up -d
Wait about a minute, then check docker ps shows the three containers
(etcd, minio, milvus-standalone) as (healthy).
Create the databaseΒΆ
mmore does not create the database automatically when connecting to a remote Milvus. Run this once:
python -c "from pymilvus import connections, db; connections.connect(uri='http://127.0.0.1:19530'); db.create_database('my_db')"
Point the configs at the Docker instanceΒΆ
The db block lives at a different level depending on the config. Change
uri: ./proc_demo.db to uri: http://127.0.0.1:19530 in each one you use.
examples/retriever_api/config.yaml (and examples/rag/config*.yaml) β db
is at the root:
db:
uri: http://127.0.0.1:19530
name: my_db
examples/index/config.yaml β db is nested under indexer:
indexer:
db:
uri: http://127.0.0.1:19530
name: my_db
Check that the setup worksΒΆ
Once Milvus is running, confirm the connection:
python -c "from pymilvus import MilvusClient; c = MilvusClient(uri='http://127.0.0.1:19530', db_name='my_db'); print(c.list_collections())"
This returns a list of collections (empty before you index anything).
3. Surya OCR can crash the process on large PDFsΒΆ
When processing large PDFs, the surya-based OCR may crash with:
Process finished with exit code 0xC0000005
This is a hard crash inside a native dependency. On Windows, use the fast processors instead, which rely on PyMuPDF rather than surya.
In your process config, use_fast_processors goes under dispatcher_config:
dispatcher_config:
use_fast_processors: true
You lose some accuracy on heavily scanned PDFs, but the pipeline no longer crashes.