This guide mainly focuses on getting the LoRA trainer working locally on your PC with your own GPU. I have an RTX 5090, so my suggested settings might not work on a GPU with less VRAM. If you need more information on preparing your dataset or hit other technical issues, try ChatGPT and Google, as I'm still learning about this myself.
1. Installation:
open cmd
git clone https://github.com/ostris/ai-toolkit.git
pip install -U "triton-windows<3.4"
pip install huggingface_hub hf_transfer
in the same cmd window, run: set HF_HUB_ENABLE_HF_TRANSFER=1
(note: set only lasts for the current window; run setx HF_HUB_ENABLE_HF_TRANSFER 1 once instead if you want it to persist across windows)
cd ai-toolkit
python -m venv venv
call venv\Scripts\activate.bat
python -m pip install -U pip
RTX 50xx (CUDA 12.8):
- pip install --no-cache-dir torch==2.7.0 torchvision==0.22.0 --index-url https://download.pytorch.org/whl/cu128
RTX 40/30/20 / GTX 16/10 (CUDA 12.6):
- pip install --no-cache-dir torch==2.7.0 torchvision==0.22.0 --index-url https://download.pytorch.org/whl/cu126
pip install -r requirements.txt
cd ui
npm install
keep the cmd window open for later
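Before moving on, it's worth checking that the torch build you just installed can actually see your GPU, since picking the wrong CUDA index URL is the most common way to end up with a CPU-only install. This is a quick sketch (the helper name gpu_report is mine, not part of ai-toolkit); run it inside the activated venv.

```python
# Sanity check: does the installed PyTorch wheel see your GPU?
def gpu_report():
    try:
        import torch
    except ImportError:
        # torch isn't installed in this environment at all
        return {"torch_installed": False, "cuda_available": False}
    info = {
        "torch_installed": True,
        "torch_version": torch.__version__,
        "cuda_available": torch.cuda.is_available(),
    }
    if info["cuda_available"]:
        info["gpu"] = torch.cuda.get_device_name(0)
    return info

if __name__ == "__main__":
    print(gpu_report())
```

If cuda_available comes back False, redo the pip install step above with the index URL that matches your card.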
2. Download Wan2.1 Model (gonna take hours):
https://huggingface.co/Wan-AI/Wan2.1-T2V-14B-Diffusers/tree/main
If you don't know how to download the model using the huggingface CLI, copy the code below into a python file and run it; it will download all the files from Hugging Face. If you are downloading manually instead, you MUST download all the files and keep the folder structure.
Replace PATH (where you want the model saved)
Replace YOUR_Token (you can get one after logging in to Hugging Face)
Code below
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="Wan-AI/Wan2.1-T2V-14B-Diffusers",
    local_dir=r"PATH",              # where you want the model saved
    local_dir_use_symlinks=False,   # deprecated/ignored in newer huggingface_hub versions
    resume_download=True,           # also deprecated; downloads always resume now
    token="YOUR_Token",             # your Hugging Face access token
)
Code above
It will take some hours to download everything. If it looks frozen, it's not. Let it run until you see the message "download completed".
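Before starting a training run, you can quickly confirm the download finished intact. This is a minimal sketch: model_index.json is present in every Diffusers-format repo, but the subfolder names listed here are an assumption based on the usual Diffusers layout, so compare them against the repo's file list on Hugging Face if yours differs.

```python
# Check that the downloaded Wan2.1 model folder looks complete.
from pathlib import Path

EXPECTED = [
    "model_index.json",   # present in every Diffusers-format repo
    "scheduler",          # subfolders below are assumed from the typical
    "text_encoder",       # Diffusers layout -- verify against the repo page
    "tokenizer",
    "transformer",
    "vae",
]

def missing_parts(model_dir):
    root = Path(model_dir)
    return [name for name in EXPECTED if not (root / name).exists()]

if __name__ == "__main__":
    gaps = missing_parts(r"PATH")  # same PATH you used in the download script
    print("Download looks complete" if not gaps else f"Missing: {gaps}")
```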
3. Run:
back to cmd
npm run build_and_start
open http://localhost:8675 or the address shown in CMD.
4. Dataset (image):
15 to 20 images. Keep the subject in the middle when cropping. For captions, you can use ChatGPT for quick output (only works for SFW images). For NSFW, use joycaption (https://github.com/fpgaminer/joycaption).
You can use short videos too if you want to make a motion LoRA, but you will need a beefy GPU (4090 or 5090) and it will take hours to complete. For captions, use Google Gemini (SFW). For NSFW I'm not sure; comment below if you know how.
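Once your images are captioned, a quick check that nothing got skipped saves a failed run later. This sketch assumes the image-plus-same-name-.txt convention that ai-toolkit's dataset loader expects; adjust IMAGE_EXTS if you use other formats.

```python
# List any images in the dataset folder that have no matching .txt caption.
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp"}

def uncaptioned(dataset_dir):
    root = Path(dataset_dir)
    if not root.is_dir():
        return []  # nothing to check if the folder doesn't exist
    return sorted(
        p.name for p in root.iterdir()
        if p.suffix.lower() in IMAGE_EXTS
        and not p.with_suffix(".txt").exists()
    )

if __name__ == "__main__":
    missing = uncaptioned(r"PATH_TO_DATASET")  # your dataset folder
    print("All images captioned" if not missing else f"No caption for: {missing}")
```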
5. Training settings:


Change these settings according to the screenshot. Replace PATH with the folder where you saved the Wan2.1 model.
Scroll down and change the prompt to match what you are training. The trainer will create previews of the LoRA during training; stop the training once you like the result.
When all set, click Create Job and then press the PLAY button on top right to start training.
Note:
If you want to kill the server or restart it when something goes wrong, open Task Manager, look for Node.js, and end those tasks.
