WAN 2.2 I2V GGUF — My 8GB Daily Workflow | Upscale + RIFE

Updated: Mar 29, 2026

Tags: tool, comfyui, 8gb, gguf, i2v, image-to-video

Type: Workflow
Published: Mar 21, 2026
Base Model: Wan Video 2.2 I2V-A14B
Hash (AutoV2): 9B72649C7F

Antarez

This is the ComfyUI workflow I use daily on my RTX 5060 8GB. I'm sharing it because getting WAN 2.2 to produce consistent quality at this VRAM level took quite a bit of trial and error, and I figure some of that research might save others time. These settings work well for me — your results may vary depending on your GPU, drivers and system setup, but this is what I keep coming back to.

Also available as a 1-Click Pack — includes auto-installer, exclusive custom nodes, all models, and all dependencies ready to run. No downloads, no terminal, no debugging. Extract → Run → Generate. Available on my Ko-fi

Free workflow:              1-Click Pack:
✓ Workflow JSON             ✓ Everything included
✗ Install 7 custom nodes    ✓ Exclusive custom nodes
✗ Manual node setup         ✓ Auto-installed
✗ Download 5 models         ✓ Auto-downloaded
✗ Configure paths           ✓ Pre-configured
✗ Debug compatibility       ✓ Tested & working
✗ ~2 hrs setup              ✓ 5 min extract & run

What it does

Two-pass generation using the WAN 2.2 Remix V2.1 Q3_K_M GGUF models, followed by a post-process pipeline that upscales and smooths the output. The final video comes out at around 5 seconds, 2x the native resolution, with RIFE frame interpolation for smoother motion.

Honest about timing: on my RTX 5060 8GB with 32GB RAM, the full pipeline takes around 10 minutes per video. The quality I get out of it is pretty decent for this class of hardware — good enough that I kept using it. If you want faster results and don't need the upscale or RIFE, skipping those brings generation time down to around 400 seconds (roughly 6.5 minutes).

What you need

System: 8GB VRAM minimum, 32GB RAM recommended (the model offloads to RAM between passes).

Models:

  • wan22RemixT2VI2V_i2vHighV21-Q3_K_M.gguf

  • wan22RemixT2VI2V_i2vLowV21-Q3_K_M.gguf

  • umt5-xxl-encoder-Q5_K_M.gguf

  • wan_2.1_vae.safetensors

  • 4x-UltraSharp.pth

Custom nodes: ComfyUI-GGUF, KJNodes, ComfyUI-WanVideoWrapper, VideoHelperSuite, ComfyUI_Fill-Nodes, Comfy-WaveSpeed, rgthree-comfy

Custom nodes can be installed via ComfyUI Manager → Install Missing Nodes after loading the workflow.

How to use

  1. Drag the JSON into ComfyUI or use Load

  2. In node 106 load your input image

  3. In node 6 write your animation prompt

  4. Queue and wait

Any image orientation works — the workflow auto-adapts to portrait, landscape or square. Very dark images or images with busy backgrounds tend to produce less consistent results.
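The auto-adapt behavior can be sketched as a simple aspect-ratio bucket match. This is illustrative, not the actual node wiring, and the bucket resolutions here are assumptions built around the workflow's 480×832 base:

```python
def pick_resolution(width, height,
                    buckets=((480, 832), (832, 480), (624, 624))):
    """Pick the bucket (portrait, landscape, square) whose aspect ratio
    is closest to the input image. Bucket values are assumptions based
    on the workflow's 480x832 base resolution."""
    aspect = width / height
    return min(buckets, key=lambda b: abs(b[0] / b[1] - aspect))

# A portrait photo snaps to the portrait bucket
print(pick_resolution(1080, 1920))  # -> (480, 832)
```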

Why these settings

euler on HIGH, heun on LOW. This specific sampler pairing handles Q3_K_M quantization significantly better than other combinations I tested — most multi-step samplers accumulate precision errors that show as visible pixelation by step 4-5. Finding this took extensive testing across most available options.

CFG is split: 1.5 on HIGH, 1.0 on LOW. The split prevents shadow artifacts that appear with uniform CFG on Q3 models while maintaining facial structure consistency. More steps go to LOW (4) than HIGH (2) because of how WAN 2.2's architecture handles facial identity.
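Put together, the two-pass settings boil down to a small table. A sketch of the values described above, not the workflow JSON itself:

```python
# Two-pass sampler settings as described above (sketch, not the workflow JSON)
PASSES = {
    "HIGH": {"sampler": "euler", "cfg": 1.5, "steps": 2},
    "LOW":  {"sampler": "heun",  "cfg": 1.0, "steps": 4},
}

# 6 sampling steps total, weighted toward the LOW pass
total_steps = sum(p["steps"] for p in PASSES.values())
print(total_steps)  # -> 6
```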

Post-process pipeline:

VAEDecode → FastUnsharpSharpen (0.4) → 4x-UltraSharp → ×0.5 scale → RIFE ×2 → H264 CRF17 @ 32fps

Net 2x upscale plus doubled frames. If you want even smoother output, change the RIFE multiplier in node 310 from 2 to 4 — no extra VRAM needed since RIFE runs on already-decoded frames.
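The resolution and frame math of the chain works out like this. The 81-frame count is an assumption based on the stated ~5-second clip length and the 16→32fps doubling:

```python
def output_shape(w, h, frames, upscale=4, downscale=0.5, rife_mult=2):
    """Net output size and frame count for the post-process chain:
    4x-UltraSharp upscale -> x0.5 scale -> RIFE frame interpolation."""
    scale = upscale * downscale            # 4x model then x0.5 = net 2x
    return int(w * scale), int(h * scale), frames * rife_mult

# 480x832 portrait pass: net 2x upscale, doubled frames
print(output_shape(480, 832, 81))  # -> (960, 1664, 162)
```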

NAG at scale 11, alpha 0.25, tau 2.5 — works through attention normalization so it doesn't trigger the shadow artifacts you'd get from pushing CFG higher. FBCache at threshold 0.12 for speed with minimal quality impact.

RTX 5000 series (Blackwell) — read this

SageAttention is patched on both models in auto mode. Don't change this to an explicit backend. On RTX 5060, 5070, 5080 and 5090 (sm_120 compute capability) the sageattn_qk_int8_pv_fp16_cuda backend crashes outright — those CUDA kernels haven't been compiled for Blackwell yet. auto detects the right implementation at runtime and works correctly.

If you're on an RTX 5000 series card, also use PyTorch cu130 (CUDA 13.0, compiled for sm_120) and launch ComfyUI with --disable-async-offload. The cu128 build underperforms on Blackwell and async offload has some instability on sm_120.
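A minimal sketch of why auto matters here: pick the backend from the CUDA compute capability instead of hardcoding one. The capability tuples are the standard (major, minor) pairs that torch.cuda.get_device_capability() returns; the backend name mirrors SageAttention's, but the selection logic itself is illustrative:

```python
def safe_sage_backend(capability):
    """Return a SageAttention backend choice for a (major, minor) CUDA
    compute capability. sm_120 (Blackwell) lacks compiled kernels for the
    explicit int8 backend, so fall back to auto there. Illustrative only."""
    if capability >= (12, 0):          # RTX 5000 series / sm_120
        return "auto"
    return "sageattn_qk_int8_pv_fp16_cuda"

print(safe_sage_backend((12, 0)))  # -> auto
print(safe_sage_backend((8, 9)))   # -> sageattn_qk_int8_pv_fp16_cuda
```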

NVFP4 quantized WAN models are not available yet but when they are, Blackwell's FP4 tensor cores will handle them natively — expect a significant quality and speed uplift for this GPU family when that happens.

Variants

+ LightX2V — if you want faster generation at the cost of some quality, add the LightX2V LoRAs on top of this workflow:

  • HIGH: wan2.2_i2v_A14b_high_noise_lora_rank64_lightx2v_4step_1022.safetensors at strength 1.5

  • LOW: wan2.2_i2v_A14b_low_noise_lora_rank64_lightx2v_4step_1022.safetensors at strength 1.0

I find the base workflow without LoRAs gives better quality at the same step count, but LightX2V is useful when you want faster iteration.
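The variant amounts to stacking two LoRA entries on the base passes. A sketch of the filenames and strengths listed above:

```python
# LightX2V speed LoRAs per pass (filenames and strengths from the list above)
LIGHTX2V = {
    "HIGH": ("wan2.2_i2v_A14b_high_noise_lora_rank64_lightx2v_4step_1022.safetensors", 1.5),
    "LOW":  ("wan2.2_i2v_A14b_low_noise_lora_rank64_lightx2v_4step_1022.safetensors", 1.0),
}

for pass_name, (lora, strength) in LIGHTX2V.items():
    print(pass_name, strength)
```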

+ SLG Face Lock — I've also written a custom node (SkipLayerGuidanceWAN) that adds zero-cost face lock on the LOW pass. With heun the difference is subtle since the corrector already stabilizes identity reasonably well, but it's there if you want it. I'll publish it as a separate post.

Want this workflow without the setup? The 1-Click Pack includes everything — auto-installer, models, nodes, optimized for 8GB. → Ko-fi

What didn't work

Noting these because they come up a lot and I spent time on all of them:

  • res_2m and res_3s both pixelate from quantization error accumulation

  • uniform CFG 3.0 creates heavy shadow artifacts with Q3

  • MagCache throws a division by zero with heun and the beta scheduler

  • the karras scheduler is designed for noise-prediction architectures and doesn't fit WAN's flow matching

  • the explicit SageAttention CUDA backend crashes on Blackwell

Note on the negative prompt

Please don't remove the Chinese-language terms in the negative prompt. They're part of WAN's official negative conditioning and work at the model level, not just as text guidance. Removing them noticeably affects output quality.

v2.0 — 8 LoRA slots (rgthree), FastUnsharpSharpen, auto-orientation, fixed negative prompt + NAG conditioning, resolution corrected to 480×832
v1.0 — initial release

If something doesn't work for you or you get better results with different settings, drop it in the comments. Always curious what others are finding on different hardware.