This is an early v0.1 release with known limitations. It transfers simpler styles (e.g. flat 2D, cel-shaded, or monochrome line art) well, but struggles with more complex styles that involve texture, intricate detail, or strong material/lighting effects.
Quality often improves noticeably if you:
raise CFG to ~1.1–2.0 (CFG = 1 is the distilled-model default)
use a non-distilled LTX-2.3 model
If I have enough compute, I'll keep working on a better version. Until then, enjoy, and happy creating!
An IC-LoRA (in-context LoRA) for LTX-Video 2.3 (22B) trained for image-guided style transfer: given a source video and a single reference image describing the target style, the model re-renders the video in that style while preserving the original content and motion.
Training details
This IC-LoRA was trained on RunPod cloud GPUs.
Base model: Lightricks/LTX-2.3 (22B)
Training framework: ltx-trainer (Lightricks)
Training strategy: video-to-video IC-LoRA (first_frame_conditioning_p: 0.0; the reference latents stream carries the style)
Released checkpoint: step 8,000
LoRA rank / alpha: 128 / 128
Target modules:
attn1.{to_k,to_q,to_v,to_out.0} + attn2.{to_k,to_q,to_v,to_out.0} (self- and cross-attention)
Optimizer: Prodigy
Scheduler: constant
Mixed precision: bf16
Batch size: 1 (gradient checkpointing on)
Timestep sampling: shifted_logit_normal
Resolution: trained at 768x448 @ 97 frames
Dataset: 562 cross-pair samples derived from the Ditto-1M style-transfer dataset (50 styles × ~11 pairs each). Each training reference is constructed by replacing frame 0 of the source video with the stylized first frame of a different pair from the same style.
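The cross-pair construction described above can be sketched as follows. This is a minimal toy illustration, not the actual training pipeline: the function name, the list-of-frames representation, and the random donor selection are all assumptions.

```python
import random

def build_cross_pair(source_frames, stylized_first_frames, pair_idx, rng):
    """Build a training reference clip: the source video with frame 0
    replaced by the stylized first frame of a *different* pair from the
    same style, so style comes from the reference frame rather than a
    memorized per-video pairing. (Illustrative sketch only.)"""
    candidates = [i for i in range(len(stylized_first_frames)) if i != pair_idx]
    donor = rng.choice(candidates)
    return [stylized_first_frames[donor]] + list(source_frames[1:])

# Toy example: frames are string labels standing in for frame tensors.
rng = random.Random(0)
source = ["src_f0", "src_f1", "src_f2"]
stylized_firsts = ["styleA_pair0_f0", "styleA_pair1_f0", "styleA_pair2_f0"]
ref = build_cross_pair(source, stylized_firsts, pair_idx=0, rng=rng)
# ref[0] is a stylized first frame from another pair in the same style;
# the remaining motion frames are kept from the source video.
```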
Inference
For inference I used ComfyUI. Workflow available here: Cseti/ComfyUI-Workflows — restyle-ic-lora.
Conditioning — both modalities supported, mixing them works best:
Image reference: a single still image in the requested style, fed as frame 0
Text prompt: e.g. Make it Disney 2D Animation style. / Make it watercolor style. — matches the training caption template (Make it {style} style.)
Strength: 1.0.
Prompting tips
The style reference image carries the primary signal; the text prompt reinforces and disambiguates it. A few patterns that help:
Match the training caption template: Make it {style} style. — e.g. Make it watercolor style., Make it Disney 2D Animation style. The shorter form is the safe default.
A more detailed style description can help: expanding the prompt with technique / medium / palette / lighting cues steers the model toward your intent.
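The caption template above can be wrapped in a small helper so every prompt matches the training format; the function name and the optional detail argument are illustrative, not part of the released tooling.

```python
STYLE_CAPTION_TEMPLATE = "Make it {style} style."

def style_prompt(style, detail=None):
    """Return a prompt matching the training caption template,
    optionally extended with technique / medium / palette cues."""
    prompt = STYLE_CAPTION_TEMPLATE.format(style=style)
    if detail:
        prompt += " " + detail
    return prompt

print(style_prompt("watercolor"))
# prints: Make it watercolor style.
print(style_prompt("Disney 2D Animation",
                   "Soft pastel palette, clean ink outlines, flat lighting."))
```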
Important Notes
This LoRA was created as part of a research project. The training data is derived from the publicly released Ditto-1M dataset; please respect the licensing terms of the source dataset and any source video content. Users use the model at their own risk and are obligated to comply with applicable copyright laws.
Acknowledgement
Special thanks to:
Lightricks for open-sourcing the LTX-2 trainer and the LTX-2.3 22B model
The authors of Ditto-1M for releasing the style-transfer dataset that made this LoRA possible
Support
Training models like this requires renting cloud GPUs, which gets expensive quickly. If you find this LoRA useful and would like me to keep contributing open-source models, your support is very much appreciated:
