Updated: May 9, 2026
This ComfyUI workflow is designed for LTX-2 image-to-video generation with a distilled acceleration route, stronger frame-to-frame consistency, and 1080p-class high-resolution output. The main goal of this workflow is to turn one input image into a stable short video while preserving the original subject identity, scene layout, lighting direction, camera framing, and overall visual style across the generated frames.
Compared with a heavier non-distilled LTX-2 route, this workflow focuses on a more efficient distilled generation setup. It still uses the LTX-2 Dev model backbone, but adds a distilled LoRA route to improve speed and sampling efficiency. This makes the workflow more suitable for repeated testing, online generation, RunningHub deployment, prompt iteration, and short video production where users need both visual quality and faster turnaround.
The workflow is built around LTX-2 19B Dev FP8, using ltx-2-19b-dev-fp8.safetensors as the main checkpoint and gemma_3_12B_it.safetensors as the text encoder. It also uses LTX-2 spatial latent upscaling through ltx-2-spatial-upscaler-x2-1.0.safetensors. The distilled behavior is introduced through the LTX-2 distilled LoRA route, allowing the workflow to run with a more compact sampling structure while still keeping the LTX-2 visual generation pipeline.
The core logic is image-to-video consistency. A source image is loaded, resized, preprocessed, injected into the video latent, sampled through LTX-2, spatially upscaled in latent space, refined again, and finally decoded into video frames. This makes the workflow more advanced than a simple one-pass image-to-video graph. It is built as a multi-stage LTX-2 I2V pipeline for creators who want more stable short clips instead of unstable single-pass motion.
The input stage starts from a still image. The image is prepared through ImageResizeKJv2 and ResizeImagesByLongerEdge. The workflow is designed around high-resolution output logic, with a 1080p-class route such as 1920 x 1088 available for stronger GPUs. The workflow note also explains that width and height should follow LTX-2 valid size rules, and the frame count should follow the “divisible by 8 plus 1” rule. If invalid values are used, the workflow may silently choose the closest valid parameters, so correct resolution and frame-count planning is important.
The frame calculation stage uses a calculator-style rule to keep the video length compatible with LTX-2. This is important because LTX-2 is sensitive to frame structure. A 60-frame-class video must still follow the internal frame-count constraints. This workflow therefore uses a structured frame-count route rather than arbitrary video length values, making it more suitable for predictable image-to-video generation.
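As a rough illustration of these constraints, the sketch below snaps a target resolution and frame count to LTX-2-friendly values. The frame rule (a multiple of 8 plus 1) comes from the workflow note above, while the assumption that width and height round to multiples of 32 is my own reading of LTX-2 sizing behavior and should be checked against the workflow's built-in calculation nodes.

```python
# Minimal sketch: snap width/height and frame count to LTX-2-friendly values.
# Assumptions: dimensions round to multiples of 32 (e.g. 1080 -> 1088),
# and the frame count must satisfy "divisible by 8 plus 1" (8k + 1).

def snap_dimension(value: int, multiple: int = 32) -> int:
    """Round a width or height to the nearest multiple of `multiple`."""
    return max(multiple, round(value / multiple) * multiple)

def snap_frame_count(frames: int) -> int:
    """Round a frame count to the nearest valid value of the form 8k + 1."""
    k = max(1, round((frames - 1) / 8))
    return 8 * k + 1

print(snap_dimension(1920), snap_dimension(1080))  # 1920 1088
print(snap_frame_count(60))                        # 57 (a "60-frame-class" clip)
```

In practice the workflow's own frame-calculation logic handles this, but it helps to know which values it will land on before planning a clip.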
The image is passed through LTXVPreprocess before being injected into the latent video. This preprocessing step prepares the source image for the LTX-2 latent pipeline and helps control image compression behavior. EmptyLTXVLatentVideo creates the base video latent with the selected width, height, length, and batch size. LTXVImgToVideoInplace then places the source image into the video latent, giving the model a strong visual anchor. This is the key step for preserving the original image identity.
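Conceptually, the in-place injection step can be pictured as writing the encoded source image into the first temporal slot of an otherwise empty video latent. This is only an illustrative sketch of the idea, not the actual LTXVImgToVideoInplace implementation, and the tensor layout shown is an assumption.

```python
import torch

# Illustrative only: anchor an empty video latent with the encoded source image.
# Assumed layout: [batch, channels, time, height, width]; the real node also
# handles noise masks and conditioning strength, which are omitted here.

def inject_image_into_video_latent(image_latent: torch.Tensor,
                                   empty_video_latent: torch.Tensor) -> torch.Tensor:
    video_latent = empty_video_latent.clone()
    video_latent[:, :, :1] = image_latent  # the source image anchors frame 0
    return video_latent
```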
The workflow also uses LTXVEmptyLatentAudio, LTXVConcatAVLatent, and LTXVSeparateAVLatent. This audio-video latent structure allows the workflow to operate inside the broader LTX-2 AV latent system. Even when the main task is image-to-video consistency, the audio latent structure remains part of the pipeline. The video latent and audio latent are combined during sampling and separated again before decoding, which makes the graph closer to a complete LTX-2 workflow rather than a simplified video-only route.
The prompt section uses CLIPTextEncode and LTXVConditioning. The positive prompt should describe the subject, motion, scene behavior, camera movement, lighting, atmosphere, and any dialogue or audio-related action if needed. LTX-2 prompting works better when the prompt describes events over time rather than only describing a static image. For example, instead of only writing “a man and woman eating hot pot,” a better prompt describes who moves, who speaks, what the camera does, and how the scene changes over the clip.
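As an illustration, the hot-pot example above could be expanded into a temporal prompt along these lines (illustrative wording only, not a prompt taken from the workflow):

```
A man and a woman sit at a steaming hot pot table. The woman picks up a slice
of beef with chopsticks and dips it into the broth while the man leans forward
and speaks to her, smiling. The camera slowly pushes in from a medium two-shot
to a closer framing. Warm overhead lighting, light steam drifting upward,
gentle handheld feel, natural conversational pacing.
```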
The negative prompt is detailed and designed to suppress common video-generation problems. It targets blur, overexposure, underexposure, low contrast, flickering, motion blur, distorted proportions, deformed facial features, wrong hand count, incorrect text, missing microphone, inconsistent perspective, camera shake, mismatched lip sync, distorted voice, off-sync audio, added dialogue, repetitive speech, jittery movement, awkward pauses, unnatural transitions, and general AI artifacts. This is important because in a longer short video, small errors become more visible across frames.
A key difference in this distilled workflow is the use of LTX2_NAG. The NAG stage applies negative guidance at the attention level rather than relying only on classifier-free guidance. It receives the negative conditioning and exposes controlled guidance parameters such as nag_scale, nag_alpha, and nag_tau. In practical terms, this gives the workflow another layer of control for suppressing unwanted generation behavior, which is especially useful when working with the audio-video latent structure and image-to-video consistency.
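For intuition, the sketch below follows the published Normalized Attention Guidance formulation; whether LTX2_NAG implements exactly these equations is an assumption, but it shows roughly what the three parameters control: nag_scale sets the extrapolation strength away from the negative branch, nag_tau caps how far the guided features may drift, and nag_alpha blends the result back toward the positive branch.

```python
import torch

# Rough sketch of NAG-style guidance on attention outputs (illustrative only).
def nag_guidance(z_pos: torch.Tensor, z_neg: torch.Tensor,
                 nag_scale: float, nag_tau: float, nag_alpha: float) -> torch.Tensor:
    # Extrapolate away from the negative branch.
    z_hat = z_pos + nag_scale * (z_pos - z_neg)
    # Normalize: cap the magnitude relative to the positive branch.
    ratio = z_hat.norm(p=1) / (z_pos.norm(p=1) + 1e-8)
    if ratio > nag_tau:
        z_hat = z_hat * (nag_tau / ratio)
    # Blend back toward the positive features.
    return nag_alpha * z_hat + (1.0 - nag_alpha) * z_pos
```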
The first sampling stage uses RandomNoise, CFGGuider, KSamplerSelect, LTX scheduler logic, and SamplerCustomAdvanced. The workflow uses a low CFG-style distilled route, which is suitable for accelerated generation. Distilled workflows often do not need the same high guidance settings as normal full-step workflows. The goal is to keep the output coherent while allowing the distilled route to generate efficiently.
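For reference, classifier-free guidance combines the unconditional and conditional predictions at each step roughly as shown below; distilled routes typically run with a guidance value close to 1.0, which is why the lower CFG setting here still produces coherent output. The helper is illustrative, not a ComfyUI node.

```python
import torch

# Standard classifier-free guidance blend (illustrative helper).
def cfg_blend(pred_uncond: torch.Tensor, pred_cond: torch.Tensor, cfg: float) -> torch.Tensor:
    return pred_uncond + cfg * (pred_cond - pred_uncond)

# With cfg = 1.0 the result is simply the conditional prediction, so a distilled
# route with low CFG spends little extra effort fighting the negative branch.
```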
The workflow uses Euler-style sampling for the main generation stage, then uses a second refinement stage with ManualSigmas and gradient estimation sampling. ManualSigmas gives a compact and controlled refinement schedule. This second stage is especially useful after latent spatial upscaling because the model can refine detail, edges, and stability without regenerating the entire video from scratch.
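A compact refinement schedule of this kind is essentially a short, decreasing list of sigmas covering only the low-noise tail of the diffusion process; the values below are placeholders for illustration, not the schedule used by this workflow.

```python
# Illustrative only: a short refinement schedule that re-noises the upscaled
# latent slightly and denoises it again, rather than regenerating from scratch.
refinement_sigmas = [0.35, 0.25, 0.15, 0.05, 0.0]  # placeholder values

# Because the schedule starts at a low sigma, the established motion and layout
# survive, and the extra steps mostly sharpen detail and edges.
```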
The spatial upscaling stage is one of the most important parts of the workflow. LTXVLatentUpsampler uses the LTX-2 spatial upscaler model to enlarge the latent video before final decoding. This is different from simply resizing the final frames after generation. Latent-space upscaling can produce cleaner high-resolution output because the video representation is enhanced before it becomes pixels. This is useful for 1080p-class results and for creators who want sharper Civitai, YouTube, Bilibili, or RunningHub examples.
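One practical consequence is that the base generation pass can run at half the target resolution and still land exactly on the 1080p-class output after the x2 latent upscale. The 960 x 544 base size below is an assumption for illustration, not necessarily the workflow's default.

```python
# Illustrative arithmetic for the x2 spatial latent upscale route.
base_width, base_height = 960, 544           # assumed half-resolution base pass
upscale_factor = 2                            # ltx-2-spatial-upscaler-x2

final_width = base_width * upscale_factor     # 1920
final_height = base_height * upscale_factor   # 1088
print(final_width, final_height)              # 1920 1088 (a valid LTX-2 size)
```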
After latent upscaling, LTXVImgToVideoInplace can be used again to maintain or reinforce the source-image relationship inside the upscaled latent route. This helps preserve consistency after the spatial upscale stage. The second LTXVConcatAVLatent then combines the refined video latent with the audio latent, and the second SamplerCustomAdvanced stage performs the compact refinement pass.
Finally, LTXVSeparateAVLatent separates the video latent and audio latent. VAEDecode converts the video latent into image frames, while LTXVAudioVAEDecode can decode the audio latent when needed. The workflow can then output the generated video frames for assembly into the final video.
This workflow is especially useful for creators who want a faster LTX-2 I2V route without giving up too much control. It can be used for character motion, product shots, two-person scenes, cinematic social media clips, AI short video previews, image-to-video demonstrations, Civitai video examples, YouTube workflow showcases, and Bilibili tutorials. The distilled route makes iteration more practical, while the latent upscaler and refinement stage help keep the final output polished.
Main features:
- LTX-2 image-to-video workflow
- Distilled acceleration route for faster generation
- Uses LTX-2 19B Dev FP8 model backbone
- Uses LTX-2 distilled LoRA route
- Gemma 3 12B text encoder support
- Designed for 60-frame-class video consistency
- 1080p-class output route with 1920 x 1088 support
- Source image preprocessing with LTXVPreprocess
- Image-to-video latent injection with LTXVImgToVideoInplace
- LTXVConditioning for video prompt conditioning
- Detailed negative prompt for consistency and artifact control
- LTX2_NAG guidance support
- LTXVConcatAVLatent and LTXVSeparateAVLatent AV pipeline
- LTXVLatentUpsampler spatial latent upscaling
- ManualSigmas compact refinement stage
- SamplerCustomAdvanced generation and refinement
- Suitable for faster I2V testing and high-resolution output
Recommended use cases:
LTX-2 distilled image-to-video generation, 60-frame short video creation, 1080p AI video output, character animation, product motion video, cinematic image-to-video generation, portrait-to-video testing, social media video covers, YouTube AI video previews, Bilibili workflow demonstrations, Civitai video examples, RunningHub workflow publishing, fast LTX-2 I2V experiments, and consistency-focused video production.
Suggested workflow:
Start by preparing a clean source image. The input image should have a clear subject, stable lighting, good contrast, and a readable composition. If the image is too blurry, noisy, compressed, or visually confusing, the video will usually become less stable. For character videos, make sure the face, hands, clothing, and body shape are clearly visible.
Use a valid resolution. The workflow note explains that width and height must follow LTX-2 valid size behavior. The default route is safer for testing, while 1920 x 1088 can be used when the GPU is powerful enough. For early prompt tests, use a smaller resolution first. After the seed and motion are stable, switch to the 1080p-class route.
Set the frame count correctly. LTX-2 frame count should follow the divisible-by-8-plus-1 rule. If the target is a 60-frame-class clip, keep the workflow’s internal frame calculation logic active so the final frame count remains compatible. Invalid values may be silently adjusted, so it is better to control the count directly.
Write a motion-focused prompt. Do not treat the prompt like a still-image prompt. Describe what happens during the clip. Include subject action, camera behavior, lighting change, dialogue, sound cues, object interaction, and emotional tone if needed. LTX-2 responds better when the prompt describes temporal behavior.
Keep movement moderate for better consistency. If your priority is stable identity and clean 1080p output, avoid extreme camera movement, fast spinning, rapid body turns, or large background changes. Use subtle head movement, natural hand motion, slow camera push-in, gentle parallax, or small environmental movement.
Use the negative prompt to control drift. Suppress flicker, bad hands, distorted faces, unstable eyes, camera shake, background clutter, wrong text, mismatched lip sync, robotic audio, off-sync timing, and unnatural transitions. The longer the clip, the more important these controls become.
Use the distilled route for faster iteration. Because this workflow uses a distilled LoRA route and compact sampling, it is practical for testing multiple prompts and seeds. If the result is good, keep the seed and refine the prompt gradually. If the video drifts, change the seed or reduce motion complexity.
Use the latent upscaler only after the base motion is acceptable. LTXVLatentUpsampler is powerful, but it is not meant to fix bad motion. If the first generated video has identity drift or unstable camera behavior, adjust the first stage before relying on the upscaler. Upscaling unstable motion will only make the problems more visible.
Use the second refinement stage for final polish. ManualSigmas and gradient estimation sampling help refine the upscaled latent. This is useful for improving detail, edge quality, and high-resolution sharpness while preserving the established video structure.
When evaluating results, look at more than sharpness. Check character identity, lighting consistency, camera smoothness, face stability, hand stability, background continuity, and whether the video still feels like the original input image. A good result should feel like the still image naturally came alive, not like a different image was generated every frame.
For 1080p output, monitor VRAM and generation time carefully. High-resolution I2V with latent upscaling is heavier than normal text-to-image or low-resolution video generation. If the workflow becomes unstable, reduce resolution, shorten the frame count, or test with a simpler prompt.
This workflow is designed for creators who want a faster and more practical LTX-2 image-to-video consistency pipeline. It combines source-image conditioning, distilled LTX-2 acceleration, AV latent structure, NAG control, latent spatial upscaling, and second-stage refinement into one production-oriented workflow for high-quality short video generation.
🎥 YouTube Video Tutorial
Want to know what this workflow actually does and how to start fast?
This video explains what the tool is, how to launch the workflow instantly, and shares my core design logic — no local setup, no complicated environment.
Everything starts directly on RunningHub, so you can experience it in action first.
👉 YouTube Tutorial: https://youtu.be/VYBoOk7pCJA
Before you begin, I recommend watching the video thoroughly — getting the full context helps you understand the tool faster and avoid common detours.
⚙️ RunningHub Workflow
Try the workflow online right now — no installation required.
👉 Workflow: https://www.runninghub.ai/post/2019439198570287106/?inviteCode=rh-v1111
If the results meet your expectations, you can later deploy it locally for customization.
🎁 Fan Benefits: Register to get 1000 points, plus 100 points for each daily login, and enjoy RTX 4090-class performance with 48 GB at your disposal!
📺 Bilibili Updates (Mainland China & Asia-Pacific)
If you’re in the Asia-Pacific region, you can watch the video below to see the workflow demonstration and creative breakdown.
📺 Bilibili Video: https://www.bilibili.com/video/BV1wiFzzwEoR/
☕ Support Me on Ko-fi
If you find my content helpful and want to support future creations, you can buy me a coffee ☕.
Every bit of support helps me keep creating — just like a spark that can ignite a blazing flame.
👉 Ko-fi: https://ko-fi.com/aiksk
💼 Business Contact
For collaboration or inquiries, please contact aiksk95 on WeChat.
🎥 YouTube Video Tutorial
Want to know what kind of tool this workflow is and how to launch it quickly?
The video mainly covers the tool's positioning, how to start it quickly, and my design approach.
The demonstration runs directly on RunningHub, so you can see the actual results right away.
👉 YouTube Tutorial: https://youtu.be/VYBoOk7pCJA
Before starting, I recommend watching the video in full; grasping the overall approach makes it easier to get started and helps you avoid common detours.
⚙️ Try the Workflow Online
You can try it online right now, no installation required.
👉 Workflow: https://www.runninghub.ai/post/2019439198570287106/?inviteCode=rh-v1111
Open the link above to run the workflow directly and watch the results in real time.
If the results meet your expectations, you can also deploy it locally for customization.
🎁 Fan Benefits: Register to get 1000 points, plus 100 points for each daily login, and enjoy RTX 4090-class performance with 48 GB at your disposal!
📺 Bilibili Updates (Mainland China & Asia-Pacific)
If you are in mainland China or the Asia-Pacific region, you can watch the video below for a hands-on demonstration and a breakdown of the design.
📺 Bilibili Video: https://www.bilibili.com/video/BV1wiFzzwEoR/
I will keep updating model resources on Quark Netdisk (夸克网盘):
👉 https://pan.quark.cn/s/20c6f6f8d87b
These resources are mainly intended for local users, to support creation and learning.

