LTX 2.3 Audio + Image to Video For Semi-Creative Slop

Type: Workflows

Published: Mar 31, 2026

Base Model: LTXV 2.3

Hash: AutoV2 4382CE7F54

v1 is set up for audio + image inputs

v1.1 is for just audio input (but you can toggle back to i2v easily)

One-step Audio+Image to Video for LTX-2.3. Based on this great starting point from PixelMuseAI. I've modified it to suit my particular setup, cleaned it up, and locked it down for consistent output. Took me a week to realize that I don't really need the upscale stage with LTX; I'm getting awesome quality with just one stage.

So this is a single-stage workflow. LTX models are pretty chunky, so I don't know how well this works on dinky GPUs. I run this on my Cray X-MP 4090 with 24GB of VRAM and 128GB of V-less RAM. With all of the default settings here (704x1080, 10 seconds @24fps) the total run time is about 2 minutes. Which is insane. If you have less horsepower you may need to do the gguffy stuff (GGUF-quantized models). This is what works for me. Just make sure you launch with a flag to reserve some VRAM so you don't get OOM (out-of-memory) errors. Whatever that means. In ComfyUI that is the `--reserve-vram` launch flag, which takes a value in GB. I reserve 1 GB. The margin is razor-thin, but enough to let me fly through without getting bogged down in swap purgatory.

Anyway, the point of this workflow is to generate your slop such that it matches your input audio. If the slop is sexual, I'd recommend the heretic text encoder and an NSFW LoRA. You may get a huge list of CLIP errors in your terminal if you use an abliterated encoder, but you can ignore them; they should not cause any halting errors. I have put in some sliders to control the weights of the optional LoRA loader. They are very important for this process.

v1.1 addresses direct T2V with no starting image

If you want to do straight T2V with the input audio, toggle the bypass on the LTXVImgToVideoInplace node to ON. Put a dummy image into the loader; it can be anything, a tiny white square, whatever. Just set your desired size on the Resize Image v2 node. That's it. With the bypass on, the input image will be ignored and you will be full T2V. If you want to use the input image as a rough guide instead, leave the inplace node un-bypassed and play with its strength. Just remember that with T2V you must be much more specific and detailed with your prompt. If the output is too blurry, you can throw in the auto LTXV scheduler (the one with a step widget) to replace the manual sigmas on the sampler. Four clicks. No problem. You can do it.

There are plenty of notes in the WF, but mostly it's plug-and-play. The duration you enter into the audio loader node is automatically multiplied by your framerate, +1, to make it conform with LTX rules. You can go much longer, and much higher resolution, of course. LTX is just friggin awesome. WAN is feeling kind of dead. And MMAudio is obviated as well. Pair this up with TTS Audio Suite and/or Ace-Step and you're in business.
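If you want to sanity-check the frame math yourself, here is a minimal sketch of the duration-times-framerate-plus-one rule. The function names are made up for illustration and are not nodes in the WF. The "8n + 1" check reflects my understanding of the LTX frame-count rule, so treat it as an assumption rather than something the WF guarantees.

```python
def ltx_frame_count(duration_seconds: int, fps: int = 24) -> int:
    """Frame count as described above: duration * framerate, plus 1."""
    return duration_seconds * fps + 1


def fits_8n_plus_1(frames: int) -> bool:
    """Assumed LTX constraint: the frame count has the form 8*n + 1."""
    return frames >= 1 and (frames - 1) % 8 == 0


if __name__ == "__main__":
    # Default settings from this page: 10 seconds at 24 fps.
    frames = ltx_frame_count(10, 24)
    print(frames, fits_8n_plus_1(frames))
```

With the defaults (10 seconds at 24fps) this gives 241 frames, which fits the assumed pattern. At other framerates the +1 alone may not land on 8n + 1 (5 seconds at 30fps gives 151), so check your numbers if you change the framerate.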

Final frame saved for non-conditioned continuation. LTX does a remarkable job even without the continuation guidance, if you're feeling lazy. But it is definitely better if you take the time to set it up.

If you're a custom node whiner, go ahead and replace what you don't have with some generic crap. But nothing in here has crazy dependencies or is hard to install.

*I have two video combines here because something is broken in my VLC install that I can't fix - anything that saves with audio has broken comfy metadata. So I only use VHS for the looping preview, and the create + save is for my output. Either one can be deleted if it is redundant for you.

Model links are in the WF. The usual LTX stuff. Dev + 304 distill at 0.7 works great for me. The schedule is manually set up for that combo; if you want to run full, substitute some sexy sigmas.

I have found using heretic is necessary to force seriously naughty prompts through. With regular abliterated I tend to get slow motion, or slightly tamed-down adherence. Use fp8 if your main model is fp8.

https://huggingface.co/DreamFast/gemma-3-12b-it-heretic/tree/main/comfyui

My Ace-Step Workflow:

https://civitai.com/models/2351803/ace-step-for-your-ear-holes