Creates true-to-life talking avatars with synced voice and visuals.
Who it's for: creators who want this pipeline in ComfyUI without assembling nodes from scratch. Not for: one-click results with zero tuning — you still choose inputs, prompts, and settings.
Open preloaded workflow on RunComfy (browser)
Why RunComfy first
- Fewer missing-node surprises — run the graph in a managed environment before you mirror it locally.
- Quick GPU tryout — useful if your local VRAM or install time is the bottleneck.
- Matches the published JSON — the zip follows the same runnable workflow you can open on RunComfy.
When downloading for local ComfyUI makes sense — you want full control over models on disk, batch scripting, or offline runs.
How to use (local ComfyUI)
1. Load inputs (images/video/audio) in the marked loader nodes.
2. Set prompts, resolution, and seeds; start with a short test run.
3. Export from the Save / Write nodes shown in the graph.
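If you plan to batch-script runs (one of the reasons to go local, above), the sketch below queues the workflow against a local ComfyUI server over its HTTP API. It assumes you exported the graph in API format (Save (API Format) in ComfyUI), saved it as ltx23_id_lora_api.json (a placeholder name), and that the server listens on the default 127.0.0.1:8188.

```python
import json
import urllib.request

# Load the API-format export of this workflow (placeholder filename).
with open("ltx23_id_lora_api.json", "r", encoding="utf-8") as f:
    graph = json.load(f)

# Queue one run; ComfyUI responds with a prompt_id you can poll via /history.
payload = json.dumps({"prompt": graph}).encode("utf-8")
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))
```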
Expectations: the first run may pull large model weights, and cloud runs may require a free RunComfy account.
Overview
This identity-aware workflow creates lifelike talking avatars from a single image, a short audio clip, and a text prompt. It combines facial consistency with precise lip-sync and expressive voice transfer: the model maintains the subject’s unique features while blending in realistic motion and tone. It suits virtual personalities, digital influencers, and character-driven storytelling, and it folds the separate generation steps into one unified pipeline for synchronized audiovisual output.
Key nodes in the ComfyUI LTX 2.3 ID-LoRA workflow
LoraLoaderModelOnly (#5573): Loads the LTX 2.3 ID-LoRA that preserves facial identity. Reduce its weight if you want more creative variance, or increase it to lock likeness down more tightly. Pair it thoughtfully with prompt strength so identity and style do not compete. Reference: LoRA usage notes on the LTX-2.3 model card.
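To sweep that weight from a script instead of the UI, a minimal sketch is below. It assumes the API-format export from the how-to section and that node "5573" uses the stock strength_model input of LoraLoaderModelOnly; verify both against your own JSON.

```python
import json

with open("ltx23_id_lora_api.json", "r", encoding="utf-8") as f:
    graph = json.load(f)

# Write one graph variant per LoRA weight; queue each via the /prompt sketch above.
for strength in (0.6, 0.8, 1.0):
    graph["5573"]["inputs"]["strength_model"] = strength
    # The same pattern works for other numeric widgets, e.g. the image
    # strength on LTXVImgToVideoInplace (#5245); check the exact input
    # name in your export.
    with open(f"ltx23_id_lora_{strength:.1f}.json", "w", encoding="utf-8") as out:
        json.dump(graph, out)
```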
LTXVReferenceAudio (#5589): Converts your reference audio into conditioning for syllable timing, prosody, and mouth shapes. Feed it clean speech for best alignment. If you hear pumping or off-beat articulation, shorten or simplify the clip rather than boosting strength.
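One way to shorten and clean the clip outside the graph is sketched below; it shells out to ffmpeg (assumed to be on PATH), and the filenames and 8-second cut are placeholders for your material.

```python
import subprocess

# Keep the first 8 seconds of clean speech, downmixed to mono 48 kHz.
subprocess.run(
    [
        "ffmpeg", "-y",
        "-i", "reference_full.wav",
        "-ss", "0", "-t", "8",
        "-ac", "1", "-ar", "48000",
        "reference_trimmed.wav",
    ],
    check=True,
)
```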
LTXVImgToVideoInplace (#5245, also used later): Injects the face image into the latent video stream as a spatial prior. The image-strength control balances adherence to the photo against motion freedom. For strong identity with natural movement, keep image strength moderate and let the ID-LoRA carry likeness.
LTXVConditioning (#5621): Packages text conditioning and timing cues for the LTX samplers. Ensure its frame-rate input matches your output frame rate so motion fields and phoneme timing stay coherent.
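A quick consistency check is to scan the API-format JSON for every literal frame_rate value and make sure they agree. The key name frame_rate matches the stock LTXVConditioning and VHS_VideoCombine inputs, but verify it against your own export; linked inputs appear as [node_id, slot] pairs and are skipped here.

```python
import json

with open("ltx23_id_lora_api.json", "r", encoding="utf-8") as f:
    graph = json.load(f)

# Collect literal frame_rate widget values across all nodes.
rates = {
    node_id: node["inputs"]["frame_rate"]
    for node_id, node in graph.items()
    if isinstance(node.get("inputs", {}).get("frame_rate"), (int, float))
}
print(rates)
assert len(set(rates.values())) <= 1, f"frame_rate mismatch: {rates}"
```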
VHS_VideoCombine (#5218): Muxes frames and audio into the final file. If your audio is slightly longer than the frames, enable trimming here to prevent a trailing black tail. For platform compatibility, keep the default H.264 settings unless you have a reason to change them. Node reference: ComfyUI-VideoHelperSuite.
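You can also fix the overrun before it reaches the graph: compare the audio length against frames / fps and pre-trim with ffmpeg. In this sketch, num_frames, fps, and the filenames are placeholders for your render settings, and ffprobe/ffmpeg are assumed to be on PATH.

```python
import subprocess

num_frames, fps = 241, 24            # placeholder render settings
video_s = num_frames / fps

# Measure audio duration with ffprobe.
audio_s = float(subprocess.run(
    ["ffprobe", "-v", "error", "-show_entries", "format=duration",
     "-of", "default=noprint_wrappers=1:nokey=1", "speech.wav"],
    capture_output=True, text=True, check=True,
).stdout)

# Trim the audio to the video length if it overruns.
if audio_s > video_s:
    subprocess.run(
        ["ffmpeg", "-y", "-i", "speech.wav",
         "-t", f"{video_s:.3f}", "speech_trimmed.wav"],
        check=True,
    )
```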
MelBandRoFormerSampler (#5473): Separates vocals from music with a Mel-band transformer so the generator locks onto speech. If sibilants smear or plosives pop, try a different model file from the same family or reduce input loudness. Background reading: the Mel-Band RoFormer paper on arXiv.
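Reducing input loudness can also happen outside the graph; a hedged one-step sketch with ffmpeg's volume filter (placeholder filenames) attenuates the mix by 6 dB before separation.

```python
import subprocess

# Attenuate the mixed input by 6 dB before feeding the separator.
subprocess.run(
    ["ffmpeg", "-y", "-i", "mixed_song.wav",
     "-af", "volume=-6dB", "mixed_quieter.wav"],
    check=True,
)
```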
Notes
LTX 2.3 ID-LoRA in ComfyUI (Identity-Controlled Video Creator): see the RunComfy page for the latest node requirements.

