AI Primer
release

LTX 2.3 launches video-to-video mode with Depth control

LTX 2.3 added video-to-video restyling, and creators are using frame-derived reference images plus Depth mode to flip clips into new looks. Reddit and ComfyUI users also report Ampere INT8 runs dropping from 118.77s to 66.45s and easier batch assembly in agent pipelines.


TL;DR

You can trace the restyling recipe through techhalla's thread opener and the Depth-mode step, then jump to PurzBeats' Hermes Agent helper for batch assembly. On the performance side, the Reddit benchmark thread points to a separate INT8 checkpoint on Hugging Face and a matching ComfyUI loader node.

What shipped

The useful addition is simple: LTX 2.3 can now take an existing clip and remap it into a new look instead of starting from text alone.

The thread lays out the pieces visible in the UI:

  • text-to-video generation for the source clip
  • a first-frame grab used as the style reference
  • video-to-video as the transformation pass
  • control modes including Pose, Depth, and Edges, with Depth selected in techhalla's control-mode demo
  • output settings shown as LTX-2.3 Pro, 1080p, 16:9 in the interface screenshot

Depth workflow

The thread's actual recipe is more specific than "upload a clip and stylize it."

According to techhalla's step-by-step post, the workflow is:

  1. Generate the original shot in text-to-video.
  2. Export or grab the first frame.
  3. Restyle that still into multiple looks.
  4. Upload the original video plus one restyled frame into video-to-video.
  5. Re-describe the action and style in the prompt.
  6. Pick Depth as the control mode.

That frame-first trick is the interesting part. The still image locks the new art direction before video-to-video starts propagating it through motion, which is why the examples can swing from anime to papercraft to glowing fantasy creature without changing the shot design.

Hermes Agent pipeline

Creators immediately started wrapping LTX 2.3 in automation instead of treating it as a one-off UI feature.

PurzBeats' prompt asks Hermes Agent to do three concrete jobs in sequence:

  • call ComfyUI Cloud
  • generate nine LTX 2.3 text-to-video clips of animals in different biomes
  • stitch the results together with ffmpeg using one-second crossfades, as shown in the prompt example
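That last stitching job maps onto ffmpeg's xfade filter, and the fiddly part is the offsets: each crossfade has to start one second before the running end of everything merged so far. As a rough sketch (my own helper, not PurzBeats' actual code), a filtergraph for N clips can be built like this:

```python
def xfade_chain(durations: list[float], fade: float = 1.0) -> str:
    """Build an ffmpeg -filter_complex string crossfading N clips in order.

    durations: length of each input clip in seconds. Every crossfade
    overlaps the previous output by `fade` seconds, so each xfade offset
    is the cumulative duration so far minus the fades already spent.
    """
    parts = []
    offset = 0.0
    prev = "0:v"                # video stream of the first input
    for i, d in enumerate(durations[:-1]):
        offset += d - fade      # crossfades overlap, so subtract the fade
        label = f"v{i + 1}"
        parts.append(
            f"[{prev}][{i + 1}:v]xfade=transition=fade"
            f":duration={fade}:offset={offset}[{label}]"
        )
        prev = label
    return ";".join(parts)

# For three 5-second clips:
# ffmpeg -i a.mp4 -i b.mp4 -i c.mp4 \
#   -filter_complex "<xfade_chain([5, 5, 5])>" -map "[v2]" out.mp4
```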

The companion project, hermes-agent-comfyui-helper on GitHub, is described by PurzBeats' follow-up as a template-consistency helper that uses a Comfy template repo as its source of truth. The same post says the setup is hardcoded to the author's local environment and still experimental.

INT8 on Ampere

The other fast-moving thread around LTX 2.3 is less about style control and more about runtime.

In the r/StableDiffusion thread "LTX 2.3 INT8 Benchmarks (2x Faster on Ampere)", ovpresentme says an INT8 loading path aimed at Ampere GPUs cut one LTX 2.3 run from 118.77 seconds to 66.45 seconds. The post also says the gain is most relevant for cards like the RTX 3080 Ti, and not especially useful for an RTX 5090.
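The headline "2x" rounds up a little; the reported timings work out to just under 1.8x:

```python
baseline_s = 118.77  # reported non-INT8 run time on Ampere
int8_s = 66.45       # reported INT8 run time

speedup = baseline_s / int8_s
print(f"{speedup:.2f}x")  # 1.79x, a bit under the thread's "2x" headline
```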

The post links out to two implementation pieces: a separate INT8 checkpoint on Hugging Face and a matching ComfyUI loader node.

One small caveat surfaced in the same Reddit thread: when a commenter asked about the undistilled model, ovpresentme's reply said version 1.1 appears to have only a distilled release, while an older 1.0 conversion came from another user. That makes the current INT8 speedup story partly a model-availability story too.
