Runway launches Characters with 24fps HD video agents
Runway launched Characters, a real-time system that turns one image into a conversational HD video agent. The company says replies start about 1.75 seconds after a user stops speaking and stream above 24 fps, which moves live avatar workflows noticeably closer to production use.

TL;DR
- According to runwayml's launch post, Characters turns a single reference image into a conversational HD character that streams above 24 fps, and the accompanying Building post says it works across photoreal humans, cartoons, mascots, and creatures with no fine-tuning.
- According to runwayml's latency post, Characters starts replying about 1.75 seconds after a user stops speaking, while the video model itself runs at an effective 37 ms per frame.
- The new Characters product page frames this as an audio-driven interactive video model with live lip-sync, gestures, and longer conversations, not a one-shot talking-head generator.
- Runway's API concepts docs split the system into persistent Avatars and live WebRTC Sessions, with each Session capped at five minutes.

Runwayml's demo post shows the whole thing running as a live conversation, and the engineering write-up walks through the per-frame timing math. The product is already on a public Characters page with a live demo button, and the docs show it is being packaged as an API primitive, not just a marketing demo.
Single-image animation
The core trick is how much the system generalizes from one image. In the March launch post, Runway said Characters can define appearance, voice, personality, knowledge, and actions from a single reference image, powered by GWM-1.
The new engineering post adds the production claim that matters more for creators: the model generates natural lip-sync, facial expressions, and head motion in real time, instead of rendering a clip offline and playing it back later.
Latency budget
Runway's write-up breaks the 24 fps claim into a few specific numbers. Each model iteration produces four pixel frames, so the system needs one iteration roughly every 167 ms to hold 24 fps. Runway says it gets there by overlapping diffusion with VAE decoding, which the technical post says cuts effective model time to about 37 ms per frame.
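To make the frame math concrete, here is a small sanity check of the numbers above; the figures are Runway's, the script is just illustrative arithmetic.

```python
# Sanity-check the per-iteration budget behind the 24 fps claim.
# The inputs (4 frames per iteration, ~37 ms effective model time per frame)
# come from Runway's post; this script only reproduces the arithmetic.

TARGET_FPS = 24
FRAMES_PER_ITERATION = 4
EFFECTIVE_MODEL_MS_PER_FRAME = 37  # after overlapping diffusion and VAE decoding

# To hold 24 fps, one iteration must finish before its 4 frames are consumed.
iteration_budget_ms = FRAMES_PER_ITERATION / TARGET_FPS * 1000
print(f"iteration budget: {iteration_budget_ms:.0f} ms")          # ~167 ms

# Effective model time per iteration vs. that budget.
model_time_per_iteration_ms = EFFECTIVE_MODEL_MS_PER_FRAME * FRAMES_PER_ITERATION
print(f"model time per iteration: {model_time_per_iteration_ms} ms")  # 148 ms
print(f"headroom: {iteration_budget_ms - model_time_per_iteration_ms:.0f} ms")
```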
The same post says total server-side turnaround is about 1.75 seconds from end-of-speech to first response, including about 1,185 ms for voice agent processing and about 567 ms for the video pipeline. That is still slower than human turn-taking, but much closer to something you could actually use live on a branded site or in an interactive short.
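The two stage figures also account for the headline latency, as the quick check below shows; again, the numbers are the post's, only the addition is mine.

```python
# Check that the reported stage latencies add up to the ~1.75 s
# end-of-speech-to-first-response figure from Runway's post.

voice_agent_ms = 1185    # voice agent processing, per the post
video_pipeline_ms = 567  # video pipeline, per the post

total_ms = voice_agent_ms + video_pipeline_ms
print(f"total: {total_ms} ms (~{total_ms / 1000:.2f} s)")  # 1752 ms, ~1.75 s
```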
Avatars and Sessions
The API docs show the product surface behind the demo. Runway separates persistent Avatars, which hold the image, voice, personality, knowledge base, and actions, from Sessions, which are live WebRTC conversations tied to one interaction.
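The split is easiest to see as two objects with different lifetimes. The sketch below is not Runway's SDK; the class and field names are assumptions chosen to mirror what the concepts docs say each object holds.

```python
# Conceptual sketch of the Avatar/Session split described in the docs.
# This is NOT Runway's SDK: names here are illustrative only.
from dataclasses import dataclass, field


@dataclass
class Avatar:
    """Persistent character definition, reused across many conversations."""
    reference_image_url: str   # the single image the character is built from
    voice: str                 # voice configuration
    personality: str           # persona instructions
    knowledge: list[str] = field(default_factory=list)  # knowledge base entries
    actions: list[str] = field(default_factory=list)    # actions the character can take


@dataclass
class Session:
    """One live WebRTC conversation bound to a single Avatar."""
    avatar_id: str             # which persistent Avatar to animate
    max_duration_s: int = 300  # docs cap each Session at five minutes
    # In a real integration this would also hold the WebRTC offer/answer and
    # the media tracks carrying mic audio up and rendered video frames down.
```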
That docs split also exposes a practical constraint: Sessions max out at five minutes. The product page pitches tutoring, customer support, game hosts, and companions, and the live demo plus contact-sales flow suggest Runway is aiming at deployable character experiences, not just creator-side experimentation.
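For conversations that need to outlast the cap, the obvious client-side pattern is to rotate Sessions against the same persistent Avatar. The sketch below is hypothetical; none of it is Runway's API, and the rollover margin is an assumed value.

```python
# Hypothetical handling of the five-minute Session cap. Nothing here is
# Runway's API; it only shows the client-side pattern the cap implies:
# watch the clock and open a fresh Session on the same Avatar before the
# current one expires, so a longer conversation survives the rollover.
import time

SESSION_CAP_S = 300      # documented per-Session maximum
ROLLOVER_MARGIN_S = 15   # assumed safety margin, not a documented value


def run_conversation(start_session, close_session, avatar_id: str, is_done) -> None:
    """start_session, close_session, and is_done are placeholders for a real client."""
    session = start_session(avatar_id)
    started = time.monotonic()
    while not is_done():
        if time.monotonic() - started > SESSION_CAP_S - ROLLOVER_MARGIN_S:
            # Swap to a fresh Session before the hard cap cuts the stream.
            next_session = start_session(avatar_id)
            close_session(session)
            session, started = next_session, time.monotonic()
        time.sleep(1)
    close_session(session)
```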
The other piece worth noting is safety posture. In Runway's responsible-use post, the company explicitly calls out impersonation and real-time fraud as the central risks for interactive avatars, which tells you exactly where the company expects this format to get messy first.