A launch thread says Qwen3.5-Omni can turn whiteboards and gameplay videos into code, generate timestamped audiovisual logs, and run realtime voice features with interruption handling. The linked materials also cite 256K context, long audio and video windows, and API access for offline and realtime modes.

Qwen3.5-Omni is being pitched as a multimodal production model, not just a chatbot. In the launch thread, Hasan Toor says it can infer code directly from visual and audiovisual input, with demos framed around whiteboard-to-website and gameplay-video-to-code workflows (launch demo). The linked release materials also point readers to a browser version on Qwen Chat, a report page, and separate offline and realtime API access points.
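The release materials linked in the thread don't include request examples, so the API shape is unverified. As a minimal sketch, assuming the offline endpoint follows the OpenAI-compatible chat format earlier Qwen releases exposed, a video-to-code request might be assembled like this (the model name, field names, and video content type are all assumptions, not confirmed by the launch materials):

```python
# Sketch of an offline-mode request body, assuming an OpenAI-compatible
# chat format. The model identifier, the "video_url" content type, and
# the message shape are assumptions for illustration only.

def build_video_to_code_request(video_url: str, instruction: str) -> dict:
    """Assemble a chat-style payload asking the model to infer code from video."""
    return {
        "model": "qwen3.5-omni",  # hypothetical model name
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "video_url", "video_url": {"url": video_url}},
                    {"type": "text", "text": instruction},
                ],
            }
        ],
    }

payload = build_video_to_code_request(
    "https://example.com/whiteboard.mp4",
    "Turn this whiteboard sketch into a single-page website.",
)
```

The same payload shape would presumably serve the gameplay-to-code demo by swapping the video URL and instruction; only hands-on testing against the real endpoint would confirm the field names.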
The technical claims are broad. According to the thread's specs post, Qwen3.5-Omni has a 256K-token context window, supports up to 10 hours of audio and 1 hour of video in one pass, recognizes speech in 74 languages, and generates it in 29. Those numbers matter most for long-form creative review: recorded workshops, edit sessions, reference reels, and extended voice interactions are exactly the kinds of material creators actually work with.
The clearest creator-facing workflow change is logging and breakdown. The captioning demo claims Qwen3.5-Omni can output frame-accurate timestamps, slice scenes automatically, map characters to audio, and follow custom instructions, turning raw footage into something closer to an edit log or script draft than a plain summary (captioning demo). For filmmakers, documentarians, and social teams, that suggests less manual note-taking between ingest and first cut.
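The thread doesn't show the caption output format, but if the model emits timestamped scene lines, turning them into a structured edit log is a small parsing step on the creator's side. A minimal sketch, assuming a hypothetical `HH:MM:SS.mmm --> HH:MM:SS.mmm  description` line format (borrowed from subtitle-style cue timing; the real output format is unknown):

```python
import re
from dataclasses import dataclass

@dataclass
class SceneEntry:
    start_s: float  # scene start, in seconds
    end_s: float    # scene end, in seconds
    note: str       # caption text for the scene

# Hypothetical cue line: "00:01:02.500 --> 00:01:09.000  Character A enters frame"
LINE = re.compile(
    r"(\d+):(\d+):(\d+)\.(\d+)\s*-->\s*(\d+):(\d+):(\d+)\.(\d+)\s+(.*)"
)

def to_seconds(h: str, m: str, s: str, ms: str) -> float:
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

def parse_edit_log(caption_text: str) -> list[SceneEntry]:
    """Collect timestamped caption lines into edit-log entries; skip anything else."""
    entries = []
    for line in caption_text.splitlines():
        match = LINE.match(line.strip())
        if match:
            g = match.groups()
            entries.append(SceneEntry(to_seconds(*g[0:4]), to_seconds(*g[4:8]), g[8]))
    return entries

log = parse_edit_log(
    "00:00:00.000 --> 00:00:12.500  Opening wide shot\n"
    "00:00:12.500 --> 00:00:30.000  Character A speaks"
)
# log[0].end_s -> 12.5
```

In a real pipeline the entries would feed an NLE marker list or a rough-cut EDL; the point is that frame-accurate, machine-readable timestamps are what make the "raw footage to edit log" claim usable, not the prose summary around them.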
The realtime features push in a different direction: live performance and voice interfaces. The thread claims adjustable emotion, speed, and volume, one-sample voice cloning, web search, tool calling, and “semantic interruption” for more natural turn-taking (realtime demo). If those demos hold up outside the launch thread, Qwen3.5-Omni is less interesting as a general assistant than as a multimodal layer for prototyping narrated apps, voiced characters, and hands-free creative tools.
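The thread doesn't explain how "semantic interruption" works internally. A plausible mental model is a small gate that cuts off playback only when incoming audio is both loud enough and classified as directed speech, rather than a filler or background noise. A toy sketch of that gating logic, with illustrative thresholds and filler list (none of this reflects Qwen's actual implementation):

```python
# Toy model of semantic interruption: barge in on playback only when the
# user's utterance is loud enough AND looks like a real request, not a
# filler ("uh", "hmm") or background noise. All thresholds are illustrative.

FILLERS = {"uh", "um", "hmm", "er"}

def should_interrupt(transcript: str, energy: float,
                     energy_threshold: float = 0.3) -> bool:
    if energy < energy_threshold:          # too quiet: background noise
        return False
    words = transcript.lower().split()
    if not words:                          # no recognized speech at all
        return False
    if all(w in FILLERS for w in words):   # user is thinking aloud
        return False
    return True                            # directed speech: interrupt

# should_interrupt("hmm", 0.8) -> False (thinking aloud, keep talking)
# should_interrupt("stop, change the ending", 0.8) -> True (barge in)
```

Whatever the real mechanism is, the distinction the demo draws between "you're thinking" and "you're interrupting" is the difference between a voice interface that feels conversational and one that clams up at every cough.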
🚨 BREAKING: Qwen3.5-Omni just dropped a mind-blowing emergent ability: Audio-Visual Vibe Coding. No specific training. Just raw power.
→ Turn whiteboard brainstorming videos into fully functional websites
→ Turn gameplay screen recordings into playable code
Generic video summaries are dead. Qwen3.5-Omni delivers script-level audio-visual captioning, fully customizable:
Frame-accurate timestamps
Automatic scene slicing
Character + audio mapping
Follows your exact instructions
Raw footage → production-ready logs in seconds.
Real-time Mode feels scary human:
Instantly control volume, speed, emotion
Voice cloning from just one sample
Smart semantic interruption knows when you’re thinking vs. background noise
Built-in web search + complex tool calling
Natural turn-taking. Real conversation rhythm.