Reddit posts say v5.5 improved voice tone but still ignores gender-labeled sections, switches singers mid-part, and struggles with detailed instrument instructions. Creators are iterating on renders until the emotion fits, then generating lipsync video to work around the gaps.

Suno's own pitch centered on expression and control in the v5.5 launch post, while the Song Editor announcement promised section rewrites and replacements down to individual beats. The Reddit complaints cut straight at that promise: gender-labeled verses still drift, instrument cues still get ignored, and users are burning credits on rerolls. Meanwhile, a Hugging Face workflow pack for LTX 2.3 points to the adjacent workaround culture, where creators split audio generation and video generation into separate stages.
The clearest complaint was not that duet prompting fails completely, but that it fails inconsistently. The original post described the same two misses over and over: the wrong voice starts a labeled section, or the right voice starts and then switches away before the section ends.
The most concrete community formatting suggestion came from a reply inside that same thread, which told users to number verses in the style box and tag the lyrics with labels like [verse 1 female], [verse 2 male], and [verse 3 male & female]. Another experienced commenter in the thread said they now keep gender prompts almost entirely in the lyrics, using simple labels like [Verse - Male] and [Chorus - Female], because extra instructions in the style box seem to confuse the model.
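For illustration, here is roughly what that tagged layout looks like when assembled. This is a minimal sketch in Python: the two strings mirror Suno's style box and lyrics box as the thread describes them, but the build_prompt helper itself is hypothetical, not a Suno API, and the lyric lines are placeholders.

```python
# Hypothetical helper illustrating the numbered-verse convention from the
# thread. The field names ("style", "lyrics") stand in for Suno's two
# prompt boxes; nothing here calls a real API.

def build_prompt() -> dict:
    """Assemble a duet prompt using the tagged layout suggested in the thread."""
    # Style box: number the verses and keep the rest of the description terse.
    style = "duet, pop ballad, verse 1 female, verse 2 male, verse 3 male & female"

    # Lyrics box: repeat the gender labels as section tags so they sit
    # directly above the words they govern. The second commenter's
    # alternative keeps labels only here, e.g. [Verse - Male], [Chorus - Female].
    lyrics = "\n".join([
        "[verse 1 female]",
        "First verse lyrics here...",
        "",
        "[verse 2 male]",
        "Second verse lyrics here...",
        "",
        "[verse 3 male & female]",
        "Third verse lyrics here...",
    ])
    return {"style": style, "lyrics": lyrics}


if __name__ == "__main__":
    prompt = build_prompt()
    print(prompt["style"])
    print(prompt["lyrics"])
```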
Even that advice came with a shrug. One commenter in the duet thread said the tagged format worked better than other prompt layouts, but still got the duet wrong in about two of six generations.
That tradeoff showed up again in the second Reddit thread. The original poster said v5.5's sound quality and voices were improved, but called the model "incredibly limited" at following specific directions for instruments, styles, and composition.
Replies in the discussion split along the same line as the original post: praise for the improved sound on one side, frustration with ignored directions on the other.
That lines up awkwardly with Suno's official framing. The launch post for v5.5 describes the model as Suno's "best and most expressive" release yet, while the Song Editor post promises lyric replacement, section reworks, and beat-level editing control.
The third post was not about Suno directly, but it captured the production workaround that keeps surfacing around current AI music tools. The creator's note was blunt: get the audio right first, keep rerendering until the emotion fits, then handle the lipsync video afterward.
The linked LTX 2.3 workflow collection is built around ComfyUI-style video pipelines, which makes the split explicit. Audio generation and facial performance are separate problems, and creators are increasingly treating them that way when prompt-level control inside a single music model stays unreliable.
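To make that two-stage split concrete, here is a minimal control-flow sketch of the workflow the creator described: reroll audio until a take lands emotionally, then run the video stage exactly once. Every function below is a hypothetical placeholder, not a real Suno, LTX, or ComfyUI call; only the ordering and the reroll loop reflect the described workflow.

```python
# Sketch of the audio-first, video-second workflow. All three helpers are
# stubs to be replaced with real tooling; they exist only to show the
# control flow the post describes.

from pathlib import Path


def generate_audio_take(prompt: str, take: int) -> Path:
    """Placeholder for one audio render (e.g. a Suno generation)."""
    raise NotImplementedError("call your audio generator here")


def emotion_fits(audio: Path) -> bool:
    """Placeholder for the human listen-and-judge step."""
    raise NotImplementedError("creator reviews the take here")


def render_lipsync(audio: Path, workflow: Path) -> Path:
    """Placeholder for the video stage (e.g. a ComfyUI/LTX pipeline)."""
    raise NotImplementedError("queue the video pipeline here")


def produce(prompt: str, workflow: Path, max_takes: int = 10) -> Path:
    # Stage 1: reroll audio until a take lands, within a credit budget.
    for take in range(max_takes):
        audio = generate_audio_take(prompt, take)
        if emotion_fits(audio):
            break
    else:
        raise RuntimeError("no take accepted within the credit budget")

    # Stage 2: facial performance is a separate problem, solved only
    # after the audio is locked.
    return render_lipsync(audio, workflow)
```

The design point is the boundary: nothing in stage 2 can change the audio, so credits spent on rerolls are isolated from the (typically slower, costlier) video renders.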