Two community posts report that Suno v5.5 and the current v5 often ignore male/female section tags and specific instrument cues. Users say the misses burn credits; the partial workarounds they describe are numbered verse labels and splitting audio generation from lip-sync into separate passes.

Creators report that the model ignores tags like [male vocals] and [female vocals], swaps singers mid-line, and burns through credits while they try to force a stable voice assignment. Suno's official editing guide pushes creators toward Custom Mode, richer lyrics-box prompting, and prompt reuse when a voice pass goes wrong. The interesting gap is that today's creator reports are not asking for vague style transfer; they're asking for basic section discipline. One thread is about verse-by-verse singer assignment, the other is about instrument placement, and both describe a model that still wanders.
The strongest practical advice in the duet thread came from a commenter who moved the instructions out of loose bracket tags and into numbered verse labels. Their format was simple:
[verse 1 female], [verse 2 male], [verse 3 male & female]

According to the same thread, even that structure is still hit or miss: the user who proposed it said roughly 2 out of 6 generations still got the duet wrong.
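For anyone templating this pattern across many songs, the format is simple enough to generate programmatically. The sketch below is a hypothetical helper, not a Suno API: only the bracket-tag format itself comes from the thread, and the function and field names are illustrative.

```python
# Hypothetical helper for assembling a Suno lyrics-box prompt using the
# numbered verse labels suggested in the community thread. The tag format
# "[verse N singer]" is from the post; everything else is illustrative.

def build_duet_lyrics(verses):
    """verses: list of (singer_label, lyric_text) tuples, in song order."""
    lines = []
    for i, (singer, text) in enumerate(verses, start=1):
        lines.append(f"[verse {i} {singer}]")  # e.g. "[verse 1 female]"
        lines.append(text)
    return "\n".join(lines)

lyrics = build_duet_lyrics([
    ("female", "First verse lyrics here"),
    ("male", "Second verse lyrics here"),
    ("male & female", "Final verse, both voices"),
])
print(lyrics.splitlines()[0])  # -> [verse 1 female]
```

Even with a consistent template like this, the thread's own numbers suggest you should still expect to re-roll some generations.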
The instrument thread widens the problem beyond vocals. Its original poster said v5.5 improved voice quality but failed when asked for specific instruments in specific sections, even on relatively ordinary metal and jazz arrangements. In replies, one commenter said Studio had been broken for instrument generation since the update, while another said v5.5 was usable for production but less original than before.
That clashes with Suno's official pitch in the v5.5 announcement, which describes the model as more expressive and more personalized, and with Suno's older guidance in Better Prompts in Lyrics, which encourages putting more structural context directly in the lyrics box.
A separate creator post points to a cleaner production split when the singing pass is fragile: render the audio until the emotion lands, then do the lip-sync video as a second step. The post links to LTX-2.3 workflows on Hugging Face, and the author says they prefer that two-stage flow even though it could be merged into one pipeline.
That is a different toolchain, but it introduces one concrete fact the Suno complaint threads do not: some creators are already treating audio generation and performance animation as separate passes, because fixing the voice first is easier than rescuing a bad composite later.
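The control-flow logic of that two-pass approach can be sketched in a few lines. This is a toy illustration of the approval-gate idea, not real Suno or LTX tooling: all three callables are placeholders I've invented to show the structure.

```python
# Minimal sketch of the two-stage flow the post describes: audition audio
# takes until one lands, and only then spend a lip-sync pass on the frozen
# file. generate_take, approve, and lip_sync are placeholder callables,
# not real APIs from Suno, LTX, or Hugging Face.

def two_stage(generate_take, approve, lip_sync, max_takes=6):
    for n in range(1, max_takes + 1):
        audio = generate_take(n)
        if approve(audio):
            return lip_sync(audio)  # video is built only from the approved take
    return None                     # no usable take; never burn a video pass

# Toy run: take 3 is the first one that "lands" emotionally.
result = two_stage(
    generate_take=lambda n: f"take_{n}.wav",
    approve=lambda audio: audio == "take_3.wav",
    lip_sync=lambda audio: audio.replace(".wav", ".mp4"),
)
print(result)  # -> take_3.mp4
```

The point of the structure is the one the post makes: the expensive, hard-to-undo step (video) never runs against audio that hasn't already been accepted.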