Two community posts report that Suno v5.5 and the current v5 often ignore male/female section tags and specific instrument cues. Users say the misses burn credits; the partial workarounds they describe are numbered verse labels and splitting audio generation from lip-sync into separate passes.

Creators report that the model ignores tags like [male vocals] and [female vocals], swaps singers mid-line, and burns through credits while they try to force a stable voice assignment. Suno's official editing guide pushes creators toward Custom Mode, richer lyrics-box prompting, and prompt reuse when a voice pass goes wrong. The interesting gap is that today's creator reports are not asking for vague style transfer; they're asking for basic section discipline. One thread is about verse-by-verse singer assignment, the other is about instrument placement, and both describe a model that still wanders.
The strongest practical advice in the duet thread came from a commenter who moved the instructions out of loose bracket tags and into numbered verse labels. Their format was simple:
[verse 1 female], [verse 2 male], [verse 3 male & female]

According to the same thread, even that structure is still hit or miss: the user who proposed it said roughly 2 out of 6 generations still got the duet wrong.
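For anyone templating this pattern across many songs, the format is simple enough to generate programmatically. The sketch below is a hypothetical helper, not a Suno API: only the bracket-tag format itself comes from the thread, and the function and field names are illustrative.

```python
# Hypothetical helper for assembling a Suno lyrics-box prompt using the
# numbered verse labels suggested in the community thread. The tag format
# "[verse N singer]" is from the post; everything else is illustrative.

def build_duet_lyrics(verses):
    """verses: list of (singer_label, lyric_text) tuples, in song order."""
    lines = []
    for i, (singer, text) in enumerate(verses, start=1):
        lines.append(f"[verse {i} {singer}]")  # e.g. "[verse 1 female]"
        lines.append(text)
    return "\n".join(lines)

lyrics = build_duet_lyrics([
    ("female", "First verse lyrics here"),
    ("male", "Second verse lyrics here"),
    ("male & female", "Final verse, both voices"),
])
print(lyrics.splitlines()[0])  # -> [verse 1 female]
```

Even with a consistent template like this, the thread's own numbers suggest you should still expect to re-roll some generations.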
The instrument thread widens the problem beyond vocals. Its original poster said v5.5 improved voice quality but failed when asked for specific instruments in specific sections, even on relatively ordinary metal and jazz arrangements. In replies, one commenter said Studio had been broken for instrument generation since the update, while another said v5.5 was usable for production but less original than before.
That clashes with Suno's official pitch in the v5.5 announcement, which describes the model as more expressive and more personalized, and with Suno's older guidance in Better Prompts in Lyrics, which encourages putting more structural context directly in the lyrics box.
A separate creator post points to a cleaner production split when the singing pass is fragile: render the audio until the emotion lands, then do the lip-sync video as a second step. The post links to LTX-2.3 workflows on Hugging Face, and the author says they prefer that two-stage flow even though it could be merged into one pipeline.
That is a different toolchain, but it introduces one concrete fact the Suno complaint threads do not: some creators are already treating audio generation and performance animation as separate passes, because fixing the voice first is easier than rescuing a bad composite later.
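The control-flow logic of that two-pass approach can be sketched in a few lines. This is a toy illustration of the approval-gate idea, not real Suno or LTX tooling: all three callables are placeholders I've invented to show the structure.

```python
# Minimal sketch of the two-stage flow the post describes: audition audio
# takes until one lands, and only then spend a lip-sync pass on the frozen
# file. generate_take, approve, and lip_sync are placeholder callables,
# not real APIs from Suno, LTX, or Hugging Face.

def two_stage(generate_take, approve, lip_sync, max_takes=6):
    for n in range(1, max_takes + 1):
        audio = generate_take(n)
        if approve(audio):
            return lip_sync(audio)  # video is built only from the approved take
    return None                     # no usable take; never burn a video pass

# Toy run: take 3 is the first one that "lands" emotionally.
result = two_stage(
    generate_take=lambda n: f"take_{n}.wav",
    approve=lambda audio: audio == "take_3.wav",
    lip_sync=lambda audio: audio.replace(".wav", ".mp4"),
)
print(result)  # -> take_3.mp4
```

The point of the structure is the one the post makes: the expensive, hard-to-undo step (video) never runs against audio that hasn't already been accepted.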