ComfyUI users compare FLUX Fill, Klein 9B, and ControlNet box prompts for room edits
Threads outlined multi-step pipelines for turning sketches into photoreal scenes, placing furniture with red-box prompts, and swapping in a real face after generation. Prompt-only inpainting still misses precise placement, so creators are using masks, ControlNet, and cleanup passes for tighter control.

TL;DR
- In a ComfyUI thread, one creator trying to merge an iPad sketch with a real face got a concrete community recipe: describe the sketch with an LLM, drive generation with ControlNet, then do the face swap as a later pass.
- One Stable Diffusion post said FLUX.2-klein-9B could usually add furniture to room photos, but not with reliable placement, while Black Forest Labs' model card positions Klein as a fast model for both generation and editing.
- A parallel Stable Diffusion thread pushed the workflow away from prompt-only edits and toward explicit spatial cues, including red rectangles, dots, masks, and ControlNet-guided inpainting.
- Black Forest Labs' FLUX.1 Tools post says FLUX.1 Fill is built for text-plus-mask inpainting, while the official Fill docs frame it as region editing and outpainting rather than precise object placement by prompt alone.
You can compare Black Forest Labs' FLUX.1 Tools announcement, the FLUX.1 Fill docs, and the FLUX.2 klein 9B model card against what users were actually troubleshooting in the ComfyUI thread and the two Stable Diffusion posts. The interesting bit is how quickly the advice collapses into spatial control: canny, depth, masks, red boxes, and cleanup passes, not bigger prompt paragraphs.
Sketch-to-photo pipeline
Thread: "Need help with a workflow" (18 comments)
The ComfyUI thread starts with a familiar creative problem: a user wants to turn a sketch of a supermarket scene into a realistic image, but keep a friend's face in the result. In the replies, the thread turns that into a four-step stack.
- Turn the sketch into a fuller scene description.
- Use the sketch as structural input for ControlNet.
- Generate the photoreal base image.
- Swap in the real face, then resize and composite it back.
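The last step in that list, resizing the swapped face and compositing it back, can be sketched without any image library. This is a minimal illustration under stated assumptions: it reads the advice as "process the face as a separate crop, then scale it to fit and paste it into the base image", and the helper names `resize_nearest` and `composite`, the toy character grid, and the paste coordinates are all hypothetical. A real workflow would blend the edges rather than hard-paste.

```python
def resize_nearest(img, new_w, new_h):
    """Nearest-neighbour resize of a 2D pixel grid (a list of rows)."""
    old_h, old_w = len(img), len(img[0])
    return [
        [img[y * old_h // new_h][x * old_w // new_w] for x in range(new_w)]
        for y in range(new_h)
    ]


def composite(base, patch, left, top):
    """Return a copy of base with patch pasted at (left, top); base is not modified."""
    out = [row[:] for row in base]
    for dy, patch_row in enumerate(patch):
        for dx, pixel in enumerate(patch_row):
            y, x = top + dy, left + dx
            # Skip any part of the patch that would fall outside the base image.
            if 0 <= y < len(out) and 0 <= x < len(out[0]):
                out[y][x] = pixel
    return out


# Toy data: "." is the generated scene, "F" is the face crop after the swap pass.
base = [["."] * 12 for _ in range(6)]
face = [["F"] * 8 for _ in range(8)]

small = resize_nearest(face, 4, 4)          # scale the crop to the target face size
result = composite(base, small, left=5, top=1)

for row in result:
    print("".join(row))
```

Keeping the swap on a separate crop means the face pass can run at a higher resolution than the face occupies in the final scene, which is one plausible reason the replies treat it as its own pass.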
The ControlNet step in that list lines up with the official ControlNet repo, which describes ControlNet as a way to steer diffusion models with extra conditions such as canny edges or depth maps. The workflow advice in the ComfyUI comments is basically a creator's version of that repo language.
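To make "extra conditions such as canny edges" concrete, here is a minimal sketch of what an edge control image contains. It is not real Canny: it uses a plain thresholded gradient as a simplified stand-in, and the function name `edge_map`, the threshold value, and the toy sketch grid are assumptions for illustration only.

```python
def edge_map(gray, threshold=64):
    """Binary edge image (0 or 255) from a 2D list of 0-255 grayscale values.

    Simplified stand-in for an edge preprocessor. Real Canny also blurs,
    thins edges, and applies hysteresis thresholding.
    """
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    # Border pixels are left at 0 because they lack a full neighbourhood.
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gray[y][x + 1] - gray[y][x - 1]   # horizontal central difference
            gy = gray[y + 1][x] - gray[y - 1][x]   # vertical central difference
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                edges[y][x] = 255
    return edges


# Toy "sketch": a white page (255) with a dark filled square (0).
sketch = [[255] * 8 for _ in range(8)]
for y in range(2, 6):
    for x in range(2, 6):
        sketch[y][x] = 0

control = edge_map(sketch)
for row in control:
    print("".join("#" if v else "." for v in row))
```

The output keeps only the outline of the drawn shape, which is the kind of structural signal a ControlNet conditions on while the text prompt supplies materials, lighting, and style.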
Fill models need masks
Thread: "Adding an objects to an image" (1 comment)
The room-editing posts are blunt about where prompt-only inpainting breaks. In the first thread, the user says FLUX.2-klein-9B often adds objects successfully, but not where they want them, while FLUX.1-Fill-dev and SDXL inpainting both produced weak results even when the target region was masked.
That gap matches the official product split. Black Forest Labs' FLUX.1 Tools post describes FLUX.1 Fill as a text-and-binary-mask model for inpainting and outpainting, and the Fill docs describe the same tool as region editing that preserves surrounding context. That makes it a good fit for replacement and cleanup, and less convincing as a single-shot object placer.
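For readers who have not built one by hand, a binary mask is just a same-size image that marks the editable region. The sketch below builds one from a rectangle; the helper names `box_mask` and `coverage`, the image size, and the box coordinates are hypothetical, and the white-means-edit convention is an assumption that should be checked against the tool in use.

```python
def box_mask(width, height, box):
    """Binary mask as a 2D list: 255 inside the box (region to repaint), 0 elsewhere.

    box is (left, top, right, bottom) in pixels, with right and bottom exclusive.
    Assumes white marks the editable region; some tools invert this.
    """
    left, top, right, bottom = box
    # Clamp the box to the image so an oversized box cannot index out of range.
    left, right = max(0, left), min(width, right)
    top, bottom = max(0, top), min(height, bottom)
    if left >= right or top >= bottom:
        raise ValueError("box does not overlap the image")
    return [
        [255 if left <= x < right and top <= y < bottom else 0 for x in range(width)]
        for y in range(height)
    ]


def coverage(mask):
    """Fraction of pixels that are open for editing."""
    total = sum(len(row) for row in mask)
    return sum(v == 255 for row in mask for v in row) / total


# Hypothetical 64x48 room photo with a furniture slot in the lower right.
mask = box_mask(64, 48, (40, 20, 60, 44))
print(f"{coverage(mask):.1%} of the image is open for editing")
```

The mask only tells the model where it may paint. It says nothing about what goes there or at what scale, which is consistent with the threads' complaint that masked inpainting alone still gave weak placement results.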
Red-box placement prompts
Thread: "Adding an objects to an image" (1 comment)
The cleanest workaround in these threads is also the simplest. In the second Stable Diffusion thread, a commenter suggests marking the target area with a solid red rectangle or dot, then prompting for the object inside that marker and asking the model to remove the guide shape afterward.
A separate reply in the parallel thread says the same idea works in Klein with a red bounding box, and even mentions LoRAs where you paste the object into the desired location before running the edit. That fits with the FLUX.2 klein 9B model card, which pitches the model as fast enough for iterative editing. In practice, the creators here are spending that speed budget on more scaffolding, not less.
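The red-box trick can be sketched as three small pieces: paint the marker, write a prompt that refers to it, and check the result for leftover marker pixels. Everything named here is an assumption for illustration: the helper names, the prompt wording, the tolerance value, and the toy pixel grid are hypothetical, and the threads do not prescribe exact phrasing.

```python
RED = (255, 0, 0)


def draw_red_box(pixels, box):
    """Return a copy of an RGB pixel grid with a solid red rectangle painted over box.

    box is (left, top, right, bottom), with right and bottom exclusive.
    """
    left, top, right, bottom = box
    out = [row[:] for row in pixels]
    for y in range(top, bottom):
        for x in range(left, right):
            out[y][x] = RED
    return out


def placement_prompt(obj):
    """Hypothetical prompt wording that points the model at the marker."""
    return (
        f"Place {obj} exactly where the solid red rectangle is, "
        "matching the room's lighting and perspective. "
        "Remove the red rectangle completely."
    )


def leftover_marker_pixels(pixels, tolerance=40):
    """Count pixels still close to pure red, as a crude check that the guide is gone."""
    count = 0
    for row in pixels:
        for r, g, b in row:
            if r >= 255 - tolerance and g <= tolerance and b <= tolerance:
                count += 1
    return count


# Toy 32x24 "room photo" filled with a flat beige colour.
room = [[(200, 190, 180)] * 32 for _ in range(24)]
marked = draw_red_box(room, (10, 12, 20, 22))

print(placement_prompt("a grey two-seat sofa"))
print(leftover_marker_pixels(marked), "marker pixels in the guide image")
```

Running the same pixel count on the edited output gives a quick signal for whether a cleanup pass is still needed, although it will also flag genuinely red objects in the scene.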