Black Forest Labs claims FLUX.2 [klein] 9B adds 2x faster multi-reference editing
Black Forest Labs says FLUX.2 [klein] 9B is now up to 2x faster for multi-reference editing at the same price, with new FP8 weights for leaner local runs. Retest reference-heavy edit pipelines if speed or local deployment was a blocker.
![Black Forest Labs claims FLUX.2 [klein] 9B adds 2x faster multi-reference editing](https://pbs.twimg.com/media/HDOBveIagAAgmIO.jpg)
TL;DR
- Black Forest Labs' launch post says FLUX.2 [klein] 9B is now up to 2x faster for image editing, with the biggest gains showing up in multi-reference jobs.
- Per BFL's follow-up post, quality and pricing are unchanged, and existing API users get the speedup automatically as a free upgrade.
- The technical change is KV-caching: the thread says the model can skip redundant computation on reference images, so more references should mean larger speed gains.
- Black Forest Labs also released FP8 quantized weights for local or self-hosted use; the release thread frames that as lower VRAM needs and faster inference for on-device deployments.
What changed
The update is narrowly focused but useful for creators who edit from multiple source images. In its announcement, Black Forest Labs says FLUX.2 [klein] 9B now runs image-editing inference up to 2x faster, especially when several reference images are involved. The company attributes that to KV-caching, which stores the reference-image computation once instead of repeating it across denoising steps.
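To make that mechanism concrete, here is a minimal sketch of how caching reference-image key/value projections avoids redundant work across denoising steps. The shapes, function names, and step count are illustrative assumptions, not BFL's actual implementation:

```python
# Toy illustration of the KV-caching idea described in the thread: K/V
# projections for reference-image tokens are computed once, before the
# denoising loop, instead of being recomputed at every step.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
d_model = 64

# Projection weights (stand-ins for the model's attention weights).
w_q = torch.randn(d_model, d_model)
w_k = torch.randn(d_model, d_model)
w_v = torch.randn(d_model, d_model)

def precompute_reference_kv(ref_tokens: torch.Tensor):
    """Project reference-image tokens to K/V once, up front."""
    return ref_tokens @ w_k, ref_tokens @ w_v

def denoise_step(latent: torch.Tensor, ref_k: torch.Tensor, ref_v: torch.Tensor):
    """One attention call per step: only the latent being denoised gets
    fresh Q/K/V; the reference K/V are reused from the cache."""
    q = latent @ w_q
    k = torch.cat([latent @ w_k, ref_k], dim=0)
    v = torch.cat([latent @ w_v, ref_v], dim=0)
    attn = F.softmax(q @ k.T / d_model**0.5, dim=-1)
    return attn @ v

# Three reference images' worth of tokens, projected exactly once.
refs = [torch.randn(256, d_model) for _ in range(3)]
ref_k, ref_v = precompute_reference_kv(torch.cat(refs, dim=0))

latent = torch.randn(128, d_model)
for step in range(28):  # every step reuses the cached reference K/V
    latent = denoise_step(latent, ref_k, ref_v)
```

Since the cached reference work scales with the number of reference tokens, this is consistent with BFL's claim that more references yield larger speedups.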
That matters more for look-transfer, character consistency, and composite edits than for simple one-image touchups. The follow-up post also says 9B is now closer to 4B in speed, which makes the higher-quality model easier to justify when latency was the tradeoff.
What it changes for local and API workflows
For API users, the practical change is simple: Black Forest Labs says the faster 9B path is a free upgrade at the same price, with docs, playground access, and model weights linked from the release. For self-hosted setups, the new FP8 weights are the bigger addition, since BFL says they cut VRAM requirements and improve inference speed for local runs.
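For a rough sense of what FP8 storage buys, here is a toy sketch assuming PyTorch's float8_e4m3fn dtype (available in PyTorch 2.1+). It is not BFL's loader; check the 9B-KV model card for the supported path:

```python
import torch

# Full-precision stand-in for one linear layer's weights.
w_fp32 = torch.randn(4096, 4096)

# Store in FP8 (e4m3): 1 byte per element vs. 2 for bf16/fp16, so roughly
# half the VRAM of a 16-bit checkpoint for the weight tensors.
w_fp8 = w_fp32.to(torch.float8_e4m3fn)
print(w_fp32.element_size(), "B/elem ->", w_fp8.element_size(), "B/elem")

# For compute, FP8 weights are either consumed by native FP8 matmul kernels
# (on supported GPUs) or upcast first; this CPU demo upcasts.
x = torch.randn(1, 4096)
y = x @ w_fp8.to(torch.float32)
```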
The FLUX.2 overview docs and the model card for the 9B-KV weights position this as a better fit for iterative reference-heavy editing, not a brand-new model family. If your pipeline stalled on multi-image latency or VRAM overhead, this is the part of FLUX.2 worth retesting.
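If you do retest, a simple before/after timing harness is enough to check whether the multi-reference gains show up in your workload. Here, `run_edit` is a placeholder for your own client call (BFL API or local pipeline); nothing in this sketch assumes a specific SDK:

```python
# Median wall-clock latency for one multi-reference edit, repeated a few
# times to smooth out network and warm-up noise.
import statistics
import time

def time_edit(run_edit, prompt: str, reference_paths: list[str], trials: int = 5) -> float:
    samples = []
    for _ in range(trials):
        start = time.perf_counter()
        run_edit(prompt=prompt, references=reference_paths)
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)

# Example: compare a 1-reference job against a 4-reference job; per the
# announcement, the gap should shrink most in the multi-reference case.
# one_ref  = time_edit(my_client_edit, "match this style", refs[:1])
# four_ref = time_edit(my_client_edit, "match this style", refs[:4])
```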