
SAMA released: a 14B instruction-guided video editing model with sparse anchor frames

SAMA is a new 14B open model for instruction-guided video editing that separates semantic anchoring from motion alignment and claims state-of-the-art open results. Track it if you need edits that change objects or style without wrecking motion.


TL;DR

  • SAMA is a new 14B open model for instruction-guided video editing, and the release thread says it targets the usual hard tradeoff: changing content without breaking motion.
  • According to the paper post, the model splits editing into semantic anchoring and motion alignment instead of treating both as one problem.
  • The launch thread says SAMA is Apache 2.0 licensed and claims state-of-the-art performance among open-source video editing models.

What shipped

SAMA is pitched as a general video editor for object replacement, addition, removal, and style transfer. The core idea is sparse anchor frames: the model predicts semantic tokens and video latents at key frames, then uses a separate motion-focused module to carry those edits through time without the usual flicker or drift.
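
To make that split concrete, here is a minimal sketch of the two-step flow as the post describes it. The callables, names, and fixed-stride anchor selection (`anchor_editor`, `motion_propagator`, `select_anchor_frames`) are illustrative assumptions, not SAMA's actual API or anchor-picking strategy.

```python
# Hypothetical sketch of the sparse-anchor-frame split: edit at a few key
# frames, then let a separate motion module carry the edit through time.
# Names and stride-based anchor selection are assumptions for illustration.
import torch

def select_anchor_frames(video: torch.Tensor, stride: int = 8) -> list[int]:
    """Pick a sparse set of key-frame indices from a (T, C, H, W) video."""
    num_frames = video.shape[0]
    return list(range(0, num_frames, stride))

def edit_video(video, instruction, anchor_editor, motion_propagator, stride=8):
    """Two-step edit: semantic anchoring at key frames, then motion alignment."""
    anchors = select_anchor_frames(video, stride)

    # Step 1: semantic anchoring -- predict edited semantic tokens and video
    # latents only at the sparse anchor frames.
    anchor_latents = anchor_editor(video[anchors], instruction)

    # Step 2: motion alignment -- a motion-focused module propagates the
    # anchor edits across the remaining frames, preserving the original
    # camera movement and scene dynamics.
    return motion_propagator(video, anchors, anchor_latents)
```

The appeal of the split is that the expensive instruction-conditioned editing only runs on a handful of frames, while the motion module's only job is temporal consistency.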

How the workflow is different

The paper describes a two-stage training setup. First, SAMA learns motion from raw video with restoration-style pretext tasks including cube inpainting, speed perturbation, and tube shuffling. Then it is fine-tuned on paired editing data for instruction following. That separation is the creative hook: it is designed for prompts that swap subjects or restyle scenes while keeping camera movement and scene dynamics intact, which is also what the supporting writeup highlights in its demo overview.
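
For a feel of the stage-one pretext tasks, here is a rough sketch of what the three corruptions named in the paper post could look like on a raw clip. The tensor layout, patch sizes, and exact corruption logic are assumptions for illustration, not SAMA's implementation.

```python
# Illustrative restoration-style corruptions (cube inpainting, speed
# perturbation, tube shuffling). Parameters and details are assumptions.
import torch

def cube_inpainting(video: torch.Tensor, size: int = 32, t_len: int = 4) -> torch.Tensor:
    """Zero out a random spatio-temporal cube; the model must restore it."""
    v = video.clone()                      # video: (T, C, H, W)
    T, _, H, W = v.shape
    t0 = torch.randint(0, max(T - t_len, 1), (1,)).item()
    y0 = torch.randint(0, max(H - size, 1), (1,)).item()
    x0 = torch.randint(0, max(W - size, 1), (1,)).item()
    v[t0:t0 + t_len, :, y0:y0 + size, x0:x0 + size] = 0.0
    return v

def speed_perturbation(video: torch.Tensor, factor: int = 2) -> torch.Tensor:
    """Drop frames so the clip plays faster; the model must recover the original pacing."""
    return video[::factor]

def tube_shuffling(video: torch.Tensor, size: int = 32) -> torch.Tensor:
    """Shuffle frame order inside one spatial tube; the model must reorder it."""
    v = video.clone()
    T, _, H, W = v.shape
    y0 = torch.randint(0, max(H - size, 1), (1,)).item()
    x0 = torch.randint(0, max(W - size, 1), (1,)).item()
    perm = torch.randperm(T)
    v[:, :, y0:y0 + size, x0:x0 + size] = v[perm, :, y0:y0 + size, x0:x0 + size]
    return v
```

In each case the model is trained to restore the original clip from the corrupted one, which forces it to learn motion and temporal structure before any paired editing data is involved.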

