SAMA released: 14B instruction-guided video editing model with sparse anchor frames
SAMA is a new 14B open model for instruction-guided video editing that separates semantic anchoring from motion alignment and claims state-of-the-art open results. Track it if you need edits that change objects or style without wrecking motion.

TL;DR
- SAMA is a new 14B open model for instruction-guided video editing, and the release thread says it targets the usual hard tradeoff: changing content without breaking motion.
- According to the paper post, the model splits editing into semantic anchoring and motion alignment instead of treating both as one problem.
- The launch thread says SAMA is Apache 2.0 licensed and claims state-of-the-art performance among open-source video editing models.
What shipped
SAMA is pitched as a general video editor for object replacement, addition, removal, and style transfer. The core idea is sparse anchor frames: the model predicts semantic tokens and video latents at key frames, then uses a separate motion-focused module to carry those edits through time without the usual flicker or drift.
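To make the anchor-and-propagate idea concrete, here is a minimal sketch of the control flow, assuming a short clip and placeholder functions (pick_anchor_indices, edit_anchor, propagate are illustrative names, not SAMA's actual API); the real semantic-anchoring and motion-alignment modules are learned, and their interfaces are not spelled out in the release.

```python
# Illustrative sketch, NOT SAMA's real API: edit sparse anchor frames,
# then let a motion-focused stage carry the edit to the remaining frames.
import numpy as np

def pick_anchor_indices(num_frames: int, stride: int = 8) -> list[int]:
    """Choose sparse key-frame positions; always include the last frame."""
    idx = list(range(0, num_frames, stride))
    if idx[-1] != num_frames - 1:
        idx.append(num_frames - 1)
    return idx

def edit_anchor(frame: np.ndarray, instruction: str) -> np.ndarray:
    """Stand-in for the semantic-anchoring stage: invert colors to mimic an
    instruction-driven appearance change (a real model would condition on text)."""
    return 1.0 - frame

def propagate(frames: np.ndarray, anchors: dict[int, np.ndarray]) -> np.ndarray:
    """Stand-in for the motion-alignment stage: blend each frame toward its
    nearest edited anchors in time. A real motion module would warp the edit
    along estimated motion instead of interpolating linearly."""
    keys = sorted(anchors)
    out = frames.copy()
    for t in range(len(frames)):
        lo = max(k for k in keys if k <= t)
        hi = min(k for k in keys if k >= t)
        w = 0.0 if hi == lo else (t - lo) / (hi - lo)
        out[t] = (1 - w) * anchors[lo] + w * anchors[hi]
    return out

video = np.random.rand(32, 64, 64, 3).astype(np.float32)  # T x H x W x C
anchor_ids = pick_anchor_indices(len(video))
edited = {i: edit_anchor(video[i], "replace the car with a bicycle") for i in anchor_ids}
edited_video = propagate(video, edited)
print(edited_video.shape)  # (32, 64, 64, 3)
```

The point of the sketch is the division of labor: only the anchor frames see the edit directly, and everything in between is the motion stage's job, which is where the flicker-and-drift problem is meant to be solved.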
How the workflow is different
The paper describes a two-stage training setup. First, SAMA learns motion from raw video via restoration-style pretext tasks: cube inpainting, speed perturbation, and tube shuffling. Then it is fine-tuned on paired editing data for instruction following. That separation is the creative hook: the model is built for prompts that swap subjects or restyle scenes while keeping camera movement and scene dynamics intact, which is also what the supporting writeup highlights in its demo overview.
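For a rough sense of what those pretext tasks look like as data corruption, here is a sketch that builds (corrupted, target) pairs; the patch sizes, speed factor, and tube definition below are arbitrary assumptions, since the release does not give the paper's exact recipe.

```python
# Illustrative sketch of the three restoration-style pretext tasks named in the
# release; sizes and factors are assumptions, not the paper's actual settings.
import numpy as np

rng = np.random.default_rng(0)

def cube_inpainting(video: np.ndarray) -> np.ndarray:
    """Zero out a random spatiotemporal cube; the model must restore it."""
    t, h, w, _ = video.shape
    corrupted = video.copy()
    t0, h0, w0 = rng.integers(t - 8), rng.integers(h - 16), rng.integers(w - 16)
    corrupted[t0:t0 + 8, h0:h0 + 16, w0:w0 + 16] = 0.0
    return corrupted

def speed_perturbation(video: np.ndarray, factor: int = 2) -> np.ndarray:
    """Subsample then repeat frames so the clip plays at the wrong speed."""
    return np.repeat(video[::factor], factor, axis=0)[: len(video)]

def tube_shuffling(video: np.ndarray) -> np.ndarray:
    """Shuffle a fixed spatial patch (a 'tube') along the time axis."""
    corrupted = video.copy()
    h0, w0 = rng.integers(video.shape[1] - 16), rng.integers(video.shape[2] - 16)
    order = rng.permutation(len(video))
    corrupted[:, h0:h0 + 16, w0:w0 + 16] = video[order, h0:h0 + 16, w0:w0 + 16]
    return corrupted

clip = np.random.rand(32, 64, 64, 3).astype(np.float32)
pairs = [(task(clip), clip) for task in (cube_inpainting, speed_perturbation, tube_shuffling)]
print([corrupted.shape for corrupted, _ in pairs])
```

All three corruptions force the model to reason about how content moves through time before it ever sees an editing instruction, which is the stated rationale for doing motion learning on raw video first and instruction tuning second.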