SAMA is a new 14B open model for instruction-guided video editing that separates semantic anchoring from motion alignment and claims state-of-the-art open results. Track it if you need edits that change objects or style without wrecking motion.

SAMA is pitched as a general video editor for object replacement, addition, removal, and style transfer. The core idea is sparse anchor frames: the model predicts semantic tokens and video latents at key frames, then uses a separate motion-focused module to carry those edits through time without the usual flicker or drift.
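The anchor-then-propagate idea can be sketched in a few lines. This is a toy illustration with NumPy, not SAMA's actual code: `select_anchors` and `propagate_edits` are hypothetical names, and the linear interpolation stands in for the model's learned motion module.

```python
import numpy as np

def select_anchors(num_frames, num_anchors):
    # Evenly spaced key-frame indices (an assumed scheme, not from the paper).
    return np.linspace(0, num_frames - 1, num_anchors).round().astype(int)

def propagate_edits(latents, anchor_idx, edited_anchors):
    """Toy stand-in for the motion-alignment step: write edited latents at the
    anchor frames, then linearly interpolate the in-between frames.
    latents: (T, D) per-frame latents; edited_anchors: (len(anchor_idx), D)."""
    out = latents.copy()
    out[anchor_idx] = edited_anchors
    for a, b in zip(anchor_idx[:-1], anchor_idx[1:]):
        for t in range(a + 1, b):
            w = (t - a) / (b - a)
            out[t] = (1 - w) * out[a] + w * out[b]
    return out
```

In the real model the propagation is a learned, motion-aware module rather than interpolation; the point of the sketch is only the factorization, where semantics are decided at a few anchors and the rest of the clip follows them.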
The paper describes a two-stage training setup. First, SAMA learns motion from raw video with restoration-style pretext tasks including cube inpainting, speed perturbation, and tube shuffling. Then it is fine-tuned on paired editing data for instruction following. That separation is the creative hook: it is designed for prompts that swap subjects or restyle scenes while keeping camera movement and scene dynamics intact, which is also what the supporting writeup highlights in its demo overview.
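The three pretext tasks named above are all simple corruptions of raw video that the model learns to undo. A minimal sketch of what each corruption might look like on a `(T, H, W, C)` clip, assuming NumPy and hypothetical function names (the paper's exact cube/tube sizes and sampling are not specified here):

```python
import numpy as np

def cube_inpainting(video, cube=(4, 32, 32), seed=0):
    """Zero out one random spatio-temporal cube; the restoration target
    is the original clip. video: (T, H, W, C)."""
    rng = np.random.default_rng(seed)
    T, H, W, _ = video.shape
    t0 = rng.integers(0, T - cube[0] + 1)
    y0 = rng.integers(0, H - cube[1] + 1)
    x0 = rng.integers(0, W - cube[2] + 1)
    out = video.copy()
    out[t0:t0 + cube[0], y0:y0 + cube[1], x0:x0 + cube[2]] = 0
    return out

def speed_perturbation(video, factor=2):
    """Drop frames to fake a faster clip; the model must recover the
    original timing."""
    return video[::factor]

def tube_shuffling(video, patch=32, seed=0):
    """Permute the spatial positions of fixed-size tubes (the same patch
    location across all frames), keeping each tube internally intact."""
    rng = np.random.default_rng(seed)
    T, H, W, C = video.shape
    gh, gw = H // patch, W // patch
    tubes = video.reshape(T, gh, patch, gw, patch, C)
    flat = tubes.transpose(1, 3, 0, 2, 4, 5).reshape(gh * gw, T, patch, patch, C)
    shuffled = flat[rng.permutation(gh * gw)]
    back = shuffled.reshape(gh, gw, T, patch, patch, C)
    return back.transpose(2, 0, 3, 1, 4, 5).reshape(T, H, W, C)
```

All three are restoration tasks: the loss compares the model's reconstruction against the uncorrupted clip, which forces it to learn motion and temporal structure before it ever sees an editing instruction.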
SAMA is released by Baidu, Tsinghua University, and Zhejiang University.
SAMA: Factorized Semantic Anchoring and Motion Alignment for Instruction-Guided Video Editing. Paper: huggingface.co/papers/2603.19…