releaseJune 26, 2026

Perceptron adds video_frames to Mk1 and cuts 1080p time-to-first-token from ~42s to ~4s

Perceptron launched a video_frames input for Mk1 that accepts pre-decoded frames with timestamps instead of forcing clip re-encoding. The change matters for edge and sparse-footage pipelines because 10 minutes of 1080p video can start returning tokens roughly ten times faster.

3 min read

Perceptron adds video_frames to Mk1 and cuts 1080p time-to-first-token from ~42s to ~4s

TL;DR

Perceptron added a video_frames input to Mk1 that accepts client-decoded frames plus timestamps, according to perceptroninc's launch post.
The pitch is simple: skip clip re-encoding and server-side resampling, then let the model reason over the exact frames you selected, as perceptroninc's thread describes.
Perceptron said 10 minutes of 1080p video dropped from about 42 seconds to about 4 seconds time-to-first-token, per perceptroninc's latency claim.
The feature is aimed at sparse-footage and edge pipelines, where perceptroninc's enterprise note called out client-side sampling and keyframes, and Armen Agha's post pointed to motion-triggered cameras in manufacturing and construction.
Availability was immediate, with perceptroninc's availability post linking both the blog post and docs.

You can jump straight to the blog post and docs, and the thread adds two concrete implementation details the launch card did not: Mk1 reads per-frame timestamps from the client payload, and Perceptron is framing the interface as OpenAI-compatible perceptroninc's thread.

video_frames

Perceptron's new input path lets callers send frames they already decoded on the client, each with its own timestamp. That shifts frame selection to the application side, so the model sees the exact subset the client chose, according to the launch post and the follow-up thread.

The thread makes the tradeoff explicit:

no re-encoding frames back into a clip perceptroninc's thread
no server-side decode and resample pass perceptroninc's thread
timestamps preserved in the request perceptroninc's thread
OpenAI-compatible interface shape perceptroninc's thread

time-to-first-token

The headline number is time-to-first-token, not total job completion. Perceptron said a 10 minute 1080p input moved from roughly 42 seconds to roughly 4 seconds before the first token arrived in the benchmark claim.

That lines up with the architectural change in the thread: the slow path being removed is video repackaging and server resampling, not the model pass itself perceptroninc's explanation.

sparse footage

Perceptron positioned the feature around pipelines that were awkward for clip-first APIs:

client-side sampling at the edge perceptroninc's enterprise note
sparse keyframe streams perceptroninc's enterprise note
motion-triggered camera footage in manufacturing and construction Armen Agha's post

That is a narrower and more useful framing than generic "video understanding." The company is targeting footage that is already sparse before it ever reaches the model perceptroninc's wording.

availability

Perceptron said video_frames is live now and attached both a blog post and documentation link in the release thread perceptroninc's availability post.

A later post from Armen Agha also described the feature as live, with the same emphasis on easier integration for sparse-footage pipelines Armen Agha's post.

TL;DR

video_frames

time-to-first-token

sparse footage

availability

Discussion across the web