Hermes supports browser-native filmmaking workflows with Syncthing handoff and taste memory
A Hermes and Kimi hackathon build mapped a local filmmaking pipeline with prompt packets, browser workers, Syncthing handoff, image ranking, and taste memory. It matters because subscription-only tools can be folded into a reusable production loop, but the taste model is still early and creator-specific.

TL;DR
- QualitativAi's hackathon demo framed the system around a simple split: AI handles production labor, while the filmmaker keeps the taste calls and story decisions.
- The build routes lore work through Kimi K2.6, visual exploration through a Hermes layer, and production execution through a separate browser worker that QualitativAi's Flow Arm post says runs Google Flow and Nano Banana Pro on a dedicated device.
- Instead of pushing files through an API stack, QualitativAi's Syncthing handoff post uses Syncthing for local device-to-device transfer, while Syncthing's docs describe the tool as bidirectional sync that keeps data off the cloud.
- The most interesting loop is after generation: QualitativAi's pre-evaluation post says Gemini 3 Flash captions and ranks images before review, then QualitativAi's taste memory post stores descriptions, metadata, and feedback as multimodal retrieval memory.
- The taste model is improving, but it is still early. In QualitativAi's holdout test, the blind-set result improved from 3 of 31 to 7 of 31 after initial taste seeding.
You can skim the official Hermes Agent repo, which describes Hermes as a self-improving agent with skills and memory in local files. You can also browse Google Flow, which Google calls its AI filmmaking tool, and read Aakash Gupta's breakdown, which centers Hermes on scheduled runs, SOUL.md, and skill rewriting. The weirdly useful part in QualitativAi's browser-native execution post is the premise that subscription-only browser tools can become production infrastructure without waiting for clean APIs.
Creative DNA
The system starts with a project vault that stores taste, tone, canon, review criteria, and source material, according to QualitativAi's Creative DNA post. Then Kimi K2.6 sits upstream as a worldbuilding layer that explores directions, characters, locations, and arcs through that stored context, per QualitativAi's Kimi K2.6 lore post.
That maps cleanly onto the Hermes Agent README, which pitches the tool as a self-improving agent that can search past conversations, persist knowledge, and switch across providers including Kimi and OpenAI endpoints. QualitativAi's principle post keeps the design line sharp: the system is there to remove production drag, not to take over authorship.
Flow Arm
The Flow Arm is the most concrete workflow idea here. It is a bounded browser worker on a separate device that executes prompts, downloads outputs, renames them, stages them, and hands them back, according to QualitativAi's Flow Arm post.
That matters because many creator tools are still sold as subscription UIs instead of APIs. QualitativAi's browser-native execution post turns that constraint into the architecture: use the browser surface itself as the execution layer. Google's own Flow help docs describe Flow as an AI filmmaking tool, and the video creation docs note that Nano Banana Pro is the default model inside Flow.
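That bounded-worker shape can be sketched as a simple cycle over a prompt inbox. This is a rough illustration, not the demo's code: the actual Flow Arm drives a browser session, which is stubbed out here, and every name (`run_flow_arm_cycle`, the folder layout, the packet fields) is hypothetical.

```python
import json
import shutil
from pathlib import Path

def run_flow_arm_cycle(inbox: Path, workspace: Path, outbox: Path) -> list[Path]:
    """One bounded cycle: pick up prompt packets, execute, rename, stage results.

    The real Flow Arm drives a browser UI; the download step below is a
    placeholder for that part.
    """
    staged = []
    for packet_file in sorted(inbox.glob("*.json")):
        packet = json.loads(packet_file.read_text())
        # Placeholder for the browser step: in the demo this would drive Flow,
        # wait for generation, and download the result into the workspace.
        raw_output = workspace / f"download-{packet['prompt_id']}.png"
        raw_output.write_bytes(b"")  # stands in for a downloaded image
        # Deterministic rename ties the file back to its packet.
        named = workspace / f"{packet['prompt_id']}__{packet['model_target']}.png"
        raw_output.rename(named)
        # Stage into the shared transfer folder for the sync layer to carry back.
        staged.append(Path(shutil.move(str(named), outbox / named.name)))
        packet_file.unlink()  # packet consumed
    return staged
```

The point of the shape is the boundedness: the worker only ever touches its inbox, its workspace, and the staging folder.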
Job packets and handoff
The system preserves lineage before and after generation. Each job packet tracks prompt IDs, model targets, world state, tone, execution path, and original intent, according to QualitativAi's job packets post. When results come back, QualitativAi's returned results post says every image stays tied to that packet, so there are no loose files or mystery generations.
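The packet idea can be sketched as a small record type. The field names below are illustrative, since the post lists the tracked categories rather than a schema:

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class JobPacket:
    """One generation job plus its lineage, per the categories named in the post."""
    prompt_id: str
    prompt_text: str
    model_target: str
    world_state: str          # canon snapshot the prompt was written against
    tone: str
    execution_path: str       # which arm/device runs the job
    original_intent: str      # why this shot was requested
    results: list[str] = field(default_factory=list)  # files tied back to this packet

def attach_result(packet: JobPacket, filename: str) -> None:
    """Every returned image stays tied to its packet: no loose files."""
    packet.results.append(filename)
```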
The handoff layer is Syncthing. QualitativAi's Syncthing handoff post says the Brain keeps the vault and full context, while the Flow Arm only sees its workspace plus a shared transfer folder. That matches Syncthing's documentation, which describes direct file exchange across devices instead of cloud upload.
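The visibility split amounts to a one-line invariant: every path the Flow Arm can see lives under its own workspace or the shared transfer folder, never the vault. A minimal sketch, with hypothetical paths:

```python
from pathlib import Path

def arm_may_see(path: Path, workspace: Path, transfer: Path) -> bool:
    """Visibility invariant for the Flow Arm: its workspace and the shared
    transfer folder are in scope; the Brain's vault never is."""
    resolved = path.resolve()
    return any(
        resolved.is_relative_to(scope.resolve())
        for scope in (workspace, transfer)
    )
```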
Taste memory
The review loop has three steps:
- Gemini 3 Flash describes and scores images before the human sees them, according to QualitativAi's pre-evaluation post.
- The filmmaker reviews each output on score, tier, tone, fit, tags, and failure mode; QualitativAi's human review post says the gap between machine prediction and human judgment becomes the learning signal.
- Gemini Embeddings 2 stores the image descriptions, metadata, and feedback as taste memory, which QualitativAi's taste memory post says supports retrieval by visual and semantic similarity.
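The loop above can be sketched as a review record whose prediction-to-judgment gap is the stored signal. Field names here are hypothetical; the posts describe the categories, not an implementation:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Review:
    image_id: str
    machine_score: float       # pre-evaluation score, before the human looks
    human_score: float         # the filmmaker's call
    tags: list[str]
    failure_mode: Optional[str] = None

    @property
    def signal(self) -> float:
        """The gap between machine prediction and human judgment."""
        return self.human_score - self.machine_score

def hardest_misses(reviews: list[Review], k: int = 3) -> list[Review]:
    """Images where the machine was most wrong are the most informative to store."""
    return sorted(reviews, key=lambda r: abs(r.signal), reverse=True)[:k]
```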
That last step lines up with Google's embeddings docs, which describe gemini-embedding-2 as a multimodal embedding model that maps text, images, video, audio, and documents into a shared space. It is a neat fit for a filmmaker archive because the stored asset is not just the image, but the judgment attached to it.
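A minimal version of that retrieval step, with a hand-rolled cosine similarity standing in for the embedding service, and memory entries that carry the judgment alongside the vector:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def retrieve(query_vec: list[float], memory: list[tuple], k: int = 2) -> list[tuple]:
    """memory: (asset_id, embedding, feedback) tuples. Returns the k nearest
    entries, so what comes back is the judgment, not just the image."""
    ranked = sorted(memory, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return ranked[:k]
```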
Holdout test
The only number in the thread is modest, which makes it more believable. In QualitativAi's holdout test, the system's blind holdout score improved from 3 out of 31 before taste seeding to 7 out of 31 after the first round.
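In plain terms, those holdout numbers are hit rates over the same 31-image blind set:

```python
before = 3 / 31          # about 9.7% of blind-set picks matched the filmmaker pre-seeding
after = 7 / 31           # about 22.6% after one round of taste seeding
lift = after / before    # a bit over 2.3x, from a low base
```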
That is still a low hit rate by recommendation-system standards, but it is enough to turn memory into a usable interface. QualitativAi's lightboxes post says the archive can already assemble lightboxes for requests like corrupted sacred imagery or mountain sunset shots, which is the first practical payoff of the taste layer even before the ranking gets especially good.
Future arms
The closing reveal is that Flow Arm is only the first worker. QualitativAi's prompt learning post says the system already tracks prompt successes and failures and can turn repeated model-specific observations into playbook rules. Then QualitativAi's future arms post expands the same bounded-worker pattern to Midjourney, Hailuo, Dreamina, video tools, sound tools, and editorial tools.
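One hedged sketch of that observation-to-rule step, assuming observations arrive as (model, note, outcome) tuples and that a note only graduates to a playbook rule once it repeats with a consistent outcome; the threshold and rule wording are invented for illustration:

```python
from collections import defaultdict

def playbook_rules(observations: list[tuple], min_count: int = 3) -> list[str]:
    """observations: (model, note, succeeded) tuples. A note that repeats
    min_count times for one model with a consistent outcome becomes a rule."""
    tally = defaultdict(lambda: [0, 0])  # (model, note) -> [successes, failures]
    for model, note, ok in observations:
        tally[(model, note)][0 if ok else 1] += 1
    rules = []
    for (model, note), (wins, losses) in tally.items():
        if wins >= min_count and losses == 0:
            rules.append(f"{model}: prefer {note}")
        elif losses >= min_count and wins == 0:
            rules.append(f"{model}: avoid {note}")
    return rules
```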
That extension is where this stops looking like a one-off hackathon demo and starts looking like a general creator harness. QualitativAi's long-horizon goal post describes the target as compounding creative memory, where every approval, rejection, prompt, and canon decision makes the next run easier to steer.