H Company releases Holotron-12B: 8.9k tok/s on H100 and 80.5% WebVoyager
H Company launched Holotron-12B, an open multimodal model for computer-use agents built on a hybrid SSM-attention stack that targets KV-cache bottlenecks. Benchmark it if you need high-concurrency browser agents and want better throughput without giving up web-task accuracy.

TL;DR
- H Company launched Holotron-12B, an open multimodal model built with NVIDIA for "computer-use agents," and says it is tuned for web, Android, and mobile interaction workloads rather than generic vision-language chat.
- The company says Holotron-12B is post-trained from Nemotron-Nano-12B-v2-VL and uses a hybrid SSM-attention stack that targets the KV-cache bottleneck to support higher concurrency.
- On H Company's reported benchmarks, the model reaches 8.9k tokens/s on a single H100, runs at "over 2x" the throughput of Holo2-8B, and improves WebVoyager from 35.1% to 80.5%.
- H Company also said in the partner update that it has early access to NVIDIA's Nemotron 3 Omni and plans to use its MoE base for future low-latency enterprise agent deployments.
What shipped for agent builders?
Holotron-12B is available now as an open model on Hugging Face via the model card, with a deeper product writeup in H Company's technical post. H Company describes it as a "high-throughput, open-source, multimodal model" built specifically for the "age of computer-use agents," which positions it more narrowly than a general-purpose VLM launch.
The architectural hook, per the company's architecture note: Holotron-12B is post-trained from NVIDIA's open Nemotron-Nano-12B-v2-VL and uses a hybrid SSM-attention design to reduce the KV-cache bottleneck. H Company says that gives it the "linear scaling and high-concurrency performance" needed for online reinforcement learning and production agent workloads across browser and mobile environments.
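To see why swapping attention layers for SSM layers attacks the KV-cache bottleneck, a back-of-envelope sketch helps: attention layers must cache keys and values for every past token, so their memory grows linearly with context length, while an SSM layer carries a fixed-size recurrent state. All shapes below are illustrative assumptions, not Holotron-12B's actual configuration.

```python
# Rough memory sketch: per-sequence cache for attention layers vs. SSM layers.
# Layer counts, head counts, and state sizes are hypothetical.

def kv_cache_bytes(n_attn_layers, n_kv_heads, head_dim, seq_len, bytes_per_elem=2):
    # Each attention layer caches a K and a V vector per token:
    # 2 (K and V) * kv_heads * head_dim * seq_len, in fp16 by default.
    return 2 * n_attn_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

def ssm_state_bytes(n_ssm_layers, state_dim, channels, bytes_per_elem=2):
    # An SSM layer keeps a fixed-size recurrent state, independent of seq_len.
    return n_ssm_layers * state_dim * channels * bytes_per_elem

# Hypothetical 40-layer model at a 32k-token context:
full_attn = kv_cache_bytes(40, 8, 128, 32_000)                        # all layers attention
hybrid = kv_cache_bytes(8, 8, 128, 32_000) + ssm_state_bytes(32, 128, 4096)
print(f"all-attention cache: {full_attn / 1e9:.2f} GB per sequence")  # ~5.24 GB
print(f"hybrid cache:        {hybrid / 1e9:.2f} GB per sequence")     # ~1.08 GB
```

Under these made-up shapes the hybrid layout needs roughly a fifth of the per-sequence cache, and the SSM portion stays flat no matter how long the context gets, which is the property H Company is pointing at with "linear scaling."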
How strong are the reported speed and accuracy gains?
H Company's headline numbers are unusually deployment-oriented: "8.9k tokens/s on a single H100," "over 2x faster than Holo2-8B," and a much smaller memory footprint that allows larger effective batch sizes on the same hardware. For teams serving browser or UI agents, that matters more than a generic model-quality claim because concurrency and memory pressure usually dominate cost envelopes.
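The "larger effective batch sizes" claim follows from simple arithmetic: on a GPU with a fixed memory budget, concurrency is capped by how many per-sequence caches fit after weights and runtime overhead. The figures below are assumptions for illustration, not vendor numbers.

```python
# Illustrative concurrency arithmetic for a single 80 GB H100.
# Weight and overhead figures are assumed, not measured.

H100_HBM_GB = 80
WEIGHTS_GB = 24          # assumed fp16 footprint for a ~12B-parameter model
RUNTIME_OVERHEAD_GB = 6  # assumed activations + allocator slack

def max_concurrent_sequences(per_seq_cache_gb):
    # Memory left after weights and overhead, divided by per-sequence cache.
    free = H100_HBM_GB - WEIGHTS_GB - RUNTIME_OVERHEAD_GB
    return int(free // per_seq_cache_gb)

print(max_concurrent_sequences(2.0))   # hypothetical full-attention cache -> 25
print(max_concurrent_sequences(0.5))   # hypothetical hybrid cache        -> 100
```

Halving the per-sequence cache roughly doubles achievable batch size on the same card, which is why a memory-footprint reduction translates directly into the cost envelope for high-concurrency agent serving.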
The accuracy claim is also concrete. H Company says WebVoyager performance rose from 35.1% to 80.5%, suggesting the throughput work did not come with an obvious tradeoff on web-task execution. In the same thread, the company said it is an early-access partner for NVIDIA's Nemotron 3 Omni and expects that MoE foundation to push the next round of "reasoning and low-latency precision" for enterprise-scale autonomous computer-use systems.