H Company launched Holotron-12B, an open multimodal model for computer-use agents built on a hybrid SSM-attention stack that targets KV-cache bottlenecks. Benchmark it if you need high-concurrency browser agents and want better throughput without giving up web-task accuracy.

Holotron-12B is available now as an open model on Hugging Face via the model card, with a deeper product writeup in H Company's technical post. H Company describes it as a "high-throughput, open-source, multimodal model" built specifically for the "age of computer-use agents," a narrower, more deployment-focused positioning than a general-purpose VLM launch.
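For teams that want to try it immediately, the usual transformers loading path should apply. Here is a minimal sketch, assuming the model id is Hcompany/Holotron-12B and a standard processor/model pair; verify both against the model card before relying on them:

```python
# Minimal loading sketch. The model id "Hcompany/Holotron-12B" and the
# AutoModelForCausalLM class are assumptions based on naming conventions;
# check the Hugging Face model card for the exact id, class, and chat template.
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "Hcompany/Holotron-12B"  # assumed id
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 keeps the 12B weights around 24 GB
    device_map="auto",
    trust_remote_code=True,  # hybrid SSM-attention stacks often ship custom modeling code
)
```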
The architectural hook is the backbone: Holotron-12B is post-trained from NVIDIA's open Nemotron-Nano-12B-v2-VL and uses a hybrid SSM-attention design to reduce the KV-cache bottleneck. H Company says that gives it the "linear scaling and high-concurrency performance" needed for online reinforcement learning and production agent workloads across browser and mobile environments.
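A quick way to see why that design matters for serving: only attention layers accumulate per-token K/V state, while SSM layers carry a fixed-size recurrent state, so the cache shrinks roughly in proportion to the attention-layer count. A back-of-envelope sketch with purely illustrative layer counts and dimensions, not published Holotron-12B internals:

```python
# Back-of-envelope KV-cache sizing for a hybrid SSM-attention stack.
# Every layer count and dimension here is an illustrative assumption,
# not a published Holotron-12B spec.

def kv_cache_bytes(n_attn_layers: int, n_kv_heads: int, head_dim: int,
                   seq_len: int, batch: int, bytes_per_elem: int = 2) -> int:
    # Only attention layers cache a K and a V tensor per token.
    # SSM layers hold a fixed-size recurrent state instead, so their
    # memory cost does not grow with sequence length.
    return 2 * n_attn_layers * n_kv_heads * head_dim * seq_len * batch * bytes_per_elem

seq, batch = 32_768, 16
full = kv_cache_bytes(n_attn_layers=40, n_kv_heads=8, head_dim=128,
                      seq_len=seq, batch=batch)
hybrid = kv_cache_bytes(n_attn_layers=6, n_kv_heads=8, head_dim=128,
                        seq_len=seq, batch=batch)
print(f"all-attention stack: {full / 2**30:.0f} GiB of KV cache")   # 80 GiB
print(f"hybrid stack:        {hybrid / 2**30:.0f} GiB of KV cache") # 12 GiB
```

With the same sequence length and batch, the hybrid stack's cache is smaller by roughly the ratio of attention layers kept, which is where the larger effective batch sizes come from.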
H Company's headline numbers are unusually deployment-oriented: "8.9k tokens/s on a single H100," "over 2x faster than Holo2-8B," and a much smaller memory footprint that allows larger effective batch sizes on the same hardware. For teams serving browser or UI agents, that matters more than a generic model-quality claim because concurrency and memory pressure usually dominate cost envelopes.
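To convert the throughput figure into a serving estimate, here is a rough sketch; the tokens-per-action and latency-budget values are assumptions about a typical browser-agent step, not H Company figures:

```python
# Rough per-GPU concurrency estimate from the published throughput figure.
# TOKENS_PER_STEP and the latency budget are assumptions about a typical
# browser-agent action, not H Company numbers.

TOKENS_PER_SEC = 8_900       # published single-H100 aggregate throughput
TOKENS_PER_STEP = 300        # assumed decode tokens per agent action
STEP_LATENCY_BUDGET_S = 2.0  # assumed acceptable seconds per action

actions_per_sec = TOKENS_PER_SEC / TOKENS_PER_STEP             # ~29.7
concurrent_sessions = actions_per_sec * STEP_LATENCY_BUDGET_S  # ~59
print(f"~{concurrent_sessions:.0f} concurrent agent sessions per H100")
```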
The accuracy claim is also concrete. H Company says WebVoyager performance rose from 35.1% to 80.5%, suggesting the throughput work did not come with an obvious tradeoff on web-task execution. In the same thread, the company said it is an early-access partner for NVIDIA's Nemotron 3 Omni and expects that MoE foundation to push the next round of "reasoning and low-latency precision" for enterprise-scale autonomous computer-use systems (enterprise deployment thread).
🚀 Live from @NVIDIAGTC, we're releasing Holotron-12B! Developed with @nvidia, it's a high-throughput, open-source, multimodal model engineered specifically for the age of computer-use agents. Get started today! 🤗 Hugging Face: huggingface.co/Hcompany/Holot… 📖 Technical Deep Dive:
📈 Performance at Scale:
- Throughput: Over 2x faster than Holo2-8B, reaching 8.9k tokens/s on a single H100.
- Accuracy: WebVoyager performance surged from 35.1% to 80.5%.
- Efficiency: A dramatically reduced memory footprint allows for much larger effective batch sizes on the same hardware.
🤝 As a member of the NVIDIA Inception Program, H Company is honored to be among the early access partners for the new Nemotron 3 Omni. By leveraging its new MoE (Mixture of Experts) foundations, we will deliver the next leap in reasoning and low-latency precision for enterprise-scale autonomous computer-use systems.