Release · May 25, 2026

MiniCPM5-1B launches with 17.9 AA and ~0.5GB INT4 weights

OpenBMB released MiniCPM5-1B and says the model leads Artificial Analysis' small-model index at 17.9 while fitting into roughly 0.5GB in INT4. The release matters because it targets phones, browsers, and local runtimes with a sub-2B open model.

TL;DR

OpenBMB's launch post puts MiniCPM5-1B at 17.9 on Artificial Analysis' small-model index, ahead of the 16.3 score it cites for Qwen3.5-2B.
In OpenBMB's deployment thread, the team says INT4 quantization cuts the model to about 0.5 GB, small enough to run on phones, tablets, laptops, and in a browser.
According to OpenBMB's deployment thread, MiniCPM5-1B ships with runtime support across vLLM, SGLang, llama.cpp, Ollama, Hugging Face, and ArcLight, plus fine-tuning support in LLaMA-Factory and ms_swift.
OpenBMB's ForgeTrain post adds a second claim behind the release: the base model was trained with an AI-generated pretraining framework called ForgeTrain, which OpenBMB says ran 10% faster than Nvidia Megatron.

OpenBMB's launch post links straight to Hugging Face, GitHub, and ModelScope. OpenBMB's deployment thread is where the practical details show up, including browser inference and local runtime support. OpenBMB's ForgeTrain post quietly turns the story into more than a small-model launch, because OpenBMB is also pitching an automated training stack.

Artificial Analysis lead

OpenBMB's main release claim is simple: MiniCPM5-1B is the top open base model under 2B parameters on the Artificial Analysis small-model index.

The post says the model scored 17.9, above the 16.3 result it attributes to Qwen3.5-2B, and ahead of both Qwen3.5-0.8B and LFM2.5-1.2B-Thinking on knowledge, math, coding, and tool use. That makes the release less about raw parameter count and more about how much benchmark headroom OpenBMB thinks it squeezed into a 1B model.

INT4 footprint and local runtimes

The deployment pitch is the part engineers will actually bookmark. In its deployment thread, OpenBMB says the INT4 build is about 0.5 GB and runs natively on phones, tablets, laptops, and even inside a web browser.
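
The 0.5 GB figure is consistent with simple back-of-the-envelope arithmetic: roughly one billion weights at 4 bits each. The sketch below is only an illustration, not OpenBMB's own accounting. It assumes "1B" means exactly 1e9 parameters, and the helper name `estimated_weight_bytes` is hypothetical. It also ignores quantization metadata such as scales and any layers kept at higher precision, so real files will differ somewhat.

```python
def estimated_weight_bytes(num_params: int, bits_per_weight: float) -> float:
    """Raw weight storage in bytes, ignoring quantization metadata."""
    return num_params * bits_per_weight / 8


# Assumption: "1B" is taken as exactly 1e9 parameters for illustration.
ASSUMED_PARAMS = 1_000_000_000

for label, bits in [("FP16", 16), ("INT8", 8), ("INT4", 4)]:
    gigabytes = estimated_weight_bytes(ASSUMED_PARAMS, bits) / 1e9
    print(f"{label}: ~{gigabytes:.2f} GB")
# FP16: ~2.00 GB
# INT8: ~1.00 GB
# INT4: ~0.50 GB
```

Under these assumptions, INT4 storage is a quarter of the FP16 size, which is the gap that makes phone and browser targets plausible.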

OpenBMB also lists the surrounding stack as bullets rather than hand-waving:

Inference: SGLang, vLLM, llama.cpp, Ollama, Hugging Face, ArcLight.
Fine-tuning: LLaMA-Factory, ms_swift.
Packaging claim: zero-configuration browser execution.

Hugging Face's repost amplified another useful release detail: OpenBMB says the project is fully open source, including weights, training data, and deployment code.

ForgeTrain

The most unusual detail lives in a separate post. OpenBMB's ForgeTrain post says MiniCPM5-1B was pretrained with ForgeTrain, which it calls a fully AI-generated production pretraining framework with no human in the loop.

OpenBMB also claims ForgeTrain ran 10% faster than Nvidia Megatron. That turns the launch into two separate bets at once: a tiny open model for edge devices, and a training pipeline that OpenBMB says can automate the framework work behind it.
