ERNIE 5.1 Preview ranks No. 4 on Search Arena and claims 6% pretraining cost
Baidu pushed ERNIE 5.1 Preview with new leaderboard claims, including No. 4 on Search Arena and No. 13 on LMArena Text. Treat the 6% pretraining cost claim cautiously until an independent technical report confirms it.

TL;DR
- Baidu's ERNIE 5.1 Preview showed up at No. 4 on Search Arena with a 1,223 score, according to arena's leaderboard post, which makes it the only Chinese model in that top-10 snapshot.
- Baidu's benchmark card, shared by testingcatalog's roundup, puts ERNIE 5.1 ahead on GPQA at 91.0 and at 99.6 on AIME26 with tools, while trailing Claude Opus 4.6 on several other rows in the same chart.
- The eye-catching efficiency claim is that ERNIE 5.1 used about 6% of the pre-training cost of comparable models, a figure that kimmonismus's summary and PaddlePaddle's congratulatory post both repeat from Baidu's launch materials.
- That 6% number is still mostly a vendor claim. teortaxesTex's critique argues the savings may come from shrinking a larger ERNIE 5.0 base model rather than from a clean step-change in training efficiency.
You can check the live Search Arena leaderboard, inspect Baidu's own benchmark grid through testingcatalog's post, and read the most interesting part of the pitch in Baidu's launch slide, which says ERNIE 5.1 was extracted from an ERNIE 5.0 sub-model matrix with elastic depth, expert capacity, and routing sparsity.
Search Arena
The cleanest external datapoint here is arena's own leaderboard post, which put ERNIE-5.1 at No. 4 on Search Arena with a 1,223 score, behind Claude Opus 4.6 Search, GPT-5.5 Search, and Claude Opus 4.7, and ahead of Claude Sonnet 4.6 Search and Gemini-3.1 Pro Grounding.
That matters more than the usual vendor self-chart because it came from the benchmark operator, not from Baidu's promo thread. kimmonismus also noted a separate 1,476 score on LMArena Text, ranking No. 13 globally, but the tweet evidence here does not include a primary source link for that number.
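For context on what those scores imply, arena-style leaderboards are typically built on a Bradley-Terry / Elo rating model, where a rating gap maps directly to an expected head-to-head win rate. The sketch below assumes that standard Elo formula; the 1,223 figure is from the leaderboard, but the opponent ratings are made-up examples, since the tweets don't give the neighboring scores.

```python
# Illustrative only: standard Elo win-probability formula, applied to
# hypothetical rating gaps around ERNIE 5.1's reported 1,223 score.

def win_prob(r_a: float, r_b: float) -> float:
    """Expected probability that model A beats model B under Elo."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

ernie = 1223.0
# A 25-point gap either way shifts the expected win rate only a few points.
print(f"vs a model 25 pts ahead:  {win_prob(ernie, ernie + 25):.3f}")  # ~0.464
print(f"vs a model 25 pts behind: {win_prob(ernie, ernie - 25):.3f}")  # ~0.536
```

The takeaway is that rankings in this score band are close: small rating gaps correspond to near-coin-flip head-to-head preferences, which is worth keeping in mind when reading "No. 4" as a headline.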
Elastic pre-training
Baidu's main technical claim is not a new capability category; it is a model-construction story. The launch materials quoted in testingcatalog's post describe ERNIE 5.1 as a sub-network extracted from ERNIE 5.0's "multi-dimensional elastic sub-model matrix."
The three moving parts listed in that slide are:
- Elastic depth: varying the number of active Transformer layers during training.
- Elastic width / expert capacity: varying how many experts participate in MoE routing.
- Elastic sparsity: varying top-k routing so fewer or more experts activate.
The same slide says ERNIE 5.1 cuts total parameters to about one-third of ERNIE 5.0 and activated parameters to about half, while claiming pre-training compute cost at only 6% of comparable same-scale models. PaddlePaddle framed that as a balance between capability and training efficiency rather than as a pure frontier-model push.
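Baidu has published no code or technical report for this, so the following is only a toy sketch of what extraction along those three axes could mean in parameter-accounting terms: keep a fraction of the layers, a fraction of the experts per MoE layer, and route to fewer experts per token. All names and numbers below are hypothetical, chosen so the ratios land near the slide's "about one-third total, about half activated" claim.

```python
from dataclasses import dataclass

# Hypothetical illustration of "elastic" sub-model extraction from an MoE.
# The config fields mirror the three axes named in Baidu's slide:
# depth, expert capacity, and routing sparsity. Numbers are invented.

@dataclass
class MoEConfig:
    n_layers: int      # elastic depth: active Transformer layers
    n_experts: int     # elastic width: experts available per MoE layer
    top_k: int         # elastic sparsity: experts routed to per token
    d_expert: float    # parameters per expert, in billions (toy value)

    def total_params(self) -> float:
        # Total parameters: every expert in every layer.
        return self.n_layers * self.n_experts * self.d_expert

    def activated_params(self) -> float:
        # Activated parameters: only the top-k routed experts per layer.
        return self.n_layers * self.top_k * self.d_expert

def extract_submodel(base: MoEConfig, depth_frac: float,
                     expert_frac: float, new_top_k: int) -> MoEConfig:
    """Carve a sub-network out of the base along the three elastic axes."""
    return MoEConfig(
        n_layers=int(base.n_layers * depth_frac),
        n_experts=int(base.n_experts * expert_frac),
        top_k=new_top_k,
        d_expert=base.d_expert,
    )

base = MoEConfig(n_layers=96, n_experts=128, top_k=8, d_expert=0.2)  # ~2.4T total
sub = extract_submodel(base, depth_frac=0.75, expert_frac=0.45, new_top_k=6)

print(f"total:     {sub.total_params() / base.total_params():.2f}x")      # 0.33x
print(f"activated: {sub.activated_params() / base.activated_params():.2f}x")  # 0.56x
```

Note what this sketch deliberately leaves out: it says nothing about how the sub-network retains capability, which is exactly where the "once-for-all elastic training" claim would have to do its work, and exactly what the missing technical report would need to document.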
Benchmark mix
The benchmark card is more mixed than the headline makes it sound. In the image attached to testingcatalog's post, ERNIE 5.1 leads GPQA at 91.0, but its 72.5 on SpreadsheetBench-Verified sits well below Claude's 82.4 on the very same chart.
A quicker read of the rows:
- Strongest rows for ERNIE 5.1: GPQA 91.0, AIME26 with tools 99.6.
- Middle of the pack: DeepSearchQA 77.3 and MMLU-Pro 84.3.
- Behind on the same vendor chart: τ3-bench versus Claude Opus 4.6, SpreadsheetBench-Verified versus Claude and Gemini, plain AIME26 versus Gemini, AdvanceIF versus Gemini.
That is why teortaxesTex's earlier reaction called the report interesting rather than dominant. The chart supports a competitive model, especially for search-adjacent and tool-using tasks, but not an across-the-board lead.
The 6% caveat
The sharpest pushback in the evidence pool is about what the 6% number actually means. teortaxesTex argues the claimed gain may mostly reflect starting from a much larger 2.4T-class ERNIE 5.0 system and then stripping it down, not proving that Baidu trained an equivalently capable model from scratch at 6% of the usual compute.
That distinction matters because Baidu's own slide already says ERNIE 5.1 is derived from ERNIE 5.0 and inherits knowledge from that earlier run through once-for-all elastic training. Until Baidu publishes a fuller technical report, the strongest version of the claim is narrow: ERNIE 5.1 looks like a compressed, selectively extracted descendant of ERNIE 5.0 with a very strong Search Arena debut, not yet a fully documented new recipe for ultra-cheap frontier pre-training.