AI Primer

OpenAI launches Parameter Golf with 16 MB models and 8xH100 training limit

OpenAI opened its first Model Craft challenge, asking participants to train the best language model that fits inside a 16 MB artifact and trains in under 10 minutes on eight H100s. Engineers get a concrete optimization target, an automated GitHub leaderboard, and a public benchmark for training-efficiency tricks.


TL;DR

  • OpenAI opened its first Model Craft challenge, Parameter Golf, asking entrants to train “the best language model” that fits in a 16 MB artifact and finishes training in under 10 minutes on 8×H100s, according to OpenAI’s launch post and a community summary in the challenge thread.
  • The scoring target is compression on the FineWeb validation set, measured in a tokenizer-agnostic bits-per-byte metric, with OpenAI’s repo screenshot framing the contest as a parameter-constrained optimization problem rather than a standard quality benchmark.
  • OpenAI is pairing the challenge with public infrastructure: the GitHub repo hosts baselines and evaluation code, while OpenAI staff say there is a leaderboard, a Runpod starter template, and “$1M of compute” backing the event (sources: challenge thread, compute post, Runpod template).
  • The contest runs from March 18 to April 30, and OpenAI is explicitly treating strong submissions as recruiting signal: the launch post says standout participants may be featured publicly and could be invited to interview.

What exactly is OpenAI asking engineers to build?

Parameter Golf is a constrained training contest, not a general model launch. The core rule in OpenAI’s repo screenshot is unusually tight: the entire artifact must stay within 16 MB, including weights and training code, and the model must train in less than 10 minutes on 8×H100s. OpenAI’s launch post says submissions are evaluated on FineWeb validation compression, measured in bits per byte, which avoids tying the benchmark to any single tokenizer.
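To make the tokenizer-agnostic claim concrete, here is a minimal sketch of how a bits-per-byte score can be computed from per-token losses. It is not OpenAI’s evaluation code: the function name `bits_per_byte` is hypothetical, and it assumes the per-token losses are negative log-likelihoods in nats and that bytes are counted on the UTF-8 encoding of the text. Because the denominator counts bytes rather than tokens, two models with different tokenizers are scored on the same scale.

```python
import math


def bits_per_byte(token_nll_nats, text):
    """Convert summed per-token loss (in nats) into bits per UTF-8 byte.

    Hypothetical helper for illustration; assumes `token_nll_nats` holds one
    negative log-likelihood per token covering all of `text`.
    """
    total_nats = sum(token_nll_nats)
    n_bytes = len(text.encode("utf-8"))
    if n_bytes == 0:
        raise ValueError("text must contain at least one byte")
    # Dividing by ln(2) converts nats to bits.
    return total_nats / (math.log(2) * n_bytes)


if __name__ == "__main__":
    # Eight tokens that each cost exactly one bit, spread over a 4-byte string.
    losses = [math.log(2)] * 8
    print(bits_per_byte(losses, "abcd"))
```

A tokenizer that splits the same text into fewer tokens does not gain an advantage by itself: the total loss over the text is what gets divided by the fixed byte count.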

The challenge design is also explicit about what kinds of techniques might matter. The repo screenshot lists ideas such as “aggressive parameter tying,” “depth recurrence,” “low-rank training,” “low precision,” QAT, bitnets, “novel tokenizers,” and even “test-time training” or “megakernels.” That makes this closer to an engineering sandbox for model-efficiency tricks than a narrow architecture bake-off.
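The arithmetic behind several of these ideas can be sketched in a few lines. The helpers below are illustrative only and do not come from the repo: every function name is hypothetical, the budget is assumed to be 16,000,000 bytes (the source only says “16 MB”), and `code_bytes` is a stand-in for however much of the artifact the training code consumes. The sketch shows how low precision raises the parameter count that fits in the budget, and how low-rank factors shrink the stored parameters of a single weight matrix.

```python
# Assumption: "16 MB" is read as decimal megabytes; the source does not specify.
BUDGET_BYTES = 16_000_000


def max_params(bits_per_param, code_bytes=0, budget_bytes=BUDGET_BYTES):
    """Upper bound on stored parameters at a given precision (hypothetical helper)."""
    usable_bits = (budget_bytes - code_bytes) * 8
    return usable_bits // bits_per_param


def dense_params(d_in, d_out):
    """Parameters in a full d_in x d_out weight matrix (bias ignored)."""
    return d_in * d_out


def low_rank_params(d_in, d_out, rank):
    """Parameters when the matrix is stored as two factors of the given rank."""
    return rank * (d_in + d_out)


if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(bits, "bits per parameter ->", max_params(bits), "parameters")
    print("dense 512x512:", dense_params(512, 512))
    print("rank-32 factors:", low_rank_params(512, 512, 32))
```

Parameter tying and depth recurrence follow the same logic: reusing one block several times adds compute and effective depth without adding stored bytes, which is why they appear on a list aimed at a size limit rather than a speed limit.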

OpenAI is positioning it as part of the same family as NanoGPT Speedrun, but with a different objective. In the repo screenshot, the company describes Parameter Golf as optimizing the lowest loss under a fixed parameter budget, rather than optimizing primarily for elapsed training time or dataset size.

How do teams enter, and what infrastructure is available?

OpenAI’s challenge page points participants to a public repo and leaderboard, giving engineers a concrete implementation path instead of a vague call for ideas. The page summary says the repository includes baseline models, datasets, and evaluation scripts, so entrants can fork a working setup and iterate on architecture, compression, or training-system changes without rebuilding the benchmark harness from scratch.

The operational piece is also part of the launch. OpenAI engineer Will Depue wrote in the compute post that the company is “covering $1M of compute” and that it is “working with runpod” on a dedicated Parameter Golf template. For practitioners, that matters because the rules depend on a specific hardware envelope, and a shared template reduces ambiguity around whether a result actually fits the contest’s 8×H100, sub-10-minute constraints.

The timebox is short. According to the challenge thread, the contest is open from March 18 through April 30, with hiring language attached for top performers, which makes the public leaderboard both a benchmark and a visible screening surface.
