OpenAI opened its first Model Craft challenge, asking participants to train the best language model that fits inside a 16 MB artifact and trains in under 10 minutes on eight H100s. Engineers get a concrete optimization target, an automated GitHub leaderboard, and a public benchmark for training-efficiency tricks.

Parameter Golf is a constrained training contest, not a general model launch. The core rule in OpenAI’s repo screenshot is unusually tight: the entire artifact must stay within 16 MB, including weights and training code, and the model must train in less than 10 minutes on 8×H100s. OpenAI’s launch post says submissions are evaluated on FineWeb validation compression, measured in bits per byte, which avoids tying the benchmark to any single tokenizer.
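To make the metric concrete, here is a minimal sketch of how bits per byte can be derived from a model's summed cross-entropy loss. The function name and the example numbers are illustrative assumptions, not code from OpenAI's repo; the only fact relied on is the standard conversion from nats to bits (divide by ln 2) and normalization by the UTF-8 byte count of the evaluated text.

```python
import math


def bits_per_byte(total_nll_nats: float, total_bytes: int) -> float:
    """Convert a summed negative log-likelihood (in nats) into bits per byte.

    total_nll_nats: sum of per-token cross-entropy losses over the text.
    total_bytes:    number of UTF-8 bytes in that same text.
    """
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    # nats -> bits by dividing by ln(2), then normalize per byte of raw text.
    return total_nll_nats / (math.log(2) * total_bytes)


# Hypothetical numbers: two tokenizers split the same 4000-byte text differently.
# Tokenizer A: 1000 tokens at a mean loss of 2.0 nats per token.
# Tokenizer B:  500 tokens at a mean loss of 4.0 nats per token.
bpb_a = bits_per_byte(1000 * 2.0, 4000)
bpb_b = bits_per_byte(500 * 4.0, 4000)
print(bpb_a, bpb_b)  # both about 0.721
```

Because the denominator counts bytes rather than tokens, a model cannot improve its score simply by choosing a tokenizer that produces fewer tokens; per-token loss would differ between the two hypothetical tokenizers above, while bits per byte does not.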
The challenge design is also explicit about what kinds of techniques might matter. The repo screenshot lists ideas such as “aggressive parameter tying,” “depth recurrence,” “low-rank training,” “low precision,” QAT, bitnets, “novel tokenizers,” and even “test-time training” or “megakernels.” That makes this closer to an engineering sandbox for model-efficiency tricks than a narrow architecture bake-off.
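A rough back-of-the-envelope sketch shows why several of these ideas are attractive under a byte budget. Everything here is an assumption for illustration: the budget is taken as 16,000,000 bytes, the space used by training code is ignored, and the helper names are hypothetical rather than anything from the contest repo.

```python
BUDGET_BYTES = 16_000_000  # assumption: decimal megabytes, code size ignored


def max_params(bits_per_param: float, budget_bytes: int = BUDGET_BYTES) -> int:
    """Upper bound on parameters that fit in the budget at a given precision."""
    return int(budget_bytes * 8 // bits_per_param)


def dense_params(d_in: int, d_out: int) -> int:
    """Weights in a full d_in x d_out matrix (no bias)."""
    return d_in * d_out


def low_rank_params(d_in: int, d_out: int, rank: int) -> int:
    """Weights in a rank-r factorization: (d_in x r) followed by (r x d_out)."""
    return rank * (d_in + d_out)


def stack_params(params_per_block: int, unique_blocks: int) -> int:
    """Stored weights for a layer stack; looping a block adds depth, not weights."""
    return params_per_block * unique_blocks


# Lower precision stretches the same byte budget over more parameters.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: up to {max_params(bits):,} parameters")

# A low-rank factorization stores fewer weights than the dense matrix it replaces.
print(dense_params(512, 512), low_rank_params(512, 512, 64))

# Depth recurrence: 2 unique blocks applied 6 times each behave like 12 layers
# at inference time but are stored as 2 blocks.
block = 4 * dense_params(512, 512)  # hypothetical block of four 512x512 matrices
print(stack_params(block, 12), stack_params(block, 2))
```

Under these assumptions, halving the bits per weight doubles the parameter ceiling, while low-rank factors and reused blocks reduce how many weights need to be stored at all. Whether the resulting model reaches a lower loss is exactly what the contest measures.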
OpenAI is positioning it as part of the same family as NanoGPT Speedrun, but with a different objective. In the repo screenshot, the company describes Parameter Golf as optimizing for the lowest loss under a fixed parameter budget, rather than primarily for elapsed training time or dataset size.
OpenAI’s challenge page points participants to a public repo and leaderboard, giving engineers a concrete implementation path instead of a vague call for ideas. The page summary says the repository includes baseline models, datasets, and evaluation scripts, so entrants can fork a working setup and iterate on architecture, compression, or training-system changes without rebuilding the benchmark harness from scratch.
The operational piece is also part of the launch. OpenAI engineer Will Depue wrote that the company is “covering $1M of compute” and that it is “working with runpod” on a dedicated Parameter Golf template. For practitioners, that matters because the rules depend on a specific hardware envelope, and a shared template reduces ambiguity around whether a result actually fits the contest’s 8×H100, sub-10-minute constraints.
The timebox is short. The announcement thread says the challenge is open from March 18 through April 30 and includes hiring language for top performers, which makes the public leaderboard both a benchmark and a visible screening surface.