Google DeepMind launches Kaggle benchmark contest with $200k to measure AGI capabilities

Google DeepMind and Kaggle opened a global challenge to build cognitive benchmarks across learning, metacognition, attention, executive function, and social cognition. Join if you work on evals and want reusable tasks with human baselines instead of another saturated leaderboard.

TL;DR

  • Google DeepMind and Kaggle opened a global hackathon with $200,000 in prizes to build new cognitive evaluations for AI, framed as a way to measure progress toward AGI rather than to add another narrow benchmark. (contest thread)
  • The challenge is targeting five harder-to-measure areas — learning, metacognition, attention, executive functions, and social cognition — because existing benchmarks are being "saturated" by current models, according to the launch thread.
  • DeepMind is asking the community to turn its cognitive framework into reusable benchmark tasks on Kaggle, with the official competition page serving as the entry point for submissions. (launch post)
  • A supporting practitioner summary says the framework maps 10 cognitive abilities and that some still lack reliable evaluations, which helps explain why labs struggle to compare systems consistently across "general intelligence" claims. (framework summary)

What launched

DeepMind's launch post says the company is running a global Kaggle competition to "build new cognitive evaluations for AI," with $200,000 in prizes for submitted benchmarks. The official Kaggle competition is positioned as a benchmark-building contest, not a model leaderboard, which makes this more relevant to eval engineers than to model hobbyists.

The concrete scope comes from the organizer thread, which says submissions should measure cognitive capabilities across learning, metacognition, attention, executive functions, and social cognition. The same post argues current AI systems are starting to saturate many existing tests, so the bar now is building tasks that remain discriminative as models improve.
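
To make the saturation concern concrete, here is a minimal sketch of how an eval harness might flag a task that no longer separates models, assuming every task ships with a human baseline the way the framework suggests. The function name, thresholds, and score scale are illustrative, not part of the competition rules.

```python
# Hypothetical saturation check: a task stops discriminating between models
# once most of them score at or above the human baseline. The margin and
# share thresholds here are illustrative, not from the competition rules.
def is_saturated(model_scores: list[float], human_baseline: float,
                 margin: float = 0.02, share: float = 0.8) -> bool:
    """Flag a task as saturated when `share` of models score within
    `margin` of (or above) the human baseline."""
    if not model_scores:
        return False
    near_ceiling = sum(s >= human_baseline - margin for s in model_scores)
    return near_ceiling / len(model_scores) >= share

# Example: four models on one task with a 0.92 human baseline.
print(is_saturated([0.95, 0.97, 0.91, 0.99], human_baseline=0.92))  # True
print(is_saturated([0.55, 0.62, 0.94, 0.71], human_baseline=0.92))  # False
```

The design choice worth noting is that saturation is judged relative to the human baseline rather than an absolute score, so a task stays useful for as long as models remain meaningfully below human performance.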

Why this matters for eval engineers

The strongest technical signal is that DeepMind is not just asking for harder questions; it is asking for benchmarks grounded in a broader cognitive framework. A practitioner summary of the release says the framework maps 10 cognitive abilities, includes human baselines for each task, and notes that five of those abilities still have no reliable evals, which points to gaps in today's benchmarking stack rather than just gaps in model scores. (framework summary)

That matters because many labs still publish on different eval suites, making cross-model comparisons noisy. DeepMind's announcement explicitly pitches this as a community effort to "put our framework to the test," suggesting the output they want is portable task design that other researchers and model providers can reuse, not a one-off benchmark stunt.
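
As a rough illustration of what portable task design could look like, here is a hedged sketch of a self-describing task record that bundles the stimulus, the targeted ability, a human baseline, and a scorer. All field names and the example item are hypothetical and are not drawn from DeepMind's framework or Kaggle's submission format.

```python
# Hypothetical sketch of a portable benchmark task record; the fields are
# illustrative, not DeepMind's framework or Kaggle's submission schema.
from dataclasses import dataclass
from typing import Callable

@dataclass
class CognitiveTask:
    task_id: str
    ability: str                     # e.g. "metacognition", "social_cognition"
    prompt: str                      # stimulus shown to the model
    human_baseline: float            # mean human score on this item, 0.0-1.0
    scorer: Callable[[str], float]   # maps a raw model response to a score

def exact_match(expected: str) -> Callable[[str], float]:
    """Simplest possible scorer: 1.0 on an exact (stripped) match, else 0.0."""
    return lambda response: 1.0 if response.strip() == expected else 0.0

# A toy attention-style item: the human baseline travels with the task,
# so any model score can be read relative to human performance.
task = CognitiveTask(
    task_id="attention-001",
    ability="attention",
    prompt="How many times does the letter 'f' appear in 'fine fresh fish'?",
    human_baseline=0.92,
    scorer=exact_match("3"),
)
print(task.scorer("3"))  # 1.0
```

Packaging the baseline and scorer with the task is what would make it reusable: anyone could run the same item against a new model and read the result relative to human performance without re-deriving the scoring logic.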
