AI Primer

Weights & Biases launches MCP server with 20 tools for schema-first queries

Weights & Biases released an MCP server that exposes experiment data to Claude Code, Cursor, Codex, Gemini CLI, and Le Chat. The schema-first design lets agents inspect available metrics before pulling rows, which addresses a failure seen in an early preview, where a single call could dump a 300-metric run into an agent's context window.

TL;DR

  • wandb's launch post says the new W&B MCP server is live with 20 tools, and that it plugs into Claude Code, Cursor, Codex, Gemini CLI, and Le Chat.
  • In wandb's schema-first thread, W&B says an early preview could dump a 300-metric run into an agent's context window in one call, which pushed the team to redesign the primitives.
  • That same thread says the server now lets an agent inspect what data exists before it pulls rows, which is the core schema-first behavior in this launch.
  • wandb's setup post includes a direct claude mcp add --transport http command, while the main announcement says the server is hosted on every W&B deployment.

You can open the full W&B write-up, grab an API key from W&B authorize, and point a client at the MCP endpoint. The interesting bit is not just that W&B shipped another MCP server, but that its preview failure case was an agent blowing its context window on experiment metrics.

20 tools, hosted across W&B deployments

wandb's announcement frames the server as an agent-native interface into experiments, training monitoring, and research loops. The same post says all 20 tools are hosted on every W&B deployment, which makes this a hosted integration layer rather than a local tool bundle.

The official write-up, linked in wandb's follow-up, is where W&B expands the product framing and use cases in more detail.

Schema-first queries

The most concrete product detail sits in wandb's second post: in an early preview of the server, a run with 300 metrics could exhaust an agent's context window in a single call. W&B says that failure pushed it to rebuild the primitives around schemas first.

W&B's description breaks the interaction into three steps:

  • ask what objects and fields exist
  • inspect available metrics before requesting data
  • pull only the rows the agent actually needs

That is a more interesting design choice than the raw tool count, because it treats experiment metadata as something an agent has to browse before it retrieves payloads.
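The three steps above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not W&B's implementation or API: the Run class and the list_runs, describe_metrics, and fetch_rows helpers are hypothetical names, and the 1,000-step history is invented so the size difference is visible.

```python
from dataclasses import dataclass, field


@dataclass
class Run:
    """Hypothetical stand-in for one logged experiment run (not a W&B type)."""
    name: str
    # Maps metric name -> list of per-step values.
    history: dict = field(default_factory=dict)


def list_runs(runs):
    """Step 1: report which objects exist, without returning metric data."""
    return [run.name for run in runs]


def describe_metrics(run):
    """Step 2: return metric names and row counts only (the schema)."""
    return {metric: len(values) for metric, values in run.history.items()}


def fetch_rows(run, metrics, last_n):
    """Step 3: return only the requested metrics, limited to the last `last_n` steps."""
    return {metric: run.history[metric][-last_n:] for metric in metrics}


# Toy data (assumption): one run with 300 metrics and 1,000 steps each.
run = Run(
    "baseline",
    {f"metric_{i}": [float(step) for step in range(1000)] for i in range(300)},
)

schema = describe_metrics(run)
rows = fetch_rows(run, ["metric_0", "metric_7"], last_n=5)

# Compare the size of a full dump with the size of the targeted request.
full_dump_values = sum(schema.values())
selected_values = sum(len(values) for values in rows.values())
print(len(schema), full_dump_values, selected_values)  # prints: 300 300000 10
```

In this sketch the schema call returns 300 short entries, and the follow-up request returns 10 values instead of the 300,000 a full dump would contain. That gap is the kind of saving a schema-first design is aiming for.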

Setup command and client support

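wandb's setup instructions publish a ready-to-run Claude Code command that begins with claude mcp add --transport http; the full command, including the endpoint URL, is in that post.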

The same post links to W&B authorize for API keys and the MCP endpoint for transport setup. According to the main launch post, the supported client list at launch includes Claude Code, Cursor, Codex, Gemini CLI, and Le Chat.
