Hugging Face launches Storage Buckets for mutable checkpoints, logs, and agent traces
Hugging Face introduced Storage Buckets, a mutable S3-like repo type for checkpoints, processed data, logs, and traces that do not fit Git workflows. Use it to move overwrite-heavy or high-volume artifacts out of versioned repos without leaving the Hub.

TL;DR
- Hugging Face launched Storage Buckets, a new Hub repo type for “mutable, S3-like object storage” aimed at checkpoints, processed shards, logs, and agent traces that change too often for Git-style versioning, according to the launch post and Hugging Face's thread.
- The practical change is workflow fit: Hugging Face says Git “falls short” for high-throughput artifacts, while Buckets add “fast writes, overwrites, directory sync” for overwrite-heavy ML pipelines on the Hub, per the product thread.
- Under the hood, the feature is backed by Xet, which Hugging Face says deduplicates shared chunks so “successive checkpoints skip the bytes that already exist,” reducing bandwidth and storage overhead for repetitive training artifacts, per the thread and the blog summary.
- Access is not limited to the web UI: the launch materials say Buckets can be browsed on the Hub, scripted from Python, or managed with the hf CLI, with private or public permissions and optional pre-warming to move data closer to compute regions, per the launch post and the linked blog post.
What shipped, exactly?
Storage Buckets are the first new repo type on the Hub in four years, built for artifacts that are large, mutable, and operational rather than publish-once deliverables. In the launch post, Hugging Face contrasts them with Models and Datasets repos, saying those are good for “final artifacts,” while Buckets are for the production stream of “checkpoints, optimizer states, processed shards, logs, traces” that “rarely need version control.”
The interface and access model stay close to the rest of the Hub. The launch blog says Buckets can be browsed in the Hub UI, automated from Python, or managed through the hf CLI, and the same post describes them as mutable, non-versioned storage with standard namespace permissions for user or organization scopes. That makes this less of a separate storage product than a new storage primitive inside existing Hub workflows.
Why does this matter for ML infra teams?
The main engineering value is moving high-churn artifacts out of Git-backed repos without leaving the Hub. In Hugging Face's thread, the company frames the problem bluntly: Git breaks down on the “high-throughput side of AI,” and Buckets are meant to handle “fast writes, overwrites, directory sync” for recurring training and inference-adjacent outputs. The linked blog post adds that this includes intermediate files from many concurrent jobs, which is exactly the pattern that makes versioned repos noisy and expensive.
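To make the “overwrites, directory sync” pattern concrete: a sync client typically hashes local files, compares against a remote manifest, and re-uploads only what changed. The sketch below is not Hugging Face's API or implementation; it is a minimal stdlib illustration of that decision step, and the names `manifest` and `plan_sync` are invented for this example.

```python
import hashlib
import tempfile
from pathlib import Path

def manifest(root: Path) -> dict:
    """Map relative path -> content hash for every file under root."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*") if p.is_file()
    }

def plan_sync(local: dict, remote: dict):
    """Files to (over)write remotely, and files to delete remotely."""
    upload = [p for p, h in local.items() if remote.get(p) != h]
    delete = [p for p in remote if p not in local]
    return upload, delete

# Simulate one training loop iteration: overwrite a checkpoint, append a log.
tmp = Path(tempfile.mkdtemp())
(tmp / "step-100.ckpt").write_bytes(b"weights-v1")
remote = manifest(tmp)                              # first sync: remote mirrors local
(tmp / "step-100.ckpt").write_bytes(b"weights-v2")  # overwrite in place
(tmp / "train.log").write_bytes(b"loss=0.3")        # new file
upload, delete = plan_sync(manifest(tmp), remote)
print(sorted(upload), delete)                       # both files upload, nothing deleted
```

In a Git-backed repo, every such overwrite becomes a new commit and a retained object; mutable bucket semantics let the changed file simply replace the old one.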
The storage backend is also part of the story. Hugging Face says Buckets are powered by Xet dedup, so overlapping files do not re-upload identical chunks; in the product wording, “successive checkpoints skip the bytes that already exist,” per the thread. The blog post also describes pre-warming, which brings data closer to compute regions to improve throughput for distributed training. Separately, a repost from Hugging Face co-founder Thomas Wolf calls this “our fastest growing recent product” and says the company is making “petabyte storage cheap and fast,” which underscores that this launch is tied to scale-out storage demand rather than a cosmetic Hub update.