releaseJune 1, 2026

Microsoft and NVIDIA launch RTX Spark PCs with 128GB unified memory and 1 PFLOP FP4

Microsoft and NVIDIA unveiled RTX Spark systems, including Surface Laptop Ultra and DGX-class Windows hardware, with 128GB unified memory and 1 PFLOP FP4 local AI. Day-one support from Hermes Agent, vLLM, Ollama, and Unsloth makes the launch useful for local inference and fine-tuning, not just a PC refresh.

5 min read

Microsoft and NVIDIA launch RTX Spark PCs with 128GB unified memory and 1 PFLOP FP4

TL;DR

Microsoft and NVIDIA introduced RTX Spark, a Grace Blackwell Windows PC chip with up to 6,144 CUDA cores, up to 128GB of unified memory, and 1 PFLOP FP4 AI throughput, as kimmonismus's launch post and the Windows announcement both emphasize.
The first flagship device is Surface Laptop Ultra, which Microsoft's device post describes as its first Surface laptop with full CUDA support and up to 128GB unified memory, matching the pierceboggan repost of Surface Laptop Ultra teaser.
Microsoft says Windows itself changed for this launch, with a higher GPU-accessible unified-memory ceiling and new agent security primitives, according to the Windows blog and WesRoth's OpenShell summary.
The useful part is the day-one software list: Hermes Agent, OpenClaw, vLLM, Ollama, and Unsloth all showed support or collaboration posts, via NousResearch, vllm_project, ollama, UnslothAI, and danielhanchen.
NVIDIA also stretched the same pitch upmarket with DGX Station for Windows, which nummanali's link post surfaced as a deskside GB300 system for up to 1 trillion-parameter models.

You can read NVIDIA's main launch post, Microsoft's Windows platform writeup, and the separate Surface Laptop Ultra announcement. The more revealing side documents are NVIDIA's local agents ecosystem post and vLLM's DGX Spark deployment guide, which gets specific about NVFP4 models, serving flags, and telemetry. Meanwhile the main HN thread went straight to the caveats: Windows on Arm compatibility and whether memory bandwidth, not capacity, becomes the real limiter.

RTX Spark specs

The chip is NVIDIA's first full Windows PC SoC, not another mobile GPU. NVIDIA's product page and Computex roundup both frame it as a single-package Grace CPU plus Blackwell RTX GPU for slim laptops and compact desktops.

The core numbers are simple:

Up to 20 Arm CPU cores
Up to 6,144 Blackwell CUDA cores
Up to 128GB unified memory
Up to 1 PFLOP FP4 AI throughput
Native CUDA stack on Windows

Microsoft says RTX Spark systems start shipping this fall from Surface, ASUS, Dell, HP, Lenovo, and MSI in laptops and small desktops, per the Windows launch post. That makes this less like a one-off devkit and more like a new Windows hardware tier.

Windows memory and agents

What changed on the OS side is more interesting than the slogan. In the Windows announcement, Microsoft says it raised the total system memory Windows can expose to the GPU on high-memory unified-memory machines, specifically to let larger local models fit.

The same post says Build will introduce OS-enforced identity, containment, and manageability for agents, while NVIDIA brings OpenShell to Windows on top of those primitives. NVIDIA's own agents post adds that OpenShell is the easy-to-deploy runtime package, and that Hermes Agent plus OpenClaw are integrating with it.

That is the real launch thesis: RTX Spark is being sold as agent hardware, but Microsoft is also trying to make Windows the place where those agents can run with tighter sandboxing than the usual local-script free-for-all.

Day-one agent stack

The ecosystem posts landed fast enough to make this feel like a usable stack, not a slide deck.

NousResearch said Hermes Agent is being optimized for RTX Spark.
Teknium's repost pointed to Hermes running natively on Windows.
NousResearch on skills integration said NVIDIA's official agent skills catalog is now wired into Hermes Skills Hub.
Teknium's CLI example showed the corresponding hermes skills browse and hermes skills search commands.
vllm_project linked a hands-on DGX Spark post; the full vLLM writeup says unified memory makes it practical to load larger NVFP4 models locally and calls out paged KV cache, OpenAI-compatible serving, and Prometheus telemetry as the useful pieces.
ollama posted launch-day support with NVIDIA.
UnslothAI claimed 120B-plus parameter local training on the 128GB laptop configuration, and danielhanchen added that Unsloth Studio, quantization, finetuning, and RL tooling are being brought over.

A separate practitioner post from btibor91's ThinkStation PGX setup is a good preview of what people will actually do with this class of machine: local vLLM endpoints, Open WebUI, coding agents, ComfyUI, TTS, and Telegram-connected personal agents on a 128GB unified-memory box.

DGX Station for Windows

NVIDIA also announced a much bigger sibling that shifts the same local-agent story from personal hardware to enterprise desks. The DGX Station for Windows announcement says the system uses the GB300 Grace Blackwell Ultra Desktop Superchip, pairs a Blackwell Ultra GPU with a 72-core Grace CPU over NVLink-C2C, delivers up to 20 FP4 petaflops, and exposes up to 748GB of coherent memory.

NVIDIA says that is enough for frontier models up to 1 trillion parameters on a Windows deskside machine, shipping in Q4. That turns the launch into a two-level product line: RTX Spark for 128GB local laptops and mini desktops, DGX Station for Windows for the much fatter enterprise version of the same bet.

TL;DR

RTX Spark specs

Windows memory and agents

Day-one agent stack

DGX Station for Windows

Discussion across the web