Skip to content
AI Primer
release

Microsoft and NVIDIA launch RTX Spark PCs with 128GB unified memory and 1 PFLOP FP4

Microsoft and NVIDIA unveiled RTX Spark systems, including Surface Laptop Ultra and DGX-class Windows hardware, with 128GB unified memory and 1 PFLOP FP4 local AI. Day-one support from Hermes Agent, vLLM, Ollama, and Unsloth makes the launch useful for local inference and fine-tuning, not just a PC refresh.

5 min read
Microsoft and NVIDIA launch RTX Spark PCs with 128GB unified memory and 1 PFLOP FP4
Microsoft and NVIDIA launch RTX Spark PCs with 128GB unified memory and 1 PFLOP FP4

TL;DR

You can read NVIDIA's main launch post, Microsoft's Windows platform writeup, and the separate Surface Laptop Ultra announcement. The more revealing side documents are NVIDIA's local agents ecosystem post and vLLM's DGX Spark deployment guide, which gets specific about NVFP4 models, serving flags, and telemetry. Meanwhile the main HN thread went straight to the caveats: Windows on Arm compatibility and whether memory bandwidth, not capacity, becomes the real limiter.

RTX Spark specs

The chip is NVIDIA's first full Windows PC SoC, not another mobile GPU. NVIDIA's product page and Computex roundup both frame it as a single-package Grace CPU plus Blackwell RTX GPU for slim laptops and compact desktops.

The core numbers are simple:

  • Up to 20 Arm CPU cores
  • Up to 6,144 Blackwell CUDA cores
  • Up to 128GB unified memory
  • Up to 1 PFLOP FP4 AI throughput
  • Native CUDA stack on Windows

Microsoft says RTX Spark systems start shipping this fall from Surface, ASUS, Dell, HP, Lenovo, and MSI in laptops and small desktops, per the Windows launch post. That makes this less like a one-off devkit and more like a new Windows hardware tier.

Windows memory and agents

What changed on the OS side is more interesting than the slogan. In the Windows announcement, Microsoft says it raised the total system memory Windows can expose to the GPU on high-memory unified-memory machines, specifically to let larger local models fit.

The same post says Build will introduce OS-enforced identity, containment, and manageability for agents, while NVIDIA brings OpenShell to Windows on top of those primitives. NVIDIA's own agents post adds that OpenShell is the easy-to-deploy runtime package, and that Hermes Agent plus OpenClaw are integrating with it.

That is the real launch thesis: RTX Spark is being sold as agent hardware, but Microsoft is also trying to make Windows the place where those agents can run with tighter sandboxing than the usual local-script free-for-all.

Day-one agent stack

The ecosystem posts landed fast enough to make this feel like a usable stack, not a slide deck.

  • NousResearch said Hermes Agent is being optimized for RTX Spark.
  • Teknium's repost pointed to Hermes running natively on Windows.
  • NousResearch on skills integration said NVIDIA's official agent skills catalog is now wired into Hermes Skills Hub.
  • Teknium's CLI example showed the corresponding hermes skills browse and hermes skills search commands.
  • vllm_project linked a hands-on DGX Spark post; the full vLLM writeup says unified memory makes it practical to load larger NVFP4 models locally and calls out paged KV cache, OpenAI-compatible serving, and Prometheus telemetry as the useful pieces.
  • ollama posted launch-day support with NVIDIA.
  • UnslothAI claimed 120B-plus parameter local training on the 128GB laptop configuration, and danielhanchen added that Unsloth Studio, quantization, finetuning, and RL tooling are being brought over.

A separate practitioner post from btibor91's ThinkStation PGX setup is a good preview of what people will actually do with this class of machine: local vLLM endpoints, Open WebUI, coding agents, ComfyUI, TTS, and Telegram-connected personal agents on a 128GB unified-memory box.

DGX Station for Windows

NVIDIA also announced a much bigger sibling that shifts the same local-agent story from personal hardware to enterprise desks. The DGX Station for Windows announcement says the system uses the GB300 Grace Blackwell Ultra Desktop Superchip, pairs a Blackwell Ultra GPU with a 72-core Grace CPU over NVLink-C2C, delivers up to 20 FP4 petaflops, and exposes up to 748GB of coherent memory.

NVIDIA says that is enough for frontier models up to 1 trillion parameters on a Windows deskside machine, shipping in Q4. That turns the launch into a two-level product line: RTX Spark for 128GB local laptops and mini desktops, DGX Station for Windows for the much fatter enterprise version of the same bet.

Further reading

Discussion across the web

Where this story is being discussed, in original context.

On X· 2 threads
TL;DR4 posts
Day-one agent stack7 posts
Share on X