deepsec launches CLI-first security harness with sandbox fanout for large repos
Vercel released deepsec, a CLI-first coding-security harness that runs agent reviews locally or fans out across sandbox workers for large repos. Early comparisons against Warden suggest a cheaper but less exhaustive scan profile, so teams should weigh coverage against cost.

TL;DR
- Vercel's launch post and vercel_dev's launch tweet both frame deepsec as an open source, CLI-first security harness for codebase review by coding agents.
- According to cramforce's thread, deepsec runs on a laptop with an existing Claude or Codex subscription, while the GitHub repo adds that larger jobs can fan out across worker machines.
- Vercel's architecture writeup says scans move through regex candidate selection, agent investigation, revalidation, enrichment, and export, with a reported 10 to 20 percent false-positive rate after revalidation.
- The repo README exposes a fuller command surface than the launch tweet, including `triage`, `metrics`, `report`, and `sandbox`, while rauchg's post says internal runs already scale to 1,000-plus concurrent sandboxes.
- Early hands-on notes from zeeg's first comparison and zeeg's follow-up suggest deepsec is cheaper on token inference than Warden, but less exhaustive on domain-specific issues.
You can browse the repo, skim the architecture doc, and check the models doc. The interesting buried detail is that deepsec is not a single scan command so much as a resumable pipeline with on-disk state, optional revalidation, and a separate cheap triage pass. The other useful reveal is in the security model: when the job runs in Vercel Sandbox, the repo is tarballed without `.git`, API keys are injected outside the sandbox, and worker egress is limited to coding-agent hosts, per the README.
CLI workflow
The happy path starts with `npx deepsec init` at the repo root, which creates a `.deepsec` directory, installs the package, and asks an agent to fill in a short `INFO.md` file with project-specific context from the codebase, according to the README.
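Concretely, the bootstrap is small. A minimal sketch of the sequence the README describes, assuming `INFO.md` lands inside `.deepsec` (implied by the description but not stated outright) and using a hypothetical repo name:

```sh
# One-time setup at the repo root: creates the .deepsec directory,
# installs the package, and has an agent draft project-specific
# context from the codebase into INFO.md.
cd my-app            # hypothetical repo path
npx deepsec init

# Worth a read before scanning: the agent-written project context.
cat .deepsec/INFO.md
```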
From there, the workflow breaks into discrete stages instead of one long opaque run, sketched end to end after this list:
- `scan`: regex-only pass to find security-sensitive files, per Vercel's blog post
- `process`: full AI investigation that emits findings and recommendations, per the README
- `revalidate`: second-pass agent check to cut false positives, per Vercel's blog post
- `enrich`: attach git committer info and optional ownership metadata, per the architecture doc
- `export` and `report`: render markdown and JSON outputs for tickets or review, per the README
- `metrics`: aggregate severities, vuln types, and true-positive counts across projects, per the README
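Strung together, a full local run reads like a pipeline. A sketch assuming each stage is invoked bare, since the public materials don't document per-command flags:

```sh
# Each stage reads and extends the on-disk state under .deepsec,
# so the sequence can be interrupted and picked back up.
npx deepsec scan        # regex-only pass: collect candidate files
npx deepsec process     # full agent investigation of the candidates
npx deepsec revalidate  # second agent pass to cut false positives
npx deepsec enrich      # attach git committer / ownership metadata
npx deepsec export      # export and report together render the
npx deepsec report      # markdown and JSON outputs for tickets/review
```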
That staged design matters because each command is idempotent and writes to a consistent on-disk data model, so interrupted jobs resume by merging new state instead of starting over, according to the architecture doc.
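In practice that means recovery is just re-running the stage that died. A sketch of the resume behavior the architecture doc describes, not captured output:

```sh
npx deepsec process   # long agent run; suppose it is killed partway
npx deepsec process   # same command again: completed work is merged
                      # from .deepsec state instead of being redone
```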
Sandbox fanout
Single-machine scans can take days on large repos, according to Vercel's launch post. Deepsec's answer is `sandbox process`, which ships the working tree to Vercel Sandbox microVMs and spreads batches across remote workers.
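The remote entry point reuses the stage name. A sketch assuming the subcommand composes as `sandbox process`, the way the prose writes it; no concurrency flags are shown because none are documented publicly:

```sh
# Remote variant of the investigation stage: per the README, the
# working tree is tarballed (minus .git) and uploaded to Vercel
# Sandbox microVMs, batches are spread across workers, API keys are
# injected outside the sandbox, and post-bootstrap egress is limited
# to coding-agent hosts.
npx deepsec sandbox process
```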
The public materials line up on the scale claim. cramforce's thread says internal runs have used 1,000-plus cores, while rauchg's post says Vercel Sandbox can harness thousands of agents in parallel.
The repo also spells out the trust boundary more clearly than the tweets:
- the local working tree is tarballed and uploaded, with `.git` excluded, per the README
- coding-agent API keys are injected outside the sandbox, so workers cannot exfiltrate them, per the README
- worker-network egress is limited to coding-agent hosts after bootstrap, per the README
That makes deepsec read less like a scanner binary and more like an orchestrator around privileged agent runs.
Models and scan profile
Under the hood, deepsec uses two agent backends: the Claude Agent SDK with `claude-opus-4-7` for `process` and `revalidate`, and Codex with `gpt-5.5` for the same steps, according to the models doc. The launch post adds the effort settings: Opus 4.7 at max effort, GPT 5.5 at xhigh reasoning.
The repo defaults to routing through Vercel AI Gateway, so one key can cover both Claude and Codex, but the config can also point straight at Anthropic or OpenAI endpoints, per the models doc. For local use, both the blog post and cramforce's thread say existing subscriptions are enough.
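How those keys are wired up isn't shown in the announcement, so the following is illustrative only: the providers' standard environment variables, under the unconfirmed assumption that deepsec's config resolves credentials the way the underlying SDKs do:

```sh
# Default path: one Vercel AI Gateway key covers both backends.
export AI_GATEWAY_API_KEY=...   # assumed to be read via the gateway

# Direct path: the two agent SDKs' own standard variables. Whether
# deepsec picks these up without extra config is an assumption.
export ANTHROPIC_API_KEY=...    # Claude Agent SDK
export OPENAI_API_KEY=...       # Codex
```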
Vercel is also explicit that this is an expensive, application-oriented scan profile. The README says large scans can cost thousands or even tens of thousands of dollars, the launch post says deepsec works best on apps and services rather than libraries, and the same post reports a 10 to 20 percent false-positive rate even after the extra revalidate pass.
One extra implementation detail sits in the docs rather than the announcement: deepsec ships a refusal classifier so the pipeline can detect when a model declines a security task, and Vercel says refusals were a non-issue with Opus 4.7 and GPT 5.5 for its prompt setup, per the launch post.
Warden comparisons
The first public comparison came in zeeg's first comparison, where zeeg extracted the prompt, ran it through Warden, and found that the prompt alone still surfaced a couple of findings their synthesized Warden skill had missed. The same post also says the prompt missed some domain-specific concerns.
The follow-up sharpened the tradeoff. In zeeg's follow-up, zeeg wrote that deepsec would likely cost less on token inference than Warden, but be less exhaustive. In zeeg's later note, the early verdict was that Warden produced fewer findings overall, but its high-severity backend hits were more concentrated and actionable.
That lines up with how Vercel positioned the tool. Deepsec starts with broad regex candidate generation, then spends expensive reasoning budget on investigation and revalidation, which is a different shape from a narrower harness tuned for depth in a specific domain, according to Vercel's workflow description.