Skip to content
AI Primer
breaking

Research reports OpenClaw prompt-injection flaws and weak defaults

Security coverage around OpenClaw intensified with a report on indirect prompt injection and data exfiltration risks, while KiloClaw published an independent assessment of its hosted isolation layers. Review your default configs and sandbox boundaries before exposing agents to untrusted web or tenant data.

3 min read
Research reports OpenClaw prompt-injection flaws and weak defaults
Research reports OpenClaw prompt-injection flaws and weak defaults

TL;DR

  • New security coverage around OpenClaw centers on a reported chain of weak defaults and prompt-injection risk: KiloCode cites research saying OpenClaw has "inherently weak default security configurations," while the linked report describes prompt injection and data exfiltration exposure in agent workflows research thread linked report.
  • KiloCode also says China’s CNCERT warned that attackers are using "indirect prompt injection" against OpenClaw instances, pushing the issue from a vendor argument into a broader operational security concern CNCERT warning.
  • In response, KiloClaw is positioning its hosted OpenClaw offering as a hardened alternative, with a published whitepaper that says an independent 10-day assessment covered threat modeling, code review, and "60+ adversarial tests" security whitepaper.
  • The main engineering takeaway is not that one hosted platform is definitively safe, but that agent deployments touching untrusted web content or multi-tenant data need tighter isolation, secret handling, and safer defaults than vanilla setups appear to provide weak defaults hosted isolation.

What is the new security signal around OpenClaw?

KiloCode’s posts point to a sharper claim than the usual "AI agents can be risky" warning: the issue is described as OpenClaw’s "inherently weak default security configurations," and the linked writeup says those weaknesses can enable prompt injection and data exfiltration in deployed agent environments research thread linked report. A separate post says CNCERT warned about attackers using "indirect prompt injection" against OpenClaw instances, which matters because indirect injection usually arrives through content the agent reads rather than through a direct operator prompt CNCERT warning.

That makes the practical risk boundary clear. If an OpenClaw agent can browse, ingest external text, or act across connected tools, then unsafe defaults become an implementation problem rather than a theoretical model-safety concern weak defaults.

What evidence did KiloClaw publish about isolation?

KiloClaw answered the OpenClaw warnings with a security paper and architecture post rather than just a marketing denial. The whitepaper image says an independent assessment by Andrew Storms ran for 10 days and included PASTA-based threat modeling across "30 threats across 13 assets," plus code review, live infrastructure testing, and "60+ adversarial tests" security whitepaper.

According to KiloClaw’s architecture post, its hosted design uses Firecracker microVMs and five independent layers of tenant isolation, including identity-based routing, separate application environments, and WireGuard-based network isolation. Those are still vendor-provided claims, but they are at least concrete enough for engineers to compare against their own OpenClaw deployment model, especially around sandbox boundaries, cross-tenant separation, and secret exposure paths hosted isolation.

Further reading

Discussion across the web

Where this story is being discussed, in original context.

On X· 2 threads
TL;DR1 post
What is the new security signal around OpenClaw?1 post
Share on X