Prompt Injection
Indirect prompt attacks, malicious context, and tool abuse.
Stories
Filter storiesOpenAI expanded Lockdown Mode from organizations to personal and self-serve Business accounts, adding an opt-in setting that limits outbound network requests. The feature is meant to block the final exfiltration step in prompt-injection attacks, though malicious instructions can still affect responses.
OpenRouter released Guardrails to apply budget limits, provider restrictions, zero-data-retention rules, prompt-injection defense, and DLP checks across routed traffic. Google Model Armor and Lakera Guard connectors are in beta, so plan around limited availability.
TimescaleDB added a read-only MCP mode, practitioners pushed credential brokering, and an OpenClaw user open-sourced a skill-quarantine review pipeline. That matters because secret handling and destructive permissions are moving out of prompts and into brokered or reviewable control layers.
OpenAI added Chronicle, a Codex preview that turns recent screen context into reusable memories for errors, files, docs, and workflows. The macOS Pro-only feature stores local memory unencrypted and can burn rate limits quickly, so watch prompt-injection risk before relying on it.
Sentinel Gateway promoted tool-scoped execution controls, Agent v0 shipped OS sandboxing plus hash-chain logs, and NeoBild published a 336-round Termux CVE loop. Use these controls to constrain agent actions and run security analysis locally.
OpenClaw's maintainer asked users to switch to the dev channel and stress normal workflows before a large release that may break plugins. Watch harness speed, context plugins, and permission boundaries closely while the SDK refactor lands.
Anthropic's Opus 4.6 system card shows indirect prompt injection attacks can still succeed 14.8% of the time over 100 attempts. Treat browsing agents and prompt secrecy as defense-in-depth problems, not solved product features.
Security coverage around OpenClaw intensified with a report on indirect prompt injection and data exfiltration risks, while KiloClaw published an independent assessment of its hosted isolation layers. Review your default configs and sandbox boundaries before exposing agents to untrusted web or tenant data.