TOPIC · 8 stories

Prompt Injection

Indirect prompt attacks, malicious context, and tool abuse.

Stories

OpenAI opens ChatGPT Lockdown Mode to all plans and limits outbound data exfiltration

OpenAI expanded Lockdown Mode from organizations to personal and self-serve Business accounts, adding an opt-in setting that limits outbound network requests. The feature is meant to block the final exfiltration step in prompt-injection attacks, though malicious instructions can still affect responses.
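
Blocking the final exfiltration step can be pictured as an egress allowlist check. The sketch below is only an illustration of that idea, not how OpenAI implements Lockdown Mode; `ALLOWED_HOSTS` and `is_egress_allowed` are hypothetical names, and the hosts are placeholders.

```python
from urllib.parse import urlparse

# Hypothetical allowlist; a real deployment would load this from policy.
ALLOWED_HOSTS = {"api.example.com", "docs.example.com"}


def is_egress_allowed(url: str) -> bool:
    """Return True only for HTTPS requests to an explicitly allowed host."""
    parsed = urlparse(url)
    if parsed.scheme != "https":
        return False
    # hostname is lowercased by urlparse and excludes port and userinfo,
    # so "https://api.example.com@evil.test/" resolves to "evil.test".
    return parsed.hostname in ALLOWED_HOSTS
```

A check like this stops an agent from sending data to an unlisted host, but it does nothing about injected instructions that change what the model says, which matches the caveat in the summary.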

RELEASE · 4w ago

OpenRouter launches Guardrails with budget caps, ZDR, and prompt-injection filters

OpenRouter released Guardrails to apply budget limits, provider restrictions, zero-data-retention rules, prompt-injection defense, and DLP checks across routed traffic. Google Model Armor and Lakera Guard connectors are in beta, so plan around limited availability.

WORKFLOW · 1mo ago

TimescaleDB adds read-only MCP mode for agents

TimescaleDB added a read-only MCP mode, practitioners pushed credential brokering, and an OpenClaw user open-sourced a skill-quarantine review pipeline. That matters because secret handling and destructive permissions are moving out of prompts and into brokered or reviewable control layers.
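
Moving destructive permissions out of the prompt means the connection itself refuses writes, whatever the agent is told. The sketch below uses SQLite from the Python standard library as a stand-in (an assumption made so the example runs anywhere; it is not TimescaleDB's MCP mode), and `open_read_only` and `demo` are hypothetical helpers.

```python
import os
import sqlite3
import tempfile


def open_read_only(path: str) -> sqlite3.Connection:
    """Open a SQLite database so that writes fail at the connection layer."""
    # mode=ro is a documented SQLite URI parameter; uri=True enables it.
    return sqlite3.connect(f"file:{path}?mode=ro", uri=True)


def demo() -> str:
    """Create a throwaway database, then show a write being rejected."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "metrics.db")
        setup = sqlite3.connect(path)
        setup.execute("CREATE TABLE readings (value REAL)")
        setup.execute("INSERT INTO readings VALUES (1.5)")
        setup.commit()
        setup.close()

        conn = open_read_only(path)
        try:
            rows = conn.execute("SELECT value FROM readings").fetchall()
            try:
                conn.execute("DELETE FROM readings")
                return "write allowed"
            except sqlite3.OperationalError:
                return f"write blocked, read returned {rows}"
        finally:
            conn.close()
```

The point of the design is that a prompt-injected "delete everything" instruction fails with a database error rather than relying on the model to decline.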

RELEASE · 2mo ago

OpenAI Codex adds Chronicle screen memories in macOS Pro preview

OpenAI added Chronicle, a Codex preview that turns recent screen context into reusable memories for errors, files, docs, and workflows. The macOS Pro-only feature stores its memories locally without encryption and can consume rate limits quickly, so weigh the prompt-injection risk before relying on it.

NEWS · 2mo ago

Sentinel Gateway adds tool-scoped execution controls for agents

Sentinel Gateway promoted tool-scoped execution controls, Agent v0 shipped OS sandboxing plus hash-chain logs, and NeoBild published a 336-round Termux CVE loop. Use these controls to constrain agent actions and run security analysis locally.
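
A hash-chain log makes an agent's action history tamper-evident: each entry's hash covers the previous entry's hash, so editing any record invalidates the chain from that entry onward. The sketch below shows the general technique under assumed names (`append_entry`, `verify`, `GENESIS`); it is not taken from Agent v0's implementation.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder hash for the first entry


def append_entry(log: list[dict], action: dict) -> None:
    """Append an action whose hash covers the previous entry's hash."""
    prev_hash = log[-1]["hash"] if log else GENESIS
    payload = json.dumps(action, sort_keys=True)
    digest = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
    log.append({"action": action, "prev": prev_hash, "hash": digest})


def verify(log: list[dict]) -> bool:
    """Recompute every hash; any edited or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in log:
        payload = json.dumps(entry["action"], sort_keys=True)
        expected = hashlib.sha256((prev_hash + payload).encode("utf-8")).hexdigest()
        if entry["prev"] != prev_hash or entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True
```

Verification only detects tampering after the fact; keeping the latest hash somewhere the agent cannot write is what would make the log hard to rewrite wholesale.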

RELEASE · 3mo ago

OpenClaw tests plugin SDK refactor before a major release

OpenClaw's maintainer asked users to switch to the dev channel and stress normal workflows before a large release that may break plugins. Watch harness speed, context plugins, and permission boundaries closely while the SDK refactor lands.

NEWS · 3mo ago

Anthropic reports Opus 4.6 prompt injection still succeeds 14.8% at 100 tries

Anthropic's Opus 4.6 system card shows that indirect prompt injection attacks still succeed 14.8% of the time when an attacker gets 100 attempts. Treat browsing agents and prompt secrecy as defense-in-depth problems, not solved product features.
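
To see why retries matter, the 14.8% figure can be converted into an implied per-attempt rate. This is a back-of-the-envelope sketch under two assumptions that the summary does not state: that 14.8% is the chance of at least one success across 100 attempts, and that attempts are independent with equal odds. The function names are hypothetical.

```python
def per_attempt_rate(cumulative: float, attempts: int) -> float:
    """Solve 1 - (1 - p) ** attempts = cumulative for p."""
    return 1.0 - (1.0 - cumulative) ** (1.0 / attempts)


def cumulative_rate(per_attempt: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - per_attempt) ** attempts
```

Under those assumptions, `per_attempt_rate(0.148, 100)` comes out to roughly 0.16% per attempt, which illustrates how a rate that looks negligible for a single try compounds for an attacker who can retry.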

NEWS · 3mo ago

Research reports OpenClaw prompt-injection flaws and weak defaults

Security coverage around OpenClaw intensified with a report on indirect prompt injection and data exfiltration risks, while KiloClaw published an independent assessment of its hosted isolation layers. Review your default configs and sandbox boundaries before exposing agents to untrusted web or tenant data.