Threat models, controls, and attack surfaces for agents.
HN follow-up on Stanford's jai sandbox emphasized that agent changes persist under .jai by default, with explicit mounts back into the real home directory when needed. That clarification matters for teams comparing dev containers, bubblewrap, podman, and LXC, so they can decide how much host state an agent should be allowed to keep or touch.
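A minimal sketch of that mount policy, assuming a bubblewrap-style setup (the paths, flag choices, and `.jai` layout here are illustrative, not jai's actual implementation):

```python
def bwrap_argv(home: str, agent_dir: str, shared: list[str]) -> list[str]:
    """Build a bubblewrap command line that gives an agent a private writable
    workspace while the rest of the filesystem stays read-only, with explicit
    bind-mounts back into the real home only for opted-in `shared` paths."""
    argv = [
        "bwrap",
        "--unshare-all", "--share-net",      # fresh namespaces, keep networking
        "--ro-bind", "/usr", "/usr",         # read-only system directories
        "--ro-bind", "/etc", "/etc",
        "--proc", "/proc",
        "--dev", "/dev",
        "--tmpfs", home,                     # hide the real home by default
        "--bind", agent_dir, f"{home}/.jai", # agent-writable state persists here
    ]
    for path in shared:                      # explicit mounts into the real home
        argv += ["--bind", path, path]
    argv += ["--chdir", home]
    return argv

cmd = bwrap_argv("/home/dev", "/home/dev/.jai", ["/home/dev/projects/api"])
```

The design choice the follow-up highlights is the `--tmpfs` over the home directory: the agent keeps its own persistent state under `.jai`, but touches real host state only where a mount is deliberately added.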
OpenClaw's 2026.3.28 update switched xAI integration to the Responses API and added plugin prompts that can request user permission at runtime. One user reported a /v1/models failure tied to a missing operator.write scope after upgrading, so teams should check auth changes before deploying.
Nicholas Carlini showed a scaffolded Claude setup that reportedly found a blind SQL injection in Ghost and repeated the pattern against the Linux kernel. The attributed demo shifts the cyber-capability debate from abstract evals to disclosed software targets and 90-minute workflows, so readers should treat the result as a specific reported demo rather than a general capability claim.
Stanford's `jai` package launches casual, strict, and bare Linux containment modes for AI agents, and users pair it with Claude Code and OpenClaw hardening tips. The workflow narrows write scope and reduces persistent exploit paths such as hooks, `.venv` files, and startup artifacts.
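Those persistence surfaces can also be audited directly. A small sketch, with an illustrative path list rather than jai's actual policy:

```python
from pathlib import Path

# Places where an agent's writes commonly survive the session and execute
# later; this list is illustrative, not jai's actual policy.
PERSISTENCE_PATHS = [
    ".git/hooks",         # repo hooks run on commit/push
    ".venv",              # activation scripts and site-packages
    ".bashrc", ".zshrc",  # shell startup files
    ".config/autostart",  # desktop startup artifacts
]

def persistence_surfaces(root: str) -> list[str]:
    """Return paths under `root` where writes would persist and run later."""
    base = Path(root)
    return [str(base / p) for p in PERSISTENCE_PATHS if (base / p).exists()]
```

Running a scan like this before and after an agent session makes "did it plant anything that outlives the sandbox?" a checkable question rather than a hunch.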
GitHub said Copilot Free, Pro, and Pro+ interaction data will train models by default from Apr. 24 unless users opt out, while Business and Enterprise tiers and private repo content at rest stay excluded. Teams should review per-user enforcement, enterprise coverage, and repo privacy settings before the change lands, and keep personal and company repository usage boundaries explicit.
Compromised LiteLLM 1.82.7 and 1.82.8 wheels executed a malicious .pth file at install time to exfiltrate credentials, and PyPI quarantined the releases after disclosure. Rotate secrets and check startup hooks on affected systems, treat fresh-package installs and transitive AI-tooling dependencies as supply-chain risk, and add package-age controls before letting agents install packages autonomously.
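Python's `site` module executes any line in an installed `.pth` file that begins with `import` at interpreter startup, which is the hook this attack used. A quick audit of installed `.pth` files can look like this sketch:

```python
import site
import sysconfig
from pathlib import Path

def suspicious_pth_lines() -> list[tuple[str, str]]:
    """List lines in installed .pth files that execute code at interpreter
    startup (site runs any line beginning with 'import')."""
    dirs = set(site.getsitepackages() + [sysconfig.get_paths()["purelib"]])
    hits = []
    for d in dirs:
        for pth in Path(d).glob("*.pth"):
            for line in pth.read_text(errors="replace").splitlines():
                if line.startswith(("import ", "import\t")):
                    hits.append((str(pth), line.strip()))
    return hits
```

Some hits are legitimate (several mainstream packages ship import-line `.pth` files), so the output is a review list, not a verdict.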
Google DeepMind published a real-world manipulation benchmark and toolkit built from nine studies across more than 10,000 participants, with finance showing higher influence than health. Safety teams can use it to test persuasive failure modes, so add it to red-team plans for user-facing agents.
Imbue released Latchkey, a library that prepends ordinary curl calls so local agents can use SaaS and internal APIs while credentials stay on the developer machine. Try it where agents need many HTTP integrations but should not see raw secrets.
Anthropic's Opus 4.6 system card shows indirect prompt injection attacks can still succeed 14.8% of the time when an attacker gets 100 attempts. Treat browsing agents and prompt secrecy as defense-in-depth problems, not solved product features.
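For intuition about why retries matter, assume independent attempts with a fixed per-attempt success rate (a modeling assumption for illustration, not the system card's methodology):

```python
def p_any_success(p_per_attempt: float, attempts: int) -> float:
    """Probability of at least one success over independent attempts."""
    return 1 - (1 - p_per_attempt) ** attempts

def per_attempt_rate(p_overall: float, attempts: int) -> float:
    """Back out the per-attempt rate implied by an overall success rate,
    under the same independence assumption."""
    return 1 - (1 - p_overall) ** (1 / attempts)

# Under this model, a 14.8% success rate over 100 attempts corresponds to a
# per-attempt rate of roughly 0.16% -- small per try, material in aggregate.
p = per_attempt_rate(0.148, 100)
```

The point for defenders: a per-attempt rate that looks negligible in a single eval compounds quickly once an attacker can retry, which is why layered controls matter more than the headline rate.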
LangSmith Fleet introduces shared agents with edit and run permissions, agent identity, human approvals, and tracing. That matters because enterprise agent rollout is shifting from single-user demos to governed, auditable deployment surfaces.
Keycard released an execution-time identity layer for coding agents, issuing short-lived credentials tied to user, agent, runtime, and task. It targets the gap between noisy permission prompts and unsafe skip-permissions workflows.
OpenAI described an internal system that uses its strongest models to review almost all coding-agent traffic for misalignment and suspicious behavior. It is a sign that powerful internal agents may need continuous oversight, not just pre-deployment policy checks.
LangChain rebranded Agent Builder to Fleet and added agent identity, memory, sharing controls, and LangSmith tracing for multi-user agent operations. It gives teams a governed way to deploy Slack- and GitHub-connected agents without stitching auth and auditing together by hand.
Rivet released Secure Exec, a V8-isolate runtime for Node.js, Bun, and browsers with deny-by-default permissions and low memory overhead. Agent builders can test it against heavier sandboxes for tool execution, but should verify the isolation model before replacing container or VM controls.
Intercom detailed an internal Claude Code platform with plugin hooks, production-safe MCP tools, telemetry, and automated feedback loops that turn sessions into new skills and GitHub issues. The patterns are useful if you are standardizing coding agents across engineering, support, and product teams.
Security coverage around OpenClaw intensified with a report on indirect prompt injection and data exfiltration risks, while KiloClaw published an independent assessment of its hosted isolation layers. Review your default configs and sandbox boundaries before exposing agents to untrusted web or tenant data.
NVIDIA introduced NemoClaw, a reference stack that installs OpenShell and adds sandbox, privacy, and policy controls around OpenClaw. Use it if you want always-on agents on RTX PCs, DGX Spark, or cloud without building the security layer yourself.
OpenAI said it is acquiring Promptfoo to strengthen agent security testing and evaluation in Frontier while keeping Promptfoo open source and supporting current customers. Enterprises deploying AI agents should expect more native red-teaming and policy testing in OpenAI’s stack.
Anthropic filed two cases challenging a Pentagon-led blacklist and agency stop-use order, arguing the action retaliated against its stance on mass surveillance and autonomous weapons. Teams selling AI into government should watch the procurement and policy precedent before making long-cycle bets.