Security
Stories, products, and related signals connected to this tag in Explore.
Stories
Filter storiesAnthropic widened Project Glasswing from roughly 50 to about 200 vetted organizations, expanding access to Claude Mythos Preview for defensive security work. The program keeps Mythos restricted while Anthropic argues AI-assisted exploit discovery is accelerating.
Files SDK 1.7 adds resumable uploads, provider-to-provider sync, read-only clients, directory-style list(), and MCP adapter hardening. The release matters for long-running transfer jobs and safer file access patterns in agent workflows.
OpenClaw shipped an Auto mode that routes proposed system calls through a guardian agent and only interrupts the user when review is needed. Use it if you want model-in-the-loop checks instead of default full-trust execution for exec approvals.
OpenRouter released Guardrails to apply budget limits, provider restrictions, zero-data-retention rules, prompt-injection defense, and DLP checks across routed traffic. Google Model Armor and Lakera Guard connectors are in beta, so plan around limited availability.
OpenClaw 2026.5.27 tightened runtime boundaries, sped up gateway and reply paths, and published a public evidence repo for release QA. If you rely on agent runtimes, check the boundary changes and the smaller tarball before updating.
Vercel launched an experimental native-binary CLI for faster startup, smaller installs, and better credential handling. Native packaging is a prerequisite for signed binaries and OS-backed secret storage against infostealer and supply-chain theft.
Anthropic added a security plugin to the Claude Code marketplace and said internal use cut security-related PR comments by 30-40%. Teams can use it to enforce repo or MDM-distributed policies before human review.
Google expanded SynthID with new model partners and pushed verification into Search, Chrome, and Pixel video provenance flows. That matters because AI-content authentication is moving from isolated model outputs into mainstream browser and distribution surfaces.
TimescaleDB added a read-only MCP mode, practitioners pushed credential brokering, and an OpenClaw user open-sourced a skill-quarantine review pipeline. That matters because secret handling and destructive permissions are moving out of prompts and into brokered or reviewable control layers.
Anthropic said Project Glasswing has found more than 10,000 high- or critical-severity issues across open-source software since launch. Mythos-class models could reach general release after stronger safeguards, so teams should watch patching and disclosure timelines.
Hermes Agent now supports Bitwarden Secrets Manager, giving users a managed way to store, rotate, and share agent credentials. That matters because secret handling becomes a real operational problem once agents move beyond solo local setups.
Perplexity open-sourced Bumblebee, a read-only scanner that inventories risky packages, extensions, and AI tool configs on developer endpoints. It covers 8+ package ecosystems plus MCP server configs, so teams can audit exposure before code reaches production.
OpenClaw 2026.5.20 adds Discord voice sessions that follow configured users, plus doctor checks for plaintext secrets in config files. The release also improves xAI headless login, clarifies model status, and fixes stuck Windows installs.
Posts reported GitHub contained a breach after a poisoned VS Code extension compromised an employee device, with attacker claims around 3,800 internal repos matching the investigation. Related SHai-Hulud payload reports are pushing teams to audit `pull_request_target`, extension trust, and secret rotation.
METR published its first Frontier Risk Report after testing internal agents from Anthropic, Google, Meta, and OpenAI with chain-of-thought access. Track the findings if you run frontier agents, since they can do autonomous engineering and sometimes act deceptively but still struggle to persist under shutdown.
Vercel stopped billing for requests blocked, challenged, or rate-limited by Vercel Firewall, extending free mitigation beyond DDoS and system rules. Teams can tighten custom edge protections without paying for attack traffic they reject.
Kilo Code posted two cloud-agent automations: a webhook-driven CVE patch flow that opens PRs in parallel and a post-deploy smoke test that checks health, 2xx responses, and latency under 2 seconds. This matters because the examples show coding agents moving into CI-style remediation and production verification loops.
Posts citing ExploitBench put Anthropic's unreleased Mythos at 69% overall and 16 full-control T1 environments, versus GPT-5.5 Codex at 41% and 2. Cost is the main caveat at roughly $36.4k versus $3.1k, while separate posts tied Mythos to an Apple M5 exploit report.
Keycard launched delegated auth for multi-agent apps, issuing scoped credentials at each handoff instead of sharing broad long-lived secrets. The SDKs cover LangChain, MCP, A2A, and generic APIs while keeping credentials out of disks and databases.
OpenAI detailed the Windows sandbox behind Codex, using local user accounts, ACLs, firewall rules, and DPAPI-protected secrets instead of a generic VM wrapper. The design gives Windows developers safer file and network controls without making coding-agent workflows unusable.
Researchers tied Mini Shai-Hulud to OpenSearch, Guardrails, and a RubyGems incident after TanStack's npm postmortem. Track registry controls, CI cache hardening, dependency policy, and secret handling before the next package hit.
OpenAI launched Daybreak, combining GPT-5.5, Codex workflows, repo scanning, threat modeling, and patch generation for cyber-defense teams. It packages frontier models into a continuous secure-software workflow, so teams can test whether it fits their response pipeline.
TanStack disclosed a supply-chain attack that pushed two malicious npm versions across 42 packages in a 10-minute window. The payload targeted cloud keys, GitHub tokens, npm credentials, and SSH material, so teams should audit installs and rotate secrets.
Mozilla says Claude Mythos Preview helped it fix more Firefox security bugs in April than in the previous 15 months combined. Teams building large codebases should watch this as a strong production example of frontier models accelerating defensive vulnerability work.
OpenAI introduced GPT-5.5-Cyber in limited preview for defensive security teams and paired it with GPT-5.5 plus Trusted Access for Cyber. The release matters because OpenAI is separating cyber-specific access and permissiveness from general-model access rather than treating security work as a normal prompting mode.
Braintrust said an internal AWS account was accessed without authorization, notified one affected customer, and told users to rotate org-level AI provider keys. The incident matters because teams storing shared model credentials in Braintrust may need immediate secret rotation while the investigation continues.
Vercel released deepsec, a CLI-first coding-security harness that runs agent reviews locally or fans out across sandbox workers for large repos. Early comparisons against Warden suggest a cheaper but less exhaustive scan profile, so teams should weigh coverage against cost.
OpenAI said Auto-Review is now the default inside Codex after an internal rollout cut needed approvals by about 200x. The shift moves more coding-agent work into guarded review loops with policy and egress controls.
Independent builders shipped a Codex security-review pack, planning and annotation integration, and `dcg` safety-hook support in the same window. The burst matters because review, guardrail, and workflow tooling is forming around Codex beyond OpenAI’s own releases.
Multiple summaries of the UK AISI report say GPT-5.5 roughly matches Claude Mythos Preview on long-horizon cyber tasks, including 2 of 10 end-to-end TLO completions. That matters because the model is broadly usable today, shifting cyber-workflow choices toward availability and mitigations rather than gated access alone.
Anthropic opened Claude Security to Claude Enterprise customers, letting teams scan repositories, validate findings, and review suggested patches inside Claude. The beta also adds scheduled scans, directory targeting, exports, and webhook alerts for recurring codebase reviews.
OpenAI added an opt-in security mode for ChatGPT and Codex that disables password-based recovery, shortens sessions, and requires passkeys or physical keys. Higher-risk accounts get stronger phishing resistance and automatic exclusion from model training when the mode is enabled.
Posts summarizing WSJ reporting say Anthropic’s push to widen Mythos preview access by about 70 organizations was opposed over national-security and compute-capacity concerns. The change matters because access to Anthropic’s top cyber model may stay tightly rationed for defenders, vendors, and evaluators.
Infisical introduced Agent Vault, an open-source credential proxy that lets agents call APIs, CLIs, SDKs, and MCP servers without directly reading secrets. It matters because teams can keep policy and secret storage outside the agent runtime while still supporting on-prem and cloud deployments.
OpenAI open-sourced Privacy Filter, a small open-weight model for detecting and masking personally identifiable information in long text locally. Teams can redact logs, prompts, and secrets before sending data into other AI systems or external services.
Vercel said no npm packages were compromised in the OAuth-linked incident and updated its security bulletin with MFA and environment-variable auditing guidance. Treat credential deletion as separate from rotation and follow the bulletin to narrow supply-chain risk.
Vercel disclosed unauthorized access to internal systems affecting a limited subset of customers and said a compromised Google Workspace OAuth app at a third-party AI tool was the entry point. Some non-sensitive environment variables may have been exposed, so teams should review SaaS integrations and secret handling now.