Brunswick, ME • (207) 245-1010 • contact@johnzblack.com
Do your AI guardrails actually work under production conditions? Two research teams published answers this week, and neither one is reassuring.
Adversa AI found that Claude Code’s security deny rules silently stop enforcing after roughly 50 subcommands. Unit 42 found that Amazon Bedrock’s guardrails can be bypassed when an orchestrator agent delegates to subagents. Different platforms, different failures, same result: safety tools that look great in demos and break when it matters.
When a compound shell command has more than 50 subcommands, Claude Code skips security analysis on everything past that threshold. Configure “never run rm”? Works fine on a standalone command. But if rm is subcommand number 51 in a chain, it runs unrestricted.
The reason: checking every subcommand cost tokens and froze the UI. Anthropic’s engineers capped analysis at 50 and traded security for speed. The fix already exists in their codebase. A newer parser checks deny rules correctly regardless of command length. It’s written, it’s tested, and it was never applied to the code path that ships to 500,000+ developers.
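A minimal sketch makes the failure mode concrete. This is illustrative pseudocode of the pattern described above, not Anthropic’s actual parser; the names (`MAX_ANALYZED`, `command_allowed`) and the deny-rule format are assumptions for demonstration.

```python
# Hypothetical sketch of a capped security analysis: only the first
# MAX_ANALYZED subcommands of a compound command get checked against
# deny rules, so anything past the cap runs unchecked.

MAX_ANALYZED = 50          # cap traded for UI responsiveness
DENY_RULES = {"rm"}        # illustrative "never run rm" rule

def is_denied(subcommand: str) -> bool:
    """True if the subcommand's program is on the deny list."""
    return subcommand.split()[0] in DENY_RULES

def command_allowed(compound: str) -> bool:
    subcommands = [s.strip() for s in compound.split("&&") if s.strip()]
    # The bug: the loop silently stops at the analysis cap.
    for sub in subcommands[:MAX_ANALYZED]:
        if is_denied(sub):
            return False
    return True

# rm as subcommand 51: the deny rule never sees it.
chain = " && ".join(["echo ok"] * 50 + ["rm -rf important/"])
print(command_allowed(chain))  # → True, despite the deny rule
```

The same `rm` as a standalone command, or as subcommand 50, is correctly blocked; position 51 is what slips through.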
The attack is practical. Drop a malicious CLAUDE.md into a repo. A developer clones it and asks Claude to build the project. The resulting command exceeds 50 subcommands, deny rules vanish, and the attacker walks away with stolen credentials or a supply chain foothold.
Unit 42’s Bedrock research tells a different story. They found prompt injection can compromise subagents in multi-agent setups, bypassing the orchestrator’s guardrails. But here’s the key detail: when they enabled Bedrock’s built-in prompt attack Guardrail, it stopped the attacks. The problem is that most deployments don’t configure guardrails for inter-agent communication because it feels like an internal call. It’s not.
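For Bedrock, the fix is configuration, not code. Below is a hedged sketch of a guardrail definition that enables the built-in prompt attack filter, shaped for boto3’s `create_guardrail` call; the guardrail name and blocked-message strings are placeholders. The crucial part is then attaching the guardrail to every subagent invocation, not just the user-facing entry point.

```python
# Sketch of a Bedrock Guardrails config enabling the prompt attack
# filter. Name and messages are placeholders; pass this dict to
# boto3's bedrock.create_guardrail(**prompt_attack_guardrail).

prompt_attack_guardrail = {
    "name": "inter-agent-prompt-attack",  # placeholder name
    "blockedInputMessaging": "Input blocked by guardrail.",
    "blockedOutputsMessaging": "Output blocked by guardrail.",
    "contentPolicyConfig": {
        "filtersConfig": [
            {
                "type": "PROMPT_ATTACK",   # built-in prompt attack filter
                "inputStrength": "HIGH",   # scan inputs aggressively
                "outputStrength": "NONE",  # this filter applies to inputs
            }
        ]
    },
}
```

Creating the guardrail is not enough on its own: each inter-agent model call has to reference it (for example via the guardrail identifier and version on the invocation), or the orchestrator-to-subagent hop stays exactly as unprotected as Unit 42 found it.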
Both failures share a pattern. They work in testing. They break in production. When performance pressure rises in agentic AI systems, security is what gets quietly cut.
If you’re running Claude Code, your deny rules may not be enforced past 50 subcommands. No fix is available yet. If you’re on Bedrock multi-agent, enable the prompt attack Guardrail explicitly for inter-agent flows. And for everyone building on agentic AI: stop assuming vendor guardrails are production-grade until you’ve tested them under real conditions.
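Until a fix ships, you can at least detect the dangerous shape yourself. This is a defensive sketch, not a parser: it counts subcommands with a naive split on common shell chaining operators (ignoring quoting and subshells, which a real implementation would have to handle) and flags any chain long enough to outrun a 50-subcommand analysis cap.

```python
import re

# Defensive sketch: independently count subcommands in a compound shell
# command before an agent runs it. The threshold mirrors the 50-subcommand
# cap from Adversa AI's finding. Splitting is deliberately naive: it does
# not handle quoting, escapes, or subshells.

ANALYSIS_CAP = 50

def count_subcommands(compound: str) -> int:
    # Split on &&, ||, ;, and | (order matters: || before |).
    parts = re.split(r"&&|\|\||;|\|", compound)
    return sum(1 for p in parts if p.strip())

def exceeds_analysis_cap(compound: str) -> bool:
    """True if the chain is long enough that capped analysis
    would leave its tail unchecked. Hold these for manual review."""
    return count_subcommands(compound) > ANALYSIS_CAP

long_chain = " && ".join(["echo ok"] * 51)
print(exceeds_analysis_cap(long_chain))   # → True
print(exceeds_analysis_cap("ls ; pwd"))   # → False
```

A check like this belongs in whatever wrapper or hook sits between the agent and your shell: anything over the cap gets held for a human, regardless of what the vendor’s own analysis says.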
See the technical details and specific mitigations for both platforms.