High-Risk Vulnerabilities with LLMs at Nearly Triple the Rate of Traditional Software, New Cobalt Report Finds

Vercel’s breach, as Bill Brenner covered in his post, Vercel Breach Raises Supply-Chain Risk: What Security Teams Must Do Now, is what happens when the AI “helper” becomes the attacker’s way in.

The cloud platform behind Next.js disclosed this week that intruders accessed internal systems and stole some customer data after compromising Context.ai, a third‑party AI assistant that an employee had connected to Vercel’s Google Workspace with broad OAuth permissions. What looked like harmless productivity plumbing – an AI tool allowed to read and act inside corporate email and documents – turned into a supply‑chain entry point the company couldn’t fully see until it was too late.

That’s exactly the kind of gap Cobalt is calling out in its new State of Pentesting Report 2026. The firm finds that AI and LLM applications generate a disproportionate share of high‑risk findings and are the least likely to be fixed, with shadow AI and over‑permissive integrations among the leading causes of incidents. Vercel’s experience puts a concrete face to those statistics: an AI app wired into core collaboration systems, risky by design, and sitting there long enough for someone else’s compromise to become Vercel’s problem.

The culprit? Enterprise AI adoption is outpacing the security practices designed to protect it, according to findings released today by penetration testing firm Cobalt. The gap between organizations that handle vulnerabilities well and those drowning in them has grown to 8 months of exposure.

Joe Brinkley, head of offensive security research and community at Cobalt, told CYBR.SEC.Media that legacy systems “continue to slow progress down and complicate these continuous efforts.”

The company's 2026 State of Pentesting Report, which draws on more than 16,500 penetration tests conducted across roughly 2,700 organizations over five years, alongside a survey of 450 security leaders and practitioners, paints a picture of a discipline under strain. Nowhere is that strain more visible than in AI and large language model (LLM) applications, where 32% of all pentest findings are rated high risk. That’s roughly 2.7 times the 12% rate Cobalt observes in its broader dataset.

Those high-risk AI security findings are also the least likely to get fixed. At a 38% resolution rate for high-risk AI/LLM vulnerabilities, AI applications rank dead last among all asset categories tested by Cobalt. The figure is an improvement from 21% last year, but still leaves two out of every three risky AI vulnerabilities open to exploitation.

“The poor resolution rate of AI is largely attributable to issues within LLM models themselves, which security professionals often cannot fix directly. Instead of waiting on vendors, organizations must take on the initiative through continuous pentesting to proactively enhance security," says Gunter Ollmann, chief technology officer at Cobalt. He urged organizations to adopt continuous pentesting rather than waiting for vendor fixes.

One in five organizations surveyed acknowledged experiencing an AI- or LLM-related security incident in the past year. Another 18% said they were unsure, and 19% declined to answer — leading Cobalt researchers to conclude that the true incident rate almost certainly exceeds self-reported figures. Shadow AI, cited by 44% of organizations with incidents, was the leading cause, followed by data and model poisoning and improper output handling, each at 41%.

A 25x gap between leaders and laggards

Perhaps the report's most striking finding concerns the pace of remediation across organizations. Top-performing firms resolve half of their high-risk findings within 10 days. Bottom-tier organizations take 249 days to hit the same halfway mark. That’s a 25-fold spread, which translates to roughly 8 additional months of risk exposure.

Cobalt argues this gap has little to do with resources or industry, citing software and healthcare as consistent top performers while utilities, manufacturing, and retail lag. Instead, the report attributes the divide to whether organizations run pentesting as a continuous, programmatic discipline or treat it as a periodic compliance checkbox.

The numbers support that thesis. Organizations with a programmatic offensive security program are 4.5 times more likely to resolve critical findings within 3 days than compliance-driven or ad hoc peers. For the first time in the survey's history, organizations describing their approach as programmatic (53%) outnumber those pentesting primarily to satisfy compliance (40%).

Executives and practitioners tell different stories

The report reveals a significant perception gap within security organizations. While 57% of C-suite executives say their organization consistently meets remediation service-level agreements, only 15% of the practitioners doing the work agree. Seventy-seven percent of practitioners describe meeting SLAs as a genuine struggle, a view shared by just 37% of executives.

That disconnect compounds with the data on actual performance. Across the full dataset, the median organization's mean time to resolution for high-risk findings is 39 days longer than the most lenient SLA targets most companies set for themselves. Half of the surveyed organizations aim to fix critical vulnerabilities within a week.

Confidence in AI defense is dropping as adoption accelerates

Security teams' confidence in their ability to handle AI-related threats fell 13 percentage points year over year, from 64% to 51%. Over the same period, the share of professionals calling for a "strategic pause" on AI adoption to shore up defenses rose by the same margin, reaching 61%.

That pause is unlikely. Cobalt conducted 2.4 times as many AI and LLM pentests in 2025 as in 2024, reflecting aggressive enterprise adoption. Prompt injection accounted for 37.6% of AI pentest findings, followed by insecure output handling and model denial-of-service.

Budgets are growing, but so are expectations

There is good news for security buyers. Eight in ten organizations reported growing offensive security budgets in the past year, and 97% now view pentesting as foundational to modern security programs. Customer-driven demand is also climbing: 61% of respondents say customers actively request third-party pentest reports to validate product security, up 13 points from last year.

Yet the aggregate five-year resolution rate for all high-risk findings across Cobalt's dataset is just 52%, suggesting that, while typical organizations clear 86% of recent findings, older issues continue to accumulate in the long tail. As Cobalt's researchers put it, the pentest itself is table stakes — what separates leaders from laggards is everything that happens after the report lands.

“To transition from a non-programmatic, that’s ad-hoc or compliance-driven, to a programmatic approach to pentesting, companies must shift from treating test results as static, point-in-time snapshots, like PDFs, to treating them as operational inputs within an ongoing exposure management lifecycle,” Brinkley says.

We’ll cover that in our follow-up story.