Why So Much AI-Generated Code Is Insecure (and How to Test It)

Q: How much AI-generated code contains security vulnerabilities?

Independent studies have repeatedly found that roughly 40% to 60% of AI-generated code samples contain at least one security weakness, depending on the language, task and model tested.

Q: Why does AI generate insecure code?

AI models learn from large volumes of public code that already contains vulnerabilities, and they optimise for code that works rather than code that is safe. They have no threat model, so they reproduce insecure patterns such as queries built from unsanitised input and hardcoded secrets.

Q: Do functional tests catch insecure AI code?

Usually not. AI-generated code often passes functional tests while remaining exploitable, because tests check that features work, not that they cannot be abused. Security testing has to be a separate step.

The number, in context

Different studies use different languages, prompts and models, so the headline figure moves around — but it consistently lands high. The takeaway is not the exact percentage; it is that you should assume a meaningful fraction of anything an AI assistant writes is exploitable until proven otherwise. At the speed teams now generate code, that fraction adds up fast.

Why it happens

It learned from insecure code

Models are trained on enormous amounts of public code — including the vast quantity of insecure examples on the internet. When you ask for a database query, the statistically likely answer is the common pattern, and the common pattern is often the vulnerable one.

It optimises for "works", not "safe"

An assistant is rewarded for producing code that runs and satisfies the prompt. Security is invisible to that objective: an unparameterised query that returns the right rows looks like success. There is no threat model in the loop unless you add one.
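To make this concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, the sample data and the function names find_user_unsafe and find_user_safe are illustrative assumptions, not output from any particular assistant. Both functions return the same row for ordinary input, which is exactly why the unsafe one looks like success.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Build a throwaway in-memory database with two users (illustrative data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users (name, email) VALUES (?, ?)",
        [("alice", "alice@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable: the input is pasted straight into the SQL text.
    query = f"SELECT name, email FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterised: the value travels separately from the SQL text.
    query = "SELECT name, email FROM users WHERE name = ?"
    return conn.execute(query, (name,)).fetchall()


conn = make_db()
payload = "' OR '1'='1"  # classic injection string, used here only as a demo

print(find_user_unsafe(conn, "alice"))  # one row: the happy path works
print(find_user_safe(conn, "alice"))    # the same single row
print(find_user_unsafe(conn, payload))  # every row in the table
print(find_user_safe(conn, payload))    # empty list: no user has that name
```

A prompt that only asks for "a function that looks up a user by name" is satisfied by either version. Only the hostile input tells them apart.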

It is confidently wrong

The output reads clean and authoritative, which lowers the reviewer's guard. Insecure code that looks sloppy gets scrutiny; insecure code that looks polished gets merged.

It invents dependencies

Models sometimes suggest packages that are outdated, abandoned, or do not exist. Attackers exploit the last case by registering the hallucinated name on a public registry and waiting for someone to install it. Every auto-added dependency widens the supply-chain surface.
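One cheap guard is to refuse any dependency nobody on the team has vetted. The sketch below assumes you maintain an allowlist (the VETTED set here is hypothetical) and that dependencies are listed in requirements.txt style. The package name fast-json-utils-pro is invented for the example. Names are normalised the way PEP 503 describes, so Flask and flask compare equal.

```python
import re

# Hypothetical allowlist: packages your team has already reviewed.
VETTED = {"requests", "flask", "sqlalchemy"}


def normalize(name: str) -> str:
    """Normalise a package name as PEP 503 describes (case, '-', '_', '.')."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_names(text: str) -> list[str]:
    """Extract package names from requirements-style text."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(normalize(match.group(0)))
    return names


def unvetted(text: str, vetted: set[str] = VETTED) -> list[str]:
    """Return every requested package that is not on the allowlist."""
    return [name for name in requirement_names(text) if name not in vetted]


requirements = """\
requests==2.31.0
Flask>=2.0
fast-json-utils-pro  # suggested by the assistant
"""

print(unvetted(requirements))  # ['fast-json-utils-pro']
```

An allowlist does not prove a package is safe. It only forces a human to look before a new name reaches the build.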

The flaws that show up most

Injection — SQL, command and others, from building queries or commands out of unsanitised input.
Broken authorization — endpoints that forget to check who is allowed to do what (IDOR, missing access control).
Hardcoded secrets — API keys and tokens written straight into the source. (See how to check if a leaked key is still live.)
Weak authentication — naive session or token handling.
Cross-site scripting — unescaped output rendered back to the page.
Supply-chain risk — vulnerable, abandoned or hallucinated packages pulled in automatically.
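As one illustration from the list above, here is a minimal sketch of the cross-site scripting case. The render_comment functions are hypothetical. The fix uses html.escape from Python's standard library. A real application would normally rely on a templating engine that escapes by default, so treat this as a demonstration of the flaw rather than a recommended rendering approach.

```python
import html


def render_comment_unsafe(author: str, body: str) -> str:
    # Vulnerable: user input is dropped into the markup unchanged.
    return f"<p><b>{author}</b>: {body}</p>"


def render_comment_safe(author: str, body: str) -> str:
    # Escapes <, >, & and quotes so the input is rendered as text, not markup.
    return f"<p><b>{html.escape(author)}</b>: {html.escape(body)}</p>"


payload = "<script>alert(1)</script>"

print(render_comment_unsafe("mallory", payload))
# <p><b>mallory</b>: <script>alert(1)</script></p>

print(render_comment_safe("mallory", payload))
# <p><b>mallory</b>: &lt;script&gt;alert(1)&lt;/script&gt;</p>
```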

Why your tests will not save you

AI-generated code routinely passes its functional tests and stays exploitable — because tests verify that a feature works, not that it cannot be abused. Security has to be its own step, and it has to check the running app, not just the source.
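The gap is easy to reproduce. In the sketch below, the invoice store and every function name are made up for illustration. The functional test asks "can a user read their own invoice?" and passes for both implementations. Only the abuse test, which asks "can a user read someone else's?", exposes the missing authorization check.

```python
# Illustrative in-memory data; a real app would query a database.
INVOICES = {
    1: {"owner": "alice", "total": 120},
    2: {"owner": "bob", "total": 75},
}


def get_invoice_unsafe(current_user: str, invoice_id: int) -> dict:
    # Works on the happy path, but never checks who is asking (IDOR).
    return INVOICES[invoice_id]


def get_invoice_safe(current_user: str, invoice_id: int) -> dict:
    invoice = INVOICES[invoice_id]
    if invoice["owner"] != current_user:
        raise PermissionError("not your invoice")
    return invoice


def functional_test(get_invoice) -> bool:
    """Feature check: a user can read their own invoice."""
    return get_invoice("alice", 1)["total"] == 120


def abuse_test(get_invoice) -> bool:
    """Security check: a user must not be able to read another user's invoice."""
    try:
        get_invoice("alice", 2)
    except PermissionError:
        return True
    return False


for impl in (get_invoice_unsafe, get_invoice_safe):
    print(impl.__name__, functional_test(impl), abuse_test(impl))
# get_invoice_unsafe True False
# get_invoice_safe True True
```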

How to actually test it

Treat every change as untrusted — review AI output the way you would a pull request from a stranger.
Test the running app, not just the code — dynamic testing catches flaws that only appear at runtime. A proof-of-exploit approach confirms which issues are genuinely exploitable instead of guessing.
Validate secrets and dependencies — check whether any leaked key still works, and whether pulled-in packages are real, maintained and free of known CVEs.
Make it continuous — new code lands daily, so the check has to run on every change, not once a quarter.
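As a starting point for the secrets step, the sketch below scans source text for strings that look like credentials, in a form that could run on every change. The two patterns are illustrative assumptions and far from exhaustive: one matches the well-known AKIA prefix of AWS access key IDs, the other a quoted value assigned to a name containing key, secret, token or password. It only finds candidates. Confirming whether a found key is still live depends on the provider and is not shown here.

```python
import re

# Illustrative patterns only; dedicated scanners ship many more.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "quoted_credential": re.compile(
        r"(?i)(?:api[_-]?key|secret|token|password)\w*\s*[:=]\s*['\"][^'\"]{8,}['\"]"
    ),
}


def scan(source: str) -> list[tuple[int, str]]:
    """Return (line number, pattern label) for every suspicious line."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings


# Made-up sample; neither value is a real credential.
sample = '''import os
API_KEY = "not-a-real-key-0123456789"
DB_PASSWORD = os.environ["DB_PASSWORD"]
aws_id = "AKIAABCDEFGHIJKLMNOP"
'''

print(scan(sample))  # [(2, 'quoted_credential'), (4, 'aws_access_key_id')]
```

Line 3 is not flagged because the password is read from the environment rather than written into the source. Failing the build whenever scan returns a non-empty list is one simple way to make the check continuous.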

