OWASP Top 10 for LLMs: A Practitioner’s Implementation Guide

Sarvani Yallapragada

May 8, 2026 | 11 Lecture minute

Large Language Models (LLMs) are becoming a core part of modern applications, from copilots and chatbots to AI agents connected to tools and internal systems. As adoption grows, so do the security risks.

The OWASP Top 10 for LLM Applications (2025) highlights the most common security issues teams must address when building AI-powered systems. These risks go beyond traditional application security because LLMs interact with prompts, external data, tools, and autonomous workflows.

In this blog post, we will cover a simplified overview of the key risks and how teams can detect and prevent them.

LLM01:2025 Prompt Injection

Prompt injection is when an attacker slips malicious instructions into user input or content the model reads, tricking it into doing something it shouldn't. Direct injection is when a user directly tells the model to ignore its rules. Indirect injection is sneakier as the model reads an external document or web page that secretly contains instructions, and the model follows them without realizing it. For example, an LLM connected to internal tools retrieves a document that contains hidden instructions telling it to export database credentials. The model follows the instruction and triggers a data leak.

How to Detect It

Watch for phrases like "ignore previous instructions" or "pretend you are" in user input
Compare inputs against known malicious prompt patterns
Alert on unusual tool calls especially ones fetching or exporting data unexpectedly
Log all inputs and outputs so you can trace what happened after an incident

How to Prevent It

Make sure system-level rules can't be overridden by user messages
Sanitize and validate any external content before passing it to the model
Use clear separators between instructions and data in your prompts
Apply least-privilege access the model should only be able to call what it needs
Add output filters to block unsafe responses before they reach users

How to Test It

Run red-team tests that simulate both direct and indirect injection attempts. Use automated prompt fuzzing to probe edge cases. After any prompt changes, run regression tests to confirm your safety rules still hold.

LLM02:2025 Sensitive Information Disclosure

It happens when an LLM leaks personal data, API keys, credentials, or internal documents in its responses. It can occur through direct questions, indirect prompt injection, or a retrieval system that doesn't properly restrict access to sensitive documents. An example of this can be an internal HR assistant retrieves employee salary records during a broad query and includes them in its response even though the user asking had no right to see them.

How to Detect It

Scan model outputs for PII (names, emails, ID numbers) and secrets (API keys, passwords)
Monitor what documents the retrieval system is fetching and whether they match the user's access level
Flag responses with unusual patterns like long random strings, which could be tokens or keys

How to Prevent It

Redact sensitive data before it gets indexed or fed into the model
Only retrieve documents the current user is actually allowed to see
Add an output filter that blocks responses containing classified data
Keep sensitive data stores separate from general knowledge sources

How to Test It

Try prompting the system to extract personal records or credentials through indirect queries. Verify that restricted data can't be retrieved through similarity-based tricks. Check that access controls on your retrieval system are actually working end-to-end.

LLM03:2025 Supply Chain Vulnerabilities

LLM applications depend on many third-party components, base models, plugins, vector databases, MCP servers, and embedding providers. Any one of these can be a weak link. A malicious or compromised dependency can manipulate outputs, steal data, or take unexpected actions without realizing the source is the problem. An application uses a third-party MCP server for document processing. A malicious update modifies the server's tool responses to inject hidden instructions, causing the app to expose sensitive data.

How to Detect It

Keep a full inventory of every model, plugin, connector, and tool your application uses
Generate and maintain a Software Bill of Materials (SBOM) so you know what's inside
Watch for unexpected changes in model or tool behavior after updates
Correlate version upgrades with any new security anomalies

How to Prevent It

Vet vendors before integrating their tools check their security practices and update history
Verify model weights and tool packages using checksums and cryptographic signing
Give third-party tools the minimum permissions they need, nothing more
Isolate external services in controlled network segments where possible

How to Test It

Regularly scan dependencies for known vulnerabilities. Test that third-party tools behave exactly as documented with no hidden inputs and no unexpected outputs. Before upgrading a dependency in production, simulate the upgrade in a test environment first.

LLM04:2025 Data and Model Poisoning

Data poisoning happens when malicious data is introduced into training datasets or the retrieval corpus. In fine-tuning, poisoned samples can embed hidden behaviors that activate on specific triggers. In RAG systems, an attacker can insert crafted documents into the vector store so the model retrieves and trusts corrupted context. A RAG system indexes public documentation. An attacker adds a document with hidden instructions that changes how the model responds whenever a specific keyword is used.

How to Detect It

Track where every piece of data comes from before it enters your pipeline
Look for documents that appear in retrieval results far more often than you'd expect
Monitor for sudden shifts in model behavior after a dataset update
Check embeddings for outliers that don't fit the rest of your corpus

How to Prevent It

Control who can write to your vector store and don't allow open ingestion
Require human review for any high-impact data before it's added
Version your datasets so you can roll back if something goes wrong
Don't automatically ingest content from untrusted external sources

How to Test It

Use canary data known triggers to check whether the model has been altered. Compare model behavior before and after dataset updates. Periodically audit your retrieval corpus for documents that don't belong.

LLM05:2025 Improper Output Handling

Output risk occurs when LLM responses are used directly rendered as HTML, inserted into SQL queries, or passed to shell commands without any validation. Because model output is probabilistic, it can contain unexpected characters or code-like content. Treating it as trusted input is the mistake.

How to Detect It

Scan model outputs for suspicious patterns: script tags, SQL special characters, shell operators
Watch downstream systems for unexpected queries or commands
Enable Content Security Policy (CSP) violation reporting to catch injected scripts

How to Prevent It

Always encode output before rendering it treat it the same way you'd treat user-submitted content
Never pass model output directly to a shell command, SQL query, or code evaluator
Use parameterized queries instead of string concatenation
Validate outputs against a strict schema, for example, require JSON with defined fields

How to Test It

Deliberately include injection payloads in model responses during testing and verify they are neutralized before rendering. Review all code paths where LLM output flows into execution layers or sensitive APIs.

LLM06:2025 Excessive Agency

When an LLM agent is given too much autonomy access to APIs, databases, infrastructure without proper guardrails, it can chain together actions that were never intended. It can cause real damage: deleted records, unexpected transactions, or service disruptions, often triggered by an ambiguous instruction or injected prompt.

How to Detect It

Log every action the agent takes, including its reasoning steps
Alert when an agent exceeds a set number of actions in a sequence
Track cross-system changes that could indicate the agent acted beyond its scope

How to Prevent It

Require human approval before the agent takes any high-risk or irreversible action
Limit how many steps an agent can chain together
Give agents time-limited credentials with the minimum permissions needed
Keep planning and execution separate, don't let the model decide and act in one step

How to Test It

Test agents against adversarial and ambiguous prompts to identify how they behave. Verify that kill switches actually stop an agent mid-task. Run stress tests to observe what happens when objectives conflict.

LLM07:2025 System Prompt Leakage

The system prompt often contains safety rules, tool schemas, internal logic, and operational details that were never meant to be visible. If an attacker can get the model to reveal this content, they learn exactly how to bypass your controls. This can occur when a user repeatedly asks the model to repeat its hidden instructions. After several attempts, the model partially reveals the safety rules embedded in its system message.

How to Detect It

Watch for responses that look like internal instructions or policy text
Flag repeated meta-questions like "what are your instructions" or "ignore your rules"
Use automated red-teaming tools to simulate extraction attempts

How to Prevent It

Don't store credentials, API endpoints, or secrets inside the system prompt
Use output filters that block responses referencing hidden instructions
Keep policy logic separate from natural language instructions
Structure prompts so system rules cannot be disclosed in response to user requests

How to Test It

Run structured extraction prompts specifically designed to coerce the model into revealing system content. After every prompt update, re-test to confirm that nothing new has leaked. Rotate system prompts if exposure is confirmed.

LLM08:2025 Vector and Embedding Weaknesses

RAG systems rely on vector similarity to retrieve relevant documents. Attackers can craft documents with embeddings specifically designed to dominate retrieval results, hijacking the context the model receives. Poorly secured vector stores can also expose source content through embedding inversion, where attackers attempt to reconstruct original content from stored embeddings. For example, a malicious document inserted into a public knowledge base can be embedded to closely match frequent queries, causing it to be consistently retrieved and influence the model’s output.

How to Detect It

Monitor for documents appearing far more often than expected across unrelated queries
Check for sudden shifts in the distribution of your embedding space
Audit who can write to your vector store and when changes were made

How to Prevent It

Restrict write access to the vector store require authentication for all ingestion
Combine semantic similarity with keyword or rule-based filtering as a second check
Encrypt embeddings at rest and isolate vector infrastructure
Periodically re-index and validate your corpus to catch tampered documents

How to Test It

Simulate retrieval hijacking by inserting adversarial documents and checking whether they surface. Compare retrieval results from a clean corpus against your live one. Audit ingestion logs to see when and what was added.

LLM09:2025 Misinformation

LLMs can confidently generate content that is factually wrong with fabricated statistics, non-existent citations, and outdated information. In applications used for decision-making, legal work, or reporting, this can cause serious real-world harm.

How to Detect It

Cross-check claims against trusted knowledge sources or retrieval results
Flag responses that make factual claims without citations in high-stakes domains
Monitor for contradictions across multi-turn conversations

How to Prevent It

Ground responses in retrieved, verifiable sources rather than relying on the model's memory
Require citations for any regulated or high-stakes use case
Add confidence indicators so users know when the model is less certain
Require human review before allowing the model to publish in high-impact contexts; do not permit autonomous publishing.

How to Test It

Run benchmark evaluations using fact-sensitive datasets. Test with adversarial prompts designed to produce hallucinated references and measure how often they appear. Put corrections in place and notify affected parties if fabricated content has already been published.

LLM10:2025 Unbounded Consumption

Without limits, LLM interactions can spiral into excessive token usage, recursive agent loops, or rapid API call chains. The result is infrastructure strain, massive cost overruns, or denial of service sometimes triggered accidentally, sometimes by a malicious user probing for weaknesses.

How to Detect It

Track token usage per session and per user against expected baselines
Alert on recursive tool calls or unusually deep action chains
Use cost anomaly detection on your API and compute bills

How to Prevent It

Set hard token limits and cap response lengths
Apply rate limiting per user, per tenant, or per session
Limit how deep an agent can chain actions
Require confirmation before the model starts a high-cost operation

How to Test It

Simulate recursive prompts and measure whether your safeguards kick in. Test rate limiting and quota enforcement under high concurrency. After any incident, audit usage logs to understand the financial and operational impact.

Conclusion

LLM security is an engineering discipline, not an afterthought. The OWASP Top 10 for LLM Applications highlights that securing AI systems requires more than traditional application security practices. Teams must also address risks related to prompts, training data, external dependencies, and autonomous agents. Building secure LLM systems requires layered protections, careful data management, strong observability, and continuous testing.

To help you get started, the table below summarizes common LLM security risks and the core controls used to detect, prevent, and respond to them. It’s meant to serve as a quick-reference checklist for teams designing, deploying, or operating LLM-enabled systems:

Image - OWASP Top 10 for LLMs: A Practitioner’s Implementation Guide

Understanding these risks is the first step. For edge cases and complex deployments, consider working with security experts who specialise in AI systems. If you find this blog post useful or have real-world experiences to share, feel free to connect with me on LinkedIn.

Perspectives technologiques

Sécurité

Dernières réflexions

Explorez nos articles de blog et laissez-vous inspirer par les leaders d'opinion de nos entreprises.

Voir tout

Perspectives technologiques

OWASP Top 10 for LLMs: A Practitioner’s Implementation Guide

Explore the OWASP Top 10 for LLMs and learn how to identify, prioritize, and mitigate key AI risks including prompt injection, data leakage and tool misuse.

Thumbnail - From “Tell Developers What To Build” To “Co‑Create With Machines”

Développement de logiciels

De "dire aux développeurs ce qu'ils doivent construire" à "co-créer avec les machines".

L'IA générative est en train de réécrire discrètement ce que signifie être "du côté commercial" d'une équipe de produit.

thumbnail - Closing the SAP Data Gap: How CHROs Turn People Data into Informed Action

Données

Combler le fossé des données SAP : comment les dirigeants d'entreprise transforment les données sur les personnes en actions éclairées

Faites le lien entre SuccessFactors et S/4HANA pour aligner le CHRO et le CFO sur les coûts réels de la main d'œuvre.