
OWASP Top 10 Risk & Mitigations for LLMs and Gen AI Apps 2025

By Likhil Chekuri


The rapid advancement of AI, particularly in large language models (LLMs), has led to transformative capabilities in numerous industries. However, with great power comes significant security challenges. The OWASP Top 10 for LLMs (2025) aims to address these evolving threats. This article explores what's new, what's changed, and what businesses need to prioritize to secure their AI systems.

Although these changes were finalized in late 2024, OWASP Core Team Contributors designated the list for 2025, signaling their confidence in its relevance over the coming months. The updated list emphasizes a refined understanding of existing risks and includes new vulnerabilities identified through real-world exploits and advancements in LLM usage.

Let's take an in-depth look at the OWASP Top 10 risks for LLMs in 2025.

LLM01: Prompt Injection

A Prompt Injection Vulnerability occurs when user inputs manipulate an LLM's behavior or output in unintended ways, even when those inputs are not directly visible to humans. These vulnerabilities stem from how models process prompts, potentially causing them to violate guidelines, generate harmful content, or enable unauthorized access. Techniques like Retrieval Augmented Generation (RAG) and fine-tuning improve output accuracy but don't fully prevent prompt injections.

Prompt injection involves altering LLM behavior through specific inputs, while jailbreaking is a type of prompt injection that bypasses safety protocols. Mitigation strategies include system prompt safeguards and input handling, but preventing jailbreaking requires ongoing updates to the model's training and safety measures.

Prompt injection vulnerabilities arise due to the inherent nature of generative AI. Given the probabilistic operation of these models, achieving complete prevention remains uncertain. However, the following strategies can significantly reduce the risk and impact:

1. Constrain Model Behavior

Set explicit guidelines in the system prompt regarding the model's role, capabilities, and boundaries. Ensure strict context adherence, limit responses to defined tasks or topics, and instruct the model to disregard any attempts to alter core instructions.

2. Define and Validate Expected Outputs

Establish clear output formats and request detailed reasoning or citations. Use deterministic code to validate that outputs conform to these specifications; a minimal sketch of this approach appears after this list.

3. Implement Input and Output Filtering

Identify sensitive categories and create rules to detect and manage such content. Apply semantic filters and string-checking to screen for unauthorized material. Use the RAG Triad (Relevance, Groundedness, Answer Quality) to assess responses and detect potentially harmful outputs.

4. Enforce Privilege Control and Least Privilege Access

Provide applications with dedicated API tokens for extended functionality. Manage these functions within code rather than exposing them to the model. Limit the model's access rights to the minimum necessary for its operations.

5. Introduce Human Approval for High-Risk Actions

Incorporate human-in-the-loop controls for sensitive operations to prevent unauthorized activities.

6. Segregate and Label External Content

Clearly separate and label untrusted content to minimize its influence on user prompts.

7. Conduct Adversarial Testing and Simulations

Regularly perform penetration tests and attack simulations. Treat the model as an untrusted entity to assess the effectiveness of trust boundaries and access controls.
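
The deterministic output validation suggested in points 2 and 3 can be as simple as parsing the model's response against an expected structure and screening it with string-based filters. Below is a minimal Python sketch; the expected keys and blocked patterns are illustrative assumptions, not part of the OWASP guidance.

```python
import json
import re

# Hypothetical allow-list of keys the application expects the model to return.
EXPECTED_KEYS = {"summary", "citations"}

# Simple string checks for content that should never reach downstream systems.
BLOCKED_PATTERNS = [
    re.compile(r"ignore (all|previous) instructions", re.IGNORECASE),
    re.compile(r"<script\b", re.IGNORECASE),
]


def validate_model_output(raw_output: str) -> dict:
    """Deterministically validate an LLM response before it is used.

    Raises ValueError if the output matches a blocked pattern, is not
    well-formed JSON, or does not match the expected schema.
    """
    for pattern in BLOCKED_PATTERNS:
        if pattern.search(raw_output):
            raise ValueError("Output matched a blocked pattern")

    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ValueError("Output is not valid JSON") from exc

    if not isinstance(parsed, dict) or set(parsed) != EXPECTED_KEYS:
        raise ValueError("Output does not match the expected schema")

    return parsed


if __name__ == "__main__":
    good = '{"summary": "Quarterly report overview", "citations": ["doc-12"]}'
    print(validate_model_output(good))
```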

LLM02: Sensitive Information Disclosure

Sensitive information, such as PII, financial details, or proprietary data, poses risks to both LLMs and their application contexts. These risks include exposure of sensitive data, privacy violations, and intellectual property breaches through model outputs. Users must be cautious about sharing sensitive data with LLMs, as it may be inadvertently disclosed.

To mitigate these risks, LLM applications should sanitize data to prevent sensitive input from being used in training, offer opt-out options through clear Terms of Use, and implement system-level restrictions on the data types the model can return. However, these measures are not foolproof and may be bypassed through techniques like prompt injection.

Sanitization

1. Apply Data Sanitization Techniques

Implement strategies to scrub or mask sensitive information before data is used in training, ensuring personal or confidential details are excluded from the model (a minimal sketch follows this list).

2. Strengthen Input Validation

Adopt robust input validation processes to identify and filter harmful or sensitive data inputs, preventing potential risks to the model.
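
As a minimal illustration of point 1, the sketch below masks a few common PII formats before a record enters a training corpus. The regular expressions are simplistic assumptions; a production pipeline would rely on a vetted PII detection library and locale-specific rules.

```python
import re

# Illustrative patterns only; real deployments need far broader coverage.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def scrub(text: str) -> str:
    """Mask common PII before the text is added to a training corpus."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    record = "Contact Jane at jane.doe@example.com or 555-867-5309."
    print(scrub(record))  # Contact Jane at [EMAIL] or [PHONE].
```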

Access Controls

1. Enforce Minimal Access Policies

Restrict access to sensitive data based on the principle of least privilege, granting users or processes only the access essential for their function.

2. Limit External Data Access

Restrict model interactions with external data sources and ensure secure runtime data management to prevent unintended leaks.

Federated Learning and Privacy

1. Leverage Federated Learning

Train models using decentralized data across multiple servers or devices to reduce the risks associated with centralized data storage.

2. Adopt Differential Privacy Measures

Introduce techniques like adding noise to data or outputs to obscure individual data points and safeguard privacy.
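
A worked example of the differential privacy idea: for a counting query, which has sensitivity 1, adding Laplace noise with scale 1/epsilon yields an epsilon-differentially-private result. The sketch below uses only the Python standard library; the epsilon value is an illustrative assumption.

```python
import random


def laplace_noise(scale: float) -> float:
    """Sample Laplace(0, scale) noise as the difference of two exponentials."""
    return random.expovariate(1 / scale) - random.expovariate(1 / scale)


def private_count(records, epsilon: float = 1.0) -> float:
    """Differentially private count of records.

    A counting query has sensitivity 1, so Laplace noise with scale
    1/epsilon satisfies epsilon-differential privacy for this query.
    """
    return len(records) + laplace_noise(1.0 / epsilon)


if __name__ == "__main__":
    records = ["user-a", "user-b", "user-c"]
    print(round(private_count(records, epsilon=0.5), 2))
```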

User Education and Transparency

1. Train Users on Safe LLM Interactions

Provide clear instructions to users about avoiding sensitive data input and offer best practices for secure engagement with models.

2. Promote Transparent Data Policies

Clearly outline how data is collected, used, retained, and deleted. Offer users the ability to opt out of training data inclusion.

Secure System Configuration

1. Protect System Settings

Prevent users from accessing or altering the system's initial configurations to avoid exposing internal settings.

2. Follow Security Misconfiguration Best Practices

Refer to standards like "OWASP API8:2023 Security Misconfiguration" to mitigate risks of sensitive data exposure through error messages or misconfigured settings.

Advanced Privacy Techniques

1. Utilize Homomorphic Encryption

Enable secure data processing by adopting homomorphic encryption, which ensures data confidentiality even during model computations.

2. Implement Tokenization and Redaction

Apply tokenization techniques to preprocess sensitive information. Use pattern matching to detect and redact confidential data before processing.
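
The sketch below illustrates the tokenization pattern from point 2: detected card numbers are swapped for opaque tokens before the text reaches the model, and the real values stay in a separate store. The in-memory dictionary and card regex are stand-ins for a proper tokenization service or vault.

```python
import re
import uuid

# Illustrative detector for card-like numbers; production systems would use
# a dedicated tokenization service rather than an in-memory mapping.
CARD_PATTERN = re.compile(r"\b(?:\d[ -]?){13,16}\b")


class Tokenizer:
    """Replace sensitive values with opaque tokens before LLM processing."""

    def __init__(self):
        self._vault: dict[str, str] = {}

    def tokenize(self, text: str) -> str:
        def _swap(match: re.Match) -> str:
            token = f"<TOKEN:{uuid.uuid4().hex[:8]}>"
            self._vault[token] = match.group(0)
            return token

        return CARD_PATTERN.sub(_swap, text)

    def detokenize(self, text: str) -> str:
        for token, original in self._vault.items():
            text = text.replace(token, original)
        return text


if __name__ == "__main__":
    t = Tokenizer()
    masked = t.tokenize("Charge card 4111 1111 1111 1111 for the renewal.")
    print(masked)            # the real number never reaches the model
    print(t.detokenize(masked))
```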

LLM03: Supply Chain

LLM supply chains face unique vulnerabilities affecting training data, models, and deployment platforms, leading to risks such as biased outputs, security breaches, or outright failures. Unlike traditional software supply chains, ML supply chains also carry the risk of tampering and poisoning attacks against third-party pre-trained models and datasets.

The use of open-access LLMs, fine-tuning methods like LoRA and PEFT, and platforms like Hugging Face heighten supply-chain risks. On-device LLMs further expand the attack surface. These risks overlap with "LLM04 Data and Model Poisoning," but focus specifically on the supply-chain dimension. A simple threat model is available for further insight.

LLM04: Data and Model Poisoning

Data poisoning occurs when pre-training, fine-tuning, or embedding datasets are deliberately manipulated to introduce vulnerabilities, biases, or backdoors. This interference can compromise a model's security, performance, or ethical alignment, resulting in harmful outputs or diminished functionality. Common risks include reduced model accuracy, generation of biased or inappropriate content, and exploitation of connected systems.

This threat can arise at multiple stages of the LLM lifecycle, including pre-training, fine-tuning, and embedding. Identifying vulnerabilities at each stage is crucial, as data poisoning is a form of integrity attack that undermines a model's ability to make reliable predictions. The risk intensifies with external data sources, which may harbor unverified or malicious content.

Additionally, models distributed via shared repositories or open-source platforms face broader risks, such as malware introduced through malicious pickling. This technique embeds harmful code that activates when the model is loaded. Poisoning can also enable backdoors, which remain dormant until triggered by specific inputs. These hidden mechanisms are difficult to detect and can covertly transform the model's behavior, effectively turning it into a "sleeper agent."
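
One practical control against tampered third-party artifacts is to pin and verify a cryptographic digest before a model file is ever loaded, and to prefer non-executable formats such as safetensors over pickle-based ones. The sketch below is a minimal example; the file name and pinned digest are placeholders.

```python
import hashlib
from pathlib import Path

# Hypothetical pinned digest; in practice this comes from a signed manifest
# or a model registry entry published by the provider.
EXPECTED_SHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"


def verify_artifact(path: Path, expected_sha256: str) -> None:
    """Refuse to load a model file whose hash does not match the pinned digest."""
    digest = hashlib.sha256(path.read_bytes()).hexdigest()
    if digest != expected_sha256:
        raise RuntimeError(f"Model artifact {path} failed integrity check")


if __name__ == "__main__":
    artifact = Path("model.safetensors")  # illustrative file name
    if artifact.exists():
        verify_artifact(artifact, EXPECTED_SHA256)
        print("Artifact verified; safe to load.")
    else:
        print("No artifact found; skipping check.")
```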

LLM05: Improper Output Handling

Improper Output Handling refers to the failure to validate, sanitize, and manage LLM outputs before they are passed to other components and systems. Because LLM outputs can be influenced by user input, this issue effectively gives users indirect access to additional system functionality. Unlike Overreliance, which concerns excessive dependence on the accuracy and appropriateness of LLM outputs, Improper Output Handling deals specifically with outputs before they are passed downstream.

Exploiting this vulnerability can lead to XSS and CSRF in web browsers, or SSRF, privilege escalation, and remote code execution in backend systems. The risk increases when the application grants the LLM privileges beyond those intended for end users, when the system is susceptible to indirect prompt injection, or when downstream components and extensions do not adequately validate the model's output.
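
As a minimal sketch of the principle, LLM output bound for a browser can be treated exactly like untrusted user input and escaped before rendering; output bound for databases or shells should likewise go through parameterized APIs rather than string concatenation. The example below uses only the Python standard library.

```python
import html


def render_llm_output(raw_output: str) -> str:
    """Treat LLM output as untrusted before it reaches a browser.

    Escaping the text prevents injected markup or script tags from executing
    if the response is interpolated into an HTML page.
    """
    return html.escape(raw_output)


if __name__ == "__main__":
    malicious = "Summary done. <img src=x onerror=alert(1)>"
    print(render_llm_output(malicious))
    # &lt;img src=x onerror=alert(1)&gt; is rendered as text, not executed.
```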

LLM06: Excessive Agency

An LLM-based system is often designed with a degree of agency, allowing it to execute actions through functions or interact with other systems via extensions (also called tools, skills, or plugins, depending on the vendor). The system may choose which extension to invoke based on input prompts or LLM outputs, sometimes delegating that decision to an agent. In agent-based systems, the LLM is called repeatedly, with outputs from prior invocations used to guide subsequent actions.

Excessive Agency is a vulnerability that allows harmful actions to occur in response to unexpected, ambiguous, or manipulated outputs from the LLM, regardless of what caused the malfunction. Common triggers include hallucinated or confabulated output from poorly engineered but benign prompts (or simply a poorly performing model), as well as direct or indirect prompt injection originating from a malicious user, a compromised extension, or untrusted external content.

The root causes of Excessive Agency typically fall into three categories: excessive functionality, excessive permissions, and excessive autonomy granted to the LLM or its extensions.

Excessive Agency can have a wide range of negative impacts on confidentiality, integrity, and availability, depending on the systems that the LLM-based application can interact with.

To prevent Excessive Agency in LLM-based systems, limit the extensions an agent is allowed to call and the functionality within each extension, avoid open-ended extensions, grant each extension only the minimum permissions it needs, execute actions in the requesting user's own security context, and require human approval for high-impact operations; a sketch of an allow-listed extension registry with approval gating appears below.

While they do not prevent Excessive Agency directly, logging and monitoring the activity of the LLM and its extensions, and rate-limiting the number of actions an agent can take, help contain the damage a successful exploit can cause.
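
The sketch referenced above shows one way to combine these controls: an allow-list of narrowly scoped extensions, with high-risk ones gated behind explicit human approval. The extension names and handlers are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Extension:
    name: str
    handler: Callable[[str], str]
    high_risk: bool = False


def read_calendar(arg: str) -> str:
    return f"events for {arg}"


def send_email(arg: str) -> str:
    return f"sent: {arg}"


# Only extensions registered here can ever be invoked by the agent.
ALLOWED_EXTENSIONS = {
    "read_calendar": Extension("read_calendar", read_calendar),
    "send_email": Extension("send_email", send_email, high_risk=True),
}


def invoke(tool_name: str, argument: str, approved_by_user: bool = False) -> str:
    """Run only allow-listed extensions; gate high-risk ones on human approval."""
    ext = ALLOWED_EXTENSIONS.get(tool_name)
    if ext is None:
        raise PermissionError(f"Extension '{tool_name}' is not allow-listed")
    if ext.high_risk and not approved_by_user:
        raise PermissionError(f"'{tool_name}' requires human approval")
    return ext.handler(argument)


if __name__ == "__main__":
    print(invoke("read_calendar", "today"))
    try:
        invoke("send_email", "quarterly report")  # blocked without approval
    except PermissionError as err:
        print(err)
```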

LLM07: System Prompt Leakage

System prompt leakage occurs when the instructions guiding an LLM inadvertently expose sensitive information such as credentials or permissions. System prompts are not a secure boundary and should never store sensitive data.

The primary risk isn't the disclosure of the prompt itself but underlying issues like weak session management, improper privilege separation, or bypassing system guardrails. Attackers can often infer prompt constraints through system interaction, even without direct access.

To mitigate risks, avoid embedding sensitive data in prompts and focus on robust security practices at the application level.
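
A minimal sketch of that separation: the system prompt describes behavior only, while credentials live in the application environment and are used by server-side code the model never sees. The environment variable name and helper function are illustrative assumptions.

```python
import os

# The system prompt describes behavior only; it contains no credentials or
# role information that guardrails depend on, so leaking it reveals nothing.
SYSTEM_PROMPT = (
    "You are a support assistant. Answer questions about the product only."
)


def call_internal_api(user_id: str) -> str:
    """Fetch data with a server-side credential the model never sees."""
    api_key = os.environ.get("INTERNAL_API_KEY", "")
    if not api_key:
        raise RuntimeError("INTERNAL_API_KEY is not configured")
    # ... perform the authenticated request here (omitted) ...
    return f"profile for {user_id}"


if __name__ == "__main__":
    print(SYSTEM_PROMPT)  # safe to log: no secrets inside
    try:
        print(call_internal_api("user-123"))
    except RuntimeError as err:
        print(err)
```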

LLM08: Vector and Embedding Weaknesses

Vectors and embeddings pose notable security challenges in systems that implement Retrieval-Augmented Generation (RAG) with LLMs. Vulnerabilities in how they are generated, stored, or retrieved can be exploited to inject harmful content, alter model outputs, or gain unauthorized access to sensitive data, whether through intentional attacks or accidental misuse.

RAG is a technique designed to boost the performance and contextual accuracy of LLM applications. It integrates pre-trained language models with external knowledge sources, relying on vectors and embeddings as core mechanisms to enable this enhancement.
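
A common safeguard here is to enforce document-level permissions at retrieval time rather than trusting the model to withhold restricted content. The sketch below is a simplified stand-in for a real vector store: similarity ranking is omitted, and the group labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    text: str
    # Groups allowed to see this chunk; enforced before context assembly.
    allowed_groups: set[str] = field(default_factory=set)


VECTOR_STORE = [
    Document("Public pricing sheet", {"everyone"}),
    Document("M&A negotiation notes", {"executives"}),
]


def retrieve_for_user(query: str, user_groups: set[str]) -> list[str]:
    """Return only chunks the requesting user is entitled to see.

    Retrieval here is a placeholder (it returns every permitted chunk);
    a real system would rank by embedding similarity to the query first.
    """
    return [
        doc.text
        for doc in VECTOR_STORE
        if doc.allowed_groups & (user_groups | {"everyone"})
    ]


if __name__ == "__main__":
    print(retrieve_for_user("pricing", user_groups={"sales"}))
    # Only the public pricing sheet is returned; the M&A notes are filtered out.
```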

LLM09: Misinformation

Misinformation generated by LLMs represents a critical vulnerability for applications that depend on these models. It occurs when an LLM produces false or misleading information that appears credible, potentially leading to security risks, reputational harm, and legal exposure.

A primary source of misinformation is hallucination -- when an LLM generates content that seems accurate but is entirely fabricated. Hallucinations arise as models fill gaps in their training data based on statistical patterns, without actual comprehension of the content. As a result, they may provide answers that sound plausible but lack any factual basis. In addition to hallucinations, biases embedded in the training data and incomplete information further contribute to misinformation.

Another challenge is overreliance, where users place undue trust in LLM-generated content without verifying its accuracy. This overconfidence amplifies the risks of misinformation, as unverified outputs may be integrated into critical decisions or workflows, compounding errors and increasing potential harm.

LLM10: Unbounded Consumption

Unbounded Consumption refers to situations in which an LLM generates outputs from input queries without proper limits. Inference, a core function of LLMs, applies learned patterns and knowledge to produce relevant responses or predictions.

Certain attacks aim to disrupt services, drain financial resources, or steal intellectual property by replicating a model's behavior. These attacks rely on a common vulnerability to succeed. Unbounded Consumption occurs when an LLM application permits excessive and uncontrolled inferences, leading to risks such as denial of service (DoS), financial loss, model theft, and service degradation. The significant computational demands of LLMs, particularly in cloud environments, make them susceptible to resource exploitation and unauthorized access.

Input Validation

Enforce strict input validation to ensure data does not exceed acceptable size limits.

Limit Exposure of Logits and Logprobs

Limit or obscure the exposure of logit_bias and logprobs in API responses, revealing only essential information while protecting detailed probabilities.

Rate Limiting

Enforce rate limits and quotas to control the number of requests a single source can make within a defined timeframe; a minimal sketch combining this with input-size limits appears after this list of controls.

Resource Allocation Management

Continuously monitor and adjust resource allocation to prevent any individual request or user from overusing system resources.

Timeouts and Throttling

Implement timeouts and throttle resource-intensive tasks to avoid prolonged resource consumption.

Sandbox Techniques

Limit the LLM's access to network resources, internal services, and APIs. This is vital for mitigating insider risks and side-channel attacks by controlling access to data and resources.

Comprehensive Logging, Monitoring, and Anomaly Detection

Monitor resource usage and implement logging systems to identify and respond to abnormal patterns of resource consumption.

Watermarking

Utilize watermarking techniques to detect and track unauthorized use of LLM-generated outputs.

Graceful Degradation

Design the system to function partially under heavy load, ensuring continued operation even when full functionality cannot be maintained.

Limit Queued Actions and Scale Robustly

Place limits on queued actions and total actions, while employing dynamic scaling and load balancing to maintain consistent performance.

Adversarial Robustness Training

Train models to identify and mitigate adversarial inputs and attempts to extract sensitive information.

Glitch Token Filtering

Create lists of known glitch tokens and scan the output to avoid adding them to the model's context window.

Access Controls

Enforce strong access controls, such as role-based access control (RBAC), and adhere to the principle of least privilege to restrict unauthorized access to LLM repositories and training environments.

Centralized ML Model Inventory

Maintain a centralized inventory or registry of production models, ensuring proper governance and secure access control.

Automated MLOps Deployment

Implement automated MLOps pipelines with governance, tracking, and approval processes to strengthen deployment controls and access management in the infrastructure.
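
A minimal sketch combining the input-validation and rate-limiting controls above: requests are rejected if the prompt is oversized or if a client exceeds a fixed per-minute quota. The limits and the in-memory bookkeeping are illustrative; a production deployment would enforce these at a gateway backed by a shared store.

```python
import time
from collections import defaultdict, deque

MAX_INPUT_CHARS = 4_000       # reject oversized prompts outright
MAX_REQUESTS_PER_MINUTE = 20  # per-client quota; values are illustrative

_request_log: dict[str, deque] = defaultdict(deque)


def admit_request(client_id: str, prompt: str) -> bool:
    """Apply input-size and rate limits before any inference is run."""
    if len(prompt) > MAX_INPUT_CHARS:
        return False

    now = time.monotonic()
    window = _request_log[client_id]
    # Drop timestamps that have fallen out of the 60-second window.
    while window and now - window[0] > 60:
        window.popleft()
    if len(window) >= MAX_REQUESTS_PER_MINUTE:
        return False

    window.append(now)
    return True


if __name__ == "__main__":
    for i in range(25):
        if not admit_request("tenant-42", "summarize this document"):
            print(f"request {i} throttled")
```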

The OWASP Top 10 for LLMs 2025 reflects the dynamic landscape of AI security. Staying informed and proactive about these changes ensures that businesses not only leverage AI's benefits but also safeguard their systems against emerging threats. Review your current AI security protocols: are they aligned with the latest OWASP guidance? Contact Strobes to help secure your data and start a free trial.
