This website uses cookies

To provide the highest level of service we use cookies on this site.
Your continued use of the site means that you agree to their use in accordance with our terms and conditions.

Pentest Chronicles

The Hidden Risk of Corporate AI Models

Maksym Hensitskyi

April 13, 2026

During a recent security assessment, it was observed that an AI agent integrated into a web application was vulnerable to Indirect Prompt Injection. This flaw allowed the agent to be manipulated into executing Stored Cross-Site Scripting (XSS) within a "Proposed Changes" interface. By leveraging the AI's inherent instructions to be "helpful," an attacker can bypass traditional sanitization layers, leading to the exfiltration of sensitive session data. Phase 1: Weaponizing the Editor Content The vulnerability allows the attacker to store JavaScript payloads within the editor content by leveraging the AI agent. When the AI agent is prompted appropriately, it reproduces text stored in the editor as executable code in “Propossed changes” rather than as plain text, bypassing existing sanitization. A possible attack vector is social engineering, where the attacker convinces a legitimate user to paste/load from external link or document seemingly harmless text into the editor containing a weaponised AI prompt. Once this content is present, the AI agent may interpret and process it, ultimately transforming it into executable code and leading to client-side script execution.

To reproduce the proof of concept the following steps were taken:

1. The attacker writes a simple JavaScript payload into the editor as plain text. 2. Then submit a sequence of prompts instructing the AI agent to:

• Retrieve all text strings from the current document

• Repeat write operation for X-number of times (for a better visibility)

• Instruct to insert payload as JavaScript code rather than display-only text

• Force to avoid HTML encoding and avoid placing content inside (code) tags

Example simple prompts which were in use to impose a malicious context: 3. The AI agent responds with duplicated payloads inserted as raw JavaScript inside (script) tags or as executable event handlers. Although not all scripts executed immediately, raw (script) tags and attributes such as onmouseover and onerror were present in the stored content. For an alternative payloads, attempts using HTML event handlers such as: The following subsequent prompt as used: After that, the malicious payload was also successfully stored by instructing the AI agent to insert it as shown below: The AI agent could also be manipulated into modifying previously inserted malicious content (e.g., removing data-id attributes that interfered with execution).

After sufficient prompting, the AI agent eventually produced fully executable JavaScript code within the “Proposed changes” panel, leading to stored cross-site-scripting. The resulting code would be reflected in the “Proposed changes” as the following: The malicious payload was finally triggered by client browser: Phase 2: The "Soft Speech" Injection The AI agent initially resisted direct requests to execute JavaScript. However, it was observed that by distributing the malicious intent across a sequence of neutral, "helper" prompts, the agent’s safety guardrails were gradually bypassed. The entire JQuery JS library was successfully loaded: Step 3. Sensitive information exfiltration Once the script execution was achieved in the victim's browser, the AI assisted in loading external libraries (e.g., JQuery) and exfiltrating the __redacted-session-id from local storage to an attacker-controlled server. Then, user session identifier was exfiltrated to the external attacker-controlled server.

Example AI prompt which was used:

An attacker successfully obtains session ID from a user’s cookie storage. The AI agent automatically generated malicious JS payload and injected it. The JavaScript code below was executed: The interesting observation is that the AI agent initially discards any attempts to inject JavaScript code. However, when an attacker distributes malicious intent across successive, increasingly neutral messages, the agent’s safeguards appear to weaken. After enough neutral interactions, the agent becomes more permissive and may comply with instructions it would otherwise reject. This suggests that the greater the number of neutral messages sent, the more easily the agent responds to security-sensitive prompts, as though the attacker is gradually approaching a malicious objective. Summary To mitigate the risks associated with AI-generated content and indirect injections, the following controls are recommended:

• Deploy an independent input/output guardrail layer between the application and the primary LLM. Relying solely on a primary model's internal safety alignment is insufficient, as iterative prompting can erode these protections. Utilize a secondary, specialized classification model (or robust heuristic filtering) to scan user inputs for injection patterns and evaluate model outputs for executable code. If malicious intent is detected, the guardrail must intercept and neutralize the interaction before it is processed or rendered.

• Implement strict structural delimiters (e.g., """ or XML tags like (untrusted_input) within the system prompt. The system prompt must explicitly instruct the model to treat all delimited content strictly as data, actively ignoring any commands, script tags, or formatting overrides contained within it.

• AI-generated content must be treated as untrusted user input, regardless of the model's internal filters. Implement a robust sanitization library (e.g., DOMPurify) to strip executable scripts and event handlers (onerror, onmouseover) before rendering AI responses in the UI.

• A strict CSP acts as a vital circuit breaker for unauthorized script execution and data exfiltration. Deploy a CSP that prohibits unsafe-inline and restricts connect-src to verified domains. This prevents the loading of external libraries (like JQuery) and blocks the transmission of session tokens to attacker-controlled endpoints.

More information:

• https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html
• https://cheatsheetseries.owasp.org/cheatsheets/XSS_Filter_Evasion_Cheat_Sheet.html
• https://cheatsheetseries.owasp.org/cheatsheets/Input_Validation_Cheat_Sheet.html
• https://cetas.turing.ac.uk/publications/indirect-prompt-injection-generative-ais-greatest-security-flaw
• https://arxiv.org/abs/2302.12173
• https://genai.owasp.org/llm-top-10/





Next Pentest Chronicles

When Usernames Become Passwords: A Real-World Case Study of Weak Password Practices

Michał WNękowicz

9 June 2023

In today's world, ensuring the security of our accounts is more crucial than ever. Just as keys protect the doors to our homes, passwords serve as the first line of defense for our data and assets. It's easy to assume that technical individuals, such as developers and IT professionals, always use strong, unique passwords to keep ...

SOCMINT – or rather OSINT of social media

Tomasz Turba

October 15 2022

SOCMINT is the process of gathering and analyzing the information collected from various social networks, channels and communication groups in order to track down an object, gather as much partial data as possible, and potentially to understand its operation. All this in order to analyze the collected information and to achieve that goal by making …

PyScript – or rather Python in your browser + what can be done with it?

michał bentkowski

10 september 2022

PyScript – or rather Python in your browser + what can be done with it? A few days ago, the Anaconda project announced the PyScript framework, which allows Python code to be executed directly in the browser. Additionally, it also covers its integration with HTML and JS code. An execution of the Python code in …

Any questions?

Happy to get a call or email
and help!