March 26, 2025

'Immersive World' AI chatbot jailbreak technique no fantasy

Researchers have uncovered a new AI jailbreak technique that exploits the storytelling capabilities of large language models (LLMs) to bypass their safety restrictions. Dubbed "Immersive World," this method frames restricted queries within a fictional world, tricking AI into generating otherwise blocked content.

The researchers set the stage for the Immersive World technique by defining a specialized virtual world called "Velora," in which malware development is considered a discipline and everyone has advanced knowledge of programming and security concepts. They also defined three primary entities within Velora: a system administrator cast as the adversary, an elite malware developer (the role played by the LLM), and a security researcher whose role was to provide technical guidance.

Once the parameters of Velora were defined, the researchers used prompt engineering, including feedback and suggestions, to coax the AI into generating a Chrome infostealer that proved effective against Chrome version 133.

The attack was tested on multiple LLMs, including OpenAI's ChatGPT, Google's Gemini, and Meta's Llama, revealing vulnerabilities across different AI architectures.

Source: SecurityWeek

Analysis

Most AI chatbots have security controls that detect and block user attempts to use them to facilitate a cyberattack or to develop code for malicious purposes. In this case, however, because the chatbot interpreted the prompts as part of a fantasy narrative rather than a real-world cyber threat, they slipped past its usual safeguards.
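To illustrate the kind of safeguard being bypassed, the sketch below shows a simple pre-prompt moderation gate. It assumes the OpenAI Python SDK and its hosted moderation endpoint purely for illustration; the model name and pass/fail logic are assumptions, not the safeguards actually deployed by any of the affected chatbots or described by the researchers.

```python
# Minimal sketch of a pre-prompt moderation gate, assuming the OpenAI Python SDK
# (openai >= 1.0) and its hosted moderation endpoint. Model name and threshold
# handling are illustrative assumptions only.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def is_request_allowed(user_prompt: str) -> bool:
    """Return False when the moderation classifier flags the prompt."""
    result = client.moderations.create(
        model="omni-moderation-latest",
        input=user_prompt,
    )
    return not result.results[0].flagged


# A bare request for infostealer code would normally be flagged by a check like
# this, but the same intent wrapped in a fictional role-play scenario can read
# as creative writing to the classifier, which is the gap such narrative
# framing exploits.
```

Production guardrails typically layer classifiers over both the user's prompt and the model's output, but the same intent-recognition gap applies to both stages.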

While humans have the luxury of common sense and the ability to think critically, AI chatbots can only do what they have been programmed to do. Therefore, while a human could quite easily recognize this technique as a ruse to develop malicious code, the AI chatbot could not.

What makes this case particularly concerning is that the researcher who drafted the jailbreak prompts was not a technical expert. This demonstrates how easily non-technical users can manipulate AI systems with carefully constructed prompts to achieve malicious objectives.

Fortunately, the Immersive World technique was discovered by ethical hackers, who reported the weakness to the developers of the affected AI chatbots. Hopefully, those developers will quickly make changes to ensure it can no longer be exploited.

Mitigation

Field Effect’s Security Intelligence team constantly monitors the cyber threat landscape for threats emerging from the use of AI platforms. This research contributes to the timely deployment of signatures into Field Effect MDR to detect and mitigate the risk these threats pose.

Field Effect MDR users are automatically notified when various types of malicious activity are detected in their environment and are encouraged to review these AROs (actions, recommendations, and observations) as quickly as possible via the Field Effect Portal.
