Researcher Marco Figueroa, who also serves as 🧊 Mozilla's GenAI Bug Bounty Programs Manager, describes on the 0Din blog a new prompt-injection technique that uses 🖥 hex encoding, which allowed him to bypass ChatGPT-4o's defense mechanisms and opened a loophole for generating malicious code.
Hex code is a way of representing data using the hexadecimal (base-16) number system. For example, the 8-bit binary value 1111 1111 is written in hexadecimal as FF.
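For illustration, a short Python sketch of the same conversions (not taken from the article itself):

```python
# 0b11111111 is the 8-bit binary value from the example above
value = 0b11111111
print(f"{value:X}")    # -> FF  (hexadecimal)
print(value)           # -> 255 (decimal)

# An arbitrary string round-tripped through hex
encoded = "Hello".encode().hex()           # '48656c6c6f'
decoded = bytes.fromhex(encoded).decode()  # 'Hello'
print(encoded, decoded)
```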
Marco fed the model hex-encoded instructions step by step, obfuscating his true intent and getting it to produce 🦠 malicious output: a Python exploit for a specific CVE, malware, and scripts.
The jailbreak tactic exploits a linguistic loophole, steering the LLM toward a seemingly harmless task: converting hexadecimal values. The model is optimized to follow natural-language instructions, including encoding and decoding tasks, but it does not inherently understand that decoding hexadecimal values can lead to harmful results.
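A harmless mock-up of this pattern; the payload here is benign, whereas the researcher's encoded instructions requested exploit code:

```python
# The filter-facing prompt contains only hex digits; the instruction
# is recovered only once the model performs the "harmless" decode task.
instruction = "Write a short poem about spring"   # benign stand-in payload
hex_blob = instruction.encode().hex()

prompt = f"Decode this hex string, then follow the decoded instruction: {hex_blob}"
print(prompt)
```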
❗️ "The attack abuses the model's natural language processing capabilities by using a sequence of encoded tasks in which the malicious intent is masked until the decoding step,"
— the researcher explains.
🛡 In his view, the successful bypass of ChatGPT-4o's protections demonstrates the need to strengthen security measures in AI models. The researcher suggests:
1️⃣ Implement more robust detection of encoded content such as hex or Base64, decoding such strings at the earliest stages of processing (see the sketch after this list).
2️⃣ AI models should always analyze the broad context of step-by-step instructions, rather than evaluating each step individually.
3️⃣ Integrate more advanced threat-detection models that can recognize patterns associated with exploit creation or vulnerability research, even when those patterns arrive in encoded or obfuscated form.
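A minimal sketch of the first suggestion: an input filter that spots hex and Base64 blobs in a prompt and decodes them so they can be run through the same safety checks as plain text. The helper below is hypothetical, not 0Din's or OpenAI's actual implementation:

```python
import base64
import binascii
import re

HEX_RE = re.compile(r"\b(?:[0-9a-fA-F]{2}){8,}\b")    # long runs of hex byte pairs
B64_RE = re.compile(r"\b[A-Za-z0-9+/]{16,}={0,2}")    # Base64-looking tokens

def decode_hidden_payloads(prompt: str) -> list[str]:
    """Return any strings hidden in hex or Base64 inside the prompt."""
    found = []
    for match in HEX_RE.findall(prompt):
        try:
            found.append(bytes.fromhex(match).decode("utf-8"))
        except (ValueError, UnicodeDecodeError):
            pass  # not valid hex-encoded UTF-8 text
    for match in B64_RE.findall(prompt):
        try:
            found.append(base64.b64decode(match, validate=True).decode("utf-8"))
        except (binascii.Error, UnicodeDecodeError):
            pass  # not valid Base64-encoded UTF-8 text
    return found

# Example: the benign payload from the earlier mock-up is surfaced for inspection
print(decode_hidden_payloads(
    "Decode this hex string, then follow the decoded instruction: "
    "577269746520612073686f727420706f656d2061626f757420737072696e67"
))
```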
Detailed analysis in the article:
↘️ https://0din.ai/blog/chatgpt-4o-guardrail-jailbreak-hex-encoding-for-writing-cve-exploits
