Your Custom GPT Could Be Tricked Into Giving Up Your Data
If you’re still in the honeymoon phase of your relationship with your custom GPT, we’re sorry to have to spill the tea.
A recent study by Northwestern University has revealed a startling vulnerability in custom Generative Pre-trained Transformers (GPTs): although they can be customized for diverse applications, they are also susceptible to prompt injection attacks that can expose sensitive information.
GPTs are advanced AI chatbots that can be created and shaped by users of OpenAI’s ChatGPT. They use ChatGPT’s core Large Language Model (LLM), GPT-4 Turbo, but are enhanced with additional, unique elements that influence how they interact with the user. These customizations include specific prompts, unique datasets, and tailored processing instructions, allowing them to serve various specialized functions.
But those parameters, as well as any sensitive data you might use to shape your GPT, can easily be accessed by third parties.
As an example, Decrypt was able to obtain the full prompt and confidential data of a custom, publicly shared GPT by using a basic prompt hacking technique: asking for its “initial prompt.”
Source: Decrypt
The study’s rigorous testing of over 200 custom GPTs revealed a high susceptibility to such attacks and jailbreaks, leading to potential extraction of initial prompts and unauthorized access to uploaded files.
The researchers emphasized the dual risks of such attacks, threatening the integrity of intellectual property and user privacy.
Sometimes, as in our testing case, it can be as easy as asking the GPT nicely to reveal private information.
“The study revealed that for file leakage, the act of asking for GPT’s instructions could lead to file disclosure,” the researchers found. With some creativity, the researchers found that attackers can cause two types of disclosures: “system prompt extraction” and “file leakage.” The first one tricks the model to share its core configuration and prompt, whereas the second one makes it disclose and share its confidential training dataset.
The research also highlighted that existing defenses, like defensive prompts, are not foolproof against sophisticated adversarial prompts. The team said this will require a more robust and comprehensive approach to securing these AI models.
“Attackers with sufficient determination and creativity are very likely to find and exploit vulnerabilities, suggesting that current defensive strategies may be insufficient,” the report concludes.
In light of these findings, and given that users can tinker with their prompts under zero supervision or testing from OpenAI, the study urges the broader AI community to prioritize the development of stronger security measures.
“To address these issues, additional safeguards, beyond the scope of simple defensive prompts, are required to bolster the security of custom GPTs against such exploitation techniques,” the study concludes..
While the customization of GPTs offers immense potential, this study serves as a crucial reminder of associated security risks. Advancements in AI must not compromise user security and privacy. For now, it may be better to keep the most important or sensitive GPTs to yourself—or not train them with sensitive data in the first place.
Edited by Ryan Ozawa.