GPT-4 Is More Likely to Follow Jailbreaking Prompts

Microsoft-Affiliated Researchers Discover Flaws in OpenAI's GPT-4, Used in Bing Chat

Microsoft-affiliated researchers have discovered flaws in OpenAI's GPT-4, including its vulnerability to spout toxic and biased text and leak private data. While GPT-4 is more likely to follow jailbreaking prompts, bypassing built-in safety measures, its good intentions and improved comprehension can lead it astray. 

As GPT-4 powers Microsoft's Bing Chat chatbot, it raises questions about Microsoft greenlighting research that casts an OpenAI product in a poor light. The researchers note that finished AI applications apply mitigation approaches to potential harms at the model level.

Microsoft-Affiliated Research Findings

A new scientific paper affiliated with Microsoft has revealed some flaws in OpenAI’s language model- GPT-4, which powers Microsoft’s Bing Chat chatbot. The paper critically analyses the “trustworthiness” and “toxicity” of large language models (LLMs), including GPT-4 and its predecessor GPT-3.5. 

The study indicates that GPT-4 is more susceptible to be prompted with toxic and biased content, making it more vulnerable than other LLMs to spouting such negativity. Due to this factor, OpenAI’s GPT-4 is at higher risk of following jailbreaking prompts designed to bypass default safety measures within the LLMs, potentially leading to misleading instructions and outputs. 

Despite GPT-4 being powered by honest intentions with improved comprehension, it can still be misguided if prompted with misleading instructions. Furthermore, the study disclosed GPT-4’s potential to inadvertently leak out private and sensitive data such as email addresses from the data that the LLMs have been trained on. 

Although all LLMs have the potential to do this, GPT-4 is more susceptible than other models. The study has open-sourced the codes used to benchmark the models on GitHub. The researchers’ goal is to encourage researchers to build upon their work for potential pre-emptive measures against adversaries who may exploit such vulnerabilities.

 The research points towards the fact that even billion-dollar-plus-revenue-generating startups like OpenAI are subject to imperfection. In conclusion, although the research may seem negative, the relevant bugs and security measures have been addressed before its publication. The research aims to encourage the development of safer and more capable LLMs in the future.

Why Would Microsoft Greenlight Research that Casts an OpenAI Product in Poor Light?

Why Would Microsoft Greenlight Research that Casts an OpenAI Product in poor light? The research team worked with Microsoft product groups to confirm that the potential vulnerabilities identified do not impact current customer-facing services. This is in part true because finished AI applications apply a range of mitigation approaches to address potential harms that may occur at the model level of the technology. It is also worth noting that the research has been shared with GPT's developer, OpenAI, which has noted the potential vulnerabilities in the system cards for relevant models.

Conclusion

In conclusion, the Microsoft-affiliated research revealed the potential vulnerabilities of GPT-4 in generating toxic and biased text and leaking private data. While the research might suggest imperfections in language models, Microsoft confirmed that the relevant bug fixes and patches were made before the paper’s publication. The scientific paper also encourages others in the research community to build upon their work, potentially pre-empting nefarious actions by adversaries who would exploit vulnerabilities to cause harm.

Comments