Track: High School STEM Competition
Abstract
Large Language Models (LLMs) play a crucial role in increasing productivity across various applications, yet concerns about their security persist. We defined a novel research area for LLM security – grey box testing, which involved adjustments of partially known information about LLMs, i.e. parameters such as temperature. This is the first investigation on the impact of temperature on LLM security, more specifically, jailbreaking. Jailbreak prompts, concatenated with inappropriate questions, were sent into top-performing open-source LLMs – Vicuna, Llama 2, and Mistral, with benchmarking against ChatGPT. Experiments were repeated over 5 temperature points: 0.0, 0.25, 0.5, 0.75 and 1.0, revealing a strong correlation between temperature and jailbreaking success. An increase in temperature generally results in higher jailbreaking success for Llama 2 and ChatGPT, and the converse is true for Vicuna and Mistral. We revised our hypothesis such that LLMs with low jailbreak success rate at 0 temperature are more vulnerable to jailbreaking as temperature increases, while LLMs with high jailbreak success rate at 0 temperature are less susceptible to jailbreaking as temperature rises. This research emphasises the importance of cautious usage of open-source LLMs and highlights the need for further research under grey box settings in other areas including data extraction and hallucinations.