Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
Original Paper: https://arxiv.org/abs/2312.02119

By: Anay Mehrotra, Manolis Zampetakis, Paul Kassianik, Blaine Nelson, Hyrum Anderson, Yaron Singer, Amin Karbasi

Abstract: While Large Language Models (LLMs) display versatile functionality, they continue to generate harmful, biased, and toxic content, as demonstrated by the prevalence of human-designed jailbreaks. In this work, we present Tree of Attacks with Pruning (TAP), an automated method for generating jailbreaks that only requires black-box access to the target LLM. TAP utilizes an LLM to iteratively refine candidate (attack) prompts using tree-of-thought reasoning until one of the generated prompts jailbreaks the target. Crucially, before sending prompts to the target, TAP assesses them and prunes the ones unlikely to result in jailbreaks. Using tree-of-thought reasoning allows TAP to navigate a large search space of prompts, and pruning reduces the total number of queries sent to the target.
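The abstract describes TAP's core loop: an attacker LLM proposes refined prompts via tree-of-thought reasoning, an evaluator prunes unpromising candidates *before* any queries are spent on the target, and a judge scores the target's responses to decide success and which leaves to keep. Below is a minimal Python sketch of that loop under assumed interfaces; the callable names (`attacker`, `on_topic`, `target`, `judge`) and the hyperparameter defaults are illustrative, not the paper's actual API or settings.

```python
"""Minimal sketch of a TAP-style attack loop (not the authors' code).

Assumed (hypothetical) interfaces: each model is a plain callable on
strings; the judge returns a score where higher means closer to a
jailbreak.
"""
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Node:
    prompt: str                                        # candidate attack prompt
    history: List[str] = field(default_factory=list)   # refinement trace


def tap_attack(
    goal: str,
    attacker: Callable[[str], List[str]],   # proposes refined prompts
    on_topic: Callable[[str], bool],        # evaluator: is this still on-goal?
    target: Callable[[str], str],           # black-box target LLM
    judge: Callable[[str, str], float],     # scores (prompt, response)
    branching: int = 4,                     # children per leaf (illustrative)
    width: int = 10,                        # max leaves kept per level
    depth: int = 10,                        # max refinement rounds
    success_threshold: float = 10.0,
) -> Optional[str]:
    """Breadth-first tree search: branch, prune, query, score, keep best."""
    frontier = [Node(prompt=goal)]
    for _ in range(depth):
        # Branch: the attacker proposes `branching` refinements per leaf.
        children = []
        for node in frontier:
            for candidate in attacker(node.prompt)[:branching]:
                children.append(Node(candidate, node.history + [node.prompt]))
        # Phase-1 pruning: drop off-topic candidates before spending
        # any queries on the target.
        children = [c for c in children if on_topic(c.prompt)]
        # Query the target and score each surviving candidate.
        scored = []
        for c in children:
            response = target(c.prompt)
            score = judge(c.prompt, response)
            if score >= success_threshold:
                return c.prompt             # jailbreak found
            scored.append((score, c))
        # Phase-2 pruning: keep only the `width` highest-scoring leaves.
        scored.sort(key=lambda t: t[0], reverse=True)
        frontier = [c for _, c in scored[:width]]
        if not frontier:
            return None                     # everything was pruned
    return None
```

The two pruning phases are the design point the abstract emphasizes: off-topic pruning happens before the target is queried (saving black-box queries), while score-based pruning caps the tree's width so the search stays tractable at depth.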