03 Jul, 2023

Faculty member Bjoern Hartmann has published a new article, "Herding AI Cats: Lessons from Designing a Chatbot by Prompting GPT-3" with co-authors JD Zamfirescu-Pereira, Heather Wei, Amy Xiao, Kitty Gu, Grace Jung, Matthew G Lee, and Qian Yang. The article examines prompting Large Language Models (LLMs), an exciting new approach to designing chatbots, and asks whether prompting can improve an LLM’s user experience (UX) reliably enough to power chatbot products. The researchers’ attempt to design a robust chatbot by prompting GPT-3/4 alone suggests: not yet. Prompts made achieving “80%” UX goals easy, but not the remaining 20%. Fixing the few remaining interaction breakdowns resembled herding cats: the researchers could not address one UX issue or test one design solution at a time; instead, they had to handle everything everywhere all at once.

Moreover, because no prompt could make GPT reliably say “I don’t know” when it should, the user-GPT conversations had no guardrails after a breakdown occurred, often leading to UX downward spirals. These risks incentivized the researchers to design highly prescriptive prompts and scripted bots, counter to the promises of LLM-powered chatbots. This paper describes this case study, unpacks prompting’s fickleness and impact on UX design processes, and discusses implications for LLM-based design methods and tools.

