benchmarks and the new preference-based benchmarks with LLM-as-a-judge, one can swiftly and automatically evaluate both the core capabilities and human alignment of models. We publicly release 80 MT-bench questions, 3K expert votes, and 30K conversations with human preferences for future study.

Table 1: Sample multi-turn questions in MT-bench.

| Category | Turn | Sample Question |
|---|---|---|
| Writing | 1st Turn | Compose an engaging travel blog post about a recent trip to Hawaii, highlighting cultural experiences and must-see attractions. |
| Writing | 2nd Turn | Rewrite your previous response. Start every sentence with the letter A. |
| Math | 1st Turn | Given that $f(x) = 4x^3 - 9x - 14$, find the value of $f(2)$. |
| Math | 2nd Turn | Find $x$ such that $f(x) = 0$. |
| Knowledge | 1st Turn | Provide insights into the correlation between economic indicators such as GDP, inflation, and unemployment rates. Explain how fiscal and monetary policies ... |
| Knowledge | 2nd Turn | Now, explain them again like I'm five. |

2 MT-bench and Chatbot Arena [Zheng+ 2023]