and answers
• Limited → Unlimited format
• English only → Any language
• No images → Unlimited images
• Expanding quiz content is difficult → easy with GenAI
Revisit: Initial problems
from LLMs
• Slow LLM calls
• Hallucinations
• Hard to check the accuracy and quality of LLM outputs
• Fast-changing landscape (models, APIs, libraries, etc.)
New problems with GenAI
• Quiz/image generation with a single API call
Hard to do things well and consistently
• Good results require prompt engineering
• You will get inconsistent outputs
• Hard to measure the output quality
fail ⇒ Retry and keep the user informed
• LLM can give you malformed JSON ⇒ Can you still parse the JSON somehow?
• LLM can return empty results ⇒ Can you live with no quizzes or no image?
• LLM can be too cautious ⇒ Do you need to change safety settings?
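The failure-handling questions above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the implementation from the talk: `call_with_retry`, `parse_json_leniently`, and the `on_retry` callback are hypothetical names, and `call` stands in for whatever function actually invokes the model.

```python
import json
import re
import time


def call_with_retry(call, attempts=3, base_delay=1.0, on_retry=None):
    """Run call(); on failure, wait with exponential backoff and try again.

    `call` is a stand-in for the function that invokes the model (assumption).
    `on_retry` lets the caller keep the user informed between attempts.
    """
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception as exc:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            if on_retry is not None:
                on_retry(attempt, exc)
            time.sleep(base_delay * 2 ** (attempt - 1))


def parse_json_leniently(text):
    """Try strict parsing first, then the leftmost {...} or [...] span.

    Returns None when nothing parseable is found, so the caller can decide
    whether an empty result is acceptable.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        pass
    # Models often wrap JSON in prose or code fences; grab the bracketed part.
    match = re.search(r"\{.*\}|\[.*\]", text, flags=re.DOTALL)
    if match is None:
        return None
    try:
        return json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
```

A caller would typically wrap the model call in `call_with_retry`, pass the raw text to `parse_json_leniently`, and treat `None` or an empty list as the "no quizzes" case to handle explicitly.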
like LangChain
• You can use Gemini from Google AI Studio and Vertex AI, but each has different libraries
• In Vertex AI, libraries for PaLM and Gemini are different
• Other non-Google models have their own libraries
• LangChain can help to abstract all of this away
quiz actually on the topic of history?
• Is the answer actually correct?
• Is the generated image appropriate for the quiz? (still an open question)
Need a way to measure LLM outputs
• Automate it, and use it as a benchmark to work towards
🎓 Testing and Validation
of the form: Q: question A: answer
For example…
Who was the first US president?
A. Thomas Jefferson
B. Alexander Hamilton
C. George Washington
D. Bill Clinton
can be decomposed into these four assertions:
• Q: Who was the first US president? A: Thomas Jefferson is False
• Q: Who was the first US president? A: Alexander Hamilton is False
• Q: Who was the first US president? A: George Washington is True
• Q: Who was the first US president? A: Bill Clinton is False
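The decomposition above could be sketched as follows. This is an illustrative sketch only: the `Assertion` dataclass and the `decompose` helper are hypothetical names, and the quiz is assumed to arrive as a question, a list of options, and the index of the correct option.

```python
from dataclasses import dataclass


@dataclass
class Assertion:
    question: str
    answer: str
    expected: bool  # True only for the option the quiz marks as correct


def decompose(question, options, correct_index):
    """Turn one multiple-choice question into one assertion per option."""
    return [
        Assertion(question, option, index == correct_index)
        for index, option in enumerate(options)
    ]


assertions = decompose(
    "Who was the first US president?",
    ["Thomas Jefferson", "Alexander Hamilton", "George Washington", "Bill Clinton"],
    correct_index=2,
)
for a in assertions:
    print(f"Q: {a.question} A: {a.answer} is {a.expected}")
```

Keeping the expected truth value next to each question/answer pair makes it straightforward to compare against whatever a validator returns later.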
assertions true or false?”
Q: Who was the first US president? A: Thomas Jefferson
Q: Who was the first US president? A: Alexander Hamilton
Q: Who was the first US president? A: George Washington
Q: Who was the first US president? A: Bill Clinton
LLM: False False True False
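One way to automate this batch check and turn it into a score is sketched below. The prompt wording, the numbering of assertions, and the helper names `build_prompt`, `parse_verdicts`, and `accuracy` are all assumptions made for illustration; the model call itself is left out, and the reply is the one shown above.

```python
def build_prompt(pairs):
    """Number each (question, answer) pair and ask for one verdict each."""
    lines = [
        "Are the following assertions true or false? "
        "Reply with one True or False per assertion."
    ]
    for number, (question, answer) in enumerate(pairs, start=1):
        lines.append(f"{number}. Q: {question} A: {answer}")
    return "\n".join(lines)


def parse_verdicts(reply, expected_count):
    """Read True/False tokens from the reply; return None if the count is off."""
    tokens = [token.strip(".,").lower() for token in reply.split()]
    verdicts = [token == "true" for token in tokens if token in ("true", "false")]
    return verdicts if len(verdicts) == expected_count else None


def accuracy(expected, verdicts):
    """Fraction of assertions where the validator agrees with the quiz."""
    agreed = sum(e == v for e, v in zip(expected, verdicts))
    return agreed / len(expected)


question = "Who was the first US president?"
options = ["Thomas Jefferson", "Alexander Hamilton", "George Washington", "Bill Clinton"]
expected = [False, False, True, False]  # what the generated quiz claims

prompt = build_prompt([(question, option) for option in options])
reply = "False False True False"  # validator reply, as in the example above
verdicts = parse_verdicts(reply, expected_count=len(options))
print(accuracy(expected, verdicts))  # prints 1.0
```

Sending all assertions in one prompt keeps the number of slow LLM calls down, and returning `None` on a count mismatch gives the caller a clear signal to retry rather than score a malformed reply.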