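from pydantic import BaseModel
from google.genai.types import GenerateContentConfig

# Schema that the model's JSON output must follow.
class Recipe(BaseModel):
    name: str
    description: str
    ingredients: list[str]

# client and MODEL_ID are assumed to be defined earlier.
response = client.models.generate_content(
    model=MODEL_ID,
    contents="List a few popular cookie recipes and their ingredients.",
    config=GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=Recipe,
    ),
)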
How closely does the response match the reference?
• ROUGE: How closely does the response summary match the reference?
Problems:
1. You need a reference dataset
2. They fall short in capturing semantic nuances
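To make the ROUGE idea and its second problem concrete, here is a minimal sketch of ROUGE-1 (unigram overlap) in plain Python. It assumes simple whitespace tokenization and lowercasing; the function name `rouge1` is hypothetical, and real ROUGE implementations typically add stemming and variants such as ROUGE-2 and ROUGE-L.

```python
from collections import Counter


def rouge1(response: str, reference: str) -> dict[str, float]:
    """Unigram-overlap precision, recall, and F1 (a simplified ROUGE-1)."""
    resp_tokens = response.lower().split()
    ref_tokens = reference.lower().split()

    # Clipped overlap: each word counts at most as often as it appears in both texts.
    overlap = sum((Counter(resp_tokens) & Counter(ref_tokens)).values())

    precision = overlap / len(resp_tokens) if resp_tokens else 0.0
    recall = overlap / len(ref_tokens) if ref_tokens else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Same meaning, different wording: only "the" and "was" overlap,
# so precision, recall, and F1 are all 0.5.
print(rouge1("the film was great", "the movie was excellent"))
```

The example pair means the same thing yet scores only 0.5, which is the "semantic nuances" limitation in miniature. The function also cannot run at all without a reference string, which is the first problem.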
context for the input?
• Contextual recall* - Did it fetch all the relevant information?
• Contextual precision* - Do relevant nodes in the context rank higher than the irrelevant ones?
Generator metrics
• Answer relevance - How relevant is the output to the input?
• Faithfulness / Groundedness - Does the output factually align with the context?
Model-graded metrics (RAG)
*require an expected output
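The two starred retriever metrics can be sketched as simple arithmetic once the judgments exist. The sketch below assumes an LLM judge has already labeled each retrieved node as relevant or not, and each statement of the expected output as supported by the context or not. The function names and the hard-coded labels are hypothetical, and the rank-weighted formula is one common formulation rather than the only definition.

```python
def contextual_precision(node_is_relevant: list[bool]) -> float:
    """Rank-weighted precision: rewards relevant nodes that appear early.

    node_is_relevant[i] is the (assumed, model-graded) verdict for the
    node at rank i + 1 in the retrieved context.
    """
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(node_is_relevant, start=1):
        if relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / hits if hits else 0.0


def contextual_recall(statement_is_supported: list[bool]) -> float:
    """Share of expected-output statements found in the retrieved context."""
    if not statement_is_supported:
        return 0.0
    return sum(statement_is_supported) / len(statement_is_supported)


# The same two relevant nodes score higher when ranked first.
print(contextual_precision([True, True, False]))   # 1.0
print(contextual_precision([False, True, True]))   # about 0.58

# Three of four expected statements are covered by the context.
print(contextual_recall([True, True, True, False]))  # 0.75
```

Both functions depend on an expected output to judge against, which is why these two metrics carry the asterisk.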