the large and discrete prompt space
• LLMs are highly sensitive to prompts: even when prompts have the same meaning, the results can vary significantly depending on the wording, format, and phrasing (see below)
• The optimal prompt can be model-specific and task-specific (Large Language Models as Optimizers)
Sources: Quantifying Language Models' Sensitivity to Spurious Features in Prompt Design or: How I learned to start worrying about prompt formatting; Evaluating the Zero-shot Robustness of Instruction-
[Figures: Sensitivity to wording and format; Sensitivity to phrasing]
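To make the sensitivity claim concrete, here is a minimal sketch of how one might enumerate prompt templates that mean the same thing but differ only in formatting, and then measure how far a score moves across them. The names `format_variants` and `performance_spread` are hypothetical, the separator, spacing, and casing choices are illustrative assumptions, and the `score` callable is a placeholder for a real evaluation of a model on a task; none of this comes from the cited papers.

```python
from itertools import product
from typing import Callable, List


def format_variants(field_a: str, field_b: str) -> List[str]:
    """Enumerate templates with the same meaning but different formatting.

    Assumption: the three lists below are illustrative choices only.
    """
    separators = [": ", " - ", ":\n"]          # between a field name and its value
    spacings = ["\n", " || ", "\n\n"]          # between the two fields
    casings = [str.title, str.upper, str.lower]  # casing of the field names
    variants = []
    for sep, space, case in product(separators, spacings, casings):
        # "{question}" is left as a literal placeholder to be filled in later.
        variants.append(
            f"{case(field_a)}{sep}{{question}}{space}{case(field_b)}{sep}"
        )
    return variants


def performance_spread(variants: List[str], score: Callable[[str], float]) -> float:
    """Return max minus min score over the variants.

    `score` is a hypothetical stand-in for "accuracy of a model when
    prompted with this template"; plug in a real evaluation here.
    """
    scores = [score(v) for v in variants]
    return max(scores) - min(scores)


variants = format_variants("question", "answer")
print(len(variants))  # three choices on each of three axes
```

Even with only three options on each of three formatting axes, the number of templates grows multiplicatively, which is one way to see why the prompt space is large and discrete. A large spread across such variants would indicate the kind of formatting sensitivity described above.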