Neural language models based on the Transformer architecture have been successfully applied to a wide range of Natural Language Generation tasks, but they are held back by their limited context length, that is, their inability to simultaneously “pay attention” to all parts of a long document. Retrieval-Augmented Generation (RAG) is currently the most promising approach to overcoming this limitation: by means of semantic similarity search, one identifies the parts of a document that are most relevant to the task at hand and feeds only those parts to the language model. In this talk, I demonstrate how RAG can be used to build a system for answering arbitrary questions, posed in natural language, about the content of documents (“document Q&A”). I discuss the challenges we faced when implementing document Q&A at NorCom, how to improve the performance of a document Q&A system, and how to reliably measure that performance in the first place.
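
To make the retrieval step concrete, here is a minimal sketch of the RAG pattern described above: split a document into chunks, embed the chunks and the question, retrieve the most similar chunks by cosine similarity, and build a prompt from only those chunks. The embedding model, the fixed-size chunking, and the final LLM call are illustrative assumptions (using the sentence-transformers library), not the system built at NorCom.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

# Assumption: any sentence-embedding model works; this one is small
# and freely available.
model = SentenceTransformer("all-MiniLM-L6-v2")

def chunk(document: str, size: int = 500) -> list[str]:
    """Naive fixed-size chunking; a real system would split on
    sentence or section boundaries instead."""
    return [document[i:i + size] for i in range(0, len(document), size)]

def retrieve(question: str, chunks: list[str], k: int = 3) -> list[str]:
    """Return the k chunks most semantically similar to the question."""
    embeddings = model.encode(chunks + [question])
    doc_vecs, q_vec = embeddings[:-1], embeddings[-1]
    # Cosine similarity between the question and every chunk.
    sims = doc_vecs @ q_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec)
    )
    top = np.argsort(sims)[::-1][:k]
    return [chunks[i] for i in top]

def build_prompt(question: str, context: list[str]) -> str:
    """Only the retrieved chunks enter the model's context window."""
    joined = "\n---\n".join(context)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{joined}\n\nQuestion: {question}\nAnswer:"
    )

# Usage sketch: the resulting prompt would be sent to any
# instruction-tuned language model (placeholder, not a specific API).
# prompt = build_prompt(q, retrieve(q, chunk(long_document)))
```

Because the prompt contains only the top-k retrieved chunks rather than the whole document, it stays within the model's context length regardless of how long the source document is.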