Transfer Learning for Performance Modeling of Configurable Systems: An Exploratory Analysis

Pooyan Jamshidi, Carnegie Mellon University, USA
Norbert Siegmund, Bauhaus-University Weimar, Germany
Miguel Velez, Christian Kästner, Akshay Patel, Yuvraj Agarwal, Carnegie Mellon University, USA

Abstract—Modern software systems provide many configuration options which significantly influence their non-functional properties. To understand and predict the effect of configuration options, several sampling and learning strategies have been proposed, albeit often at significant cost to cover the high-dimensional configuration space. Recently, transfer learning has been applied to reduce the effort of constructing performance models by transferring knowledge about performance behavior across environments. While this line of research is promising for learning more accurate models at a lower cost, it is unclear why and when transfer learning works for performance modeling. To shed light on when it is beneficial to apply transfer learning, we conducted an empirical study on four popular software systems, varying software configurations and environmental conditions, such as hardware, workload, and software versions, to identify the key knowledge pieces that can be exploited for transfer learning. Our results show that for small environmental changes (e.g., a homogeneous workload change), applying a linear transformation to the performance model suffices to understand the performance behavior of the target environment, while for severe environmental changes (e.g., a drastic workload change) we can transfer only knowledge that makes sampling more efficient, e.g., by reducing the dimensionality of the configuration space.

Index Terms—Performance analysis, transfer learning.

I. INTRODUCTION

Highly configurable software systems, such as mobile apps, compilers, and big data engines, are increasingly exposed to end users and developers on a daily basis for varying use cases. Users are interested not only in the fastest configuration but also in whether the fastest configuration for their application remains the fastest when the environment changes. For instance, a mobile developer might want to know whether the software that she has configured to consume minimal energy on a testing platform will remain energy efficient on the users' mobile platforms; or, in general, whether the configuration will remain optimal when the software is used in a different environment (e.g., with a different workload, on different hardware).

Fig. 1: Transfer learning is a form of machine learning that takes advantage of transferable knowledge from the source to learn an accurate, reliable, and less costly model for the target environment.

Performance models have been extensively used to learn and describe the performance behavior of configurable systems. Reusing such models or their byproducts across environments is demanded by many application scenarios; here we mention two common ones:

• Scenario 1: Hardware change: The developers of a software system performed performance benchmarking of the system in its staging environment and built a performance model. The model may not be able to provide accurate predictions for the performance of the system in the actual production environment, though (e.g., due to the instability of measurements in the staging environment [6], [30], [38]).
• Scenario 2: Workload change: The developers of a database system built a performance model using a read-heavy workload; however, the model may not be able to provide accurate predictions once the workload changes to a write-heavy one. The reason is that when the workload changes, different functions of the software may be activated (more often), and so the non-functional behavior changes, too.

In such scenarios, not every user wants to repeat the costly process of building a new performance model to find a