Thomas Randall @ Clemson University

Transfer-learning-based Autotuning using Gaussian Copula



A diagram showing the approach in two parts. The top portion covers Model Training: an application's inputs of various sizes (shown: small, medium, and large) are fed into a 'Non-GC Tuner' along with a derived component of the application labeled 'User-Defined Tuning Space'. The tuner produces training data, which is fed into the Gaussian Copula together with the User-Defined Tuning Space. The bottom portion covers Model Inference: the same application, with new input sizes (shown: small-medium, medium-large, and extra-large), is presented to the fitted Gaussian Copula. The Gaussian Copula produces 'High Performing Configurations', which are then ranked by an 'Evaluator'.
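
To make the two-phase workflow concrete, below is a minimal Python sketch of the training and inference steps, assuming prior tuning data has already been collected. The empirical-CDF Gaussian copula, the function names, and the placeholder data are illustrative assumptions for this page, not the authors' released implementation.

    import numpy as np
    from scipy.stats import norm

    def fit_gaussian_copula(X):
        # X: one row per previously observed high-performing configuration.
        # Map each column through its empirical CDF to (0, 1), then into
        # standard-normal space, and estimate the correlation there.
        n, d = X.shape
        ranks = np.argsort(np.argsort(X, axis=0), axis=0) + 1
        z = norm.ppf(ranks / (n + 1.0))
        corr = np.corrcoef(z, rowvar=False)
        marginals = np.sort(X, axis=0)  # kept for inverting the CDFs later
        return corr, marginals

    def sample_configurations(corr, marginals, k, seed=0):
        # Draw k candidates: correlated normals -> uniforms -> quantiles
        # of the empirical marginals observed during training.
        rng = np.random.default_rng(seed)
        z = rng.multivariate_normal(np.zeros(corr.shape[0]), corr, size=k)
        u = norm.cdf(z)
        cols = [np.quantile(marginals[:, j], u[:, j])
                for j in range(marginals.shape[1])]
        return np.column_stack(cols)

    # Placeholder data standing in for prior tuning results; in the
    # diagram's terms, the sampled candidates are the 'High Performing
    # Configurations' that the 'Evaluator' would rank on the new input size.
    X_top = np.random.default_rng(1).random((200, 3))
    corr, marg = fit_gaussian_copula(X_top)
    candidates = sample_configurations(corr, marg, k=30)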

Publication Details

Appears in the Proceedings of the 2023 International Conference on Supercomputing (ICS '23). Authors: Thomas Randall, Jaehoon Koo, Brice Videau, Michael Kruse, Xingfu Wu, Paul Hovland, Mary Hall, Rong Ge, and Prasanna Balaprakash.

Abstract

As diverse high-performance computing (HPC) systems are built, many opportunities arise for applications to solve larger problems than ever before. Given the significantly increased complexity of these HPC systems and of application tuning, empirical performance tuning, such as autotuning, has emerged as a promising approach in recent years. Despite its effectiveness, autotuning is often computationally expensive. Transfer learning (TL)-based autotuning seeks to address this issue by leveraging data from prior tuning. Current TL methods for autotuning spend significant time modeling the relationship between parameter configurations and performance, which is ineffective for few-shot (that is, few empirical evaluations) tuning on new tasks. We introduce the first generative TL-based autotuning approach, which uses the Gaussian copula (GC) to model the high-performing regions of the search space from prior data and then generate high-performing configurations for new tasks. This enables a sampling-based approach that maximizes few-shot performance and provides the first probabilistic estimation of the few-shot budget for effective TL-based autotuning. We compare our generative TL approach with state-of-the-art autotuning techniques on several benchmarks. We find that the GC is capable of achieving 64.37% of peak few-shot performance in its first evaluation. Furthermore, the GC model can determine a few-shot transfer budget that yields up to a 33.39× speedup, a dramatic improvement over the 20.58× speedup obtained by prior techniques.
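
The probabilistic few-shot budget mentioned in the abstract can be illustrated with a simple calculation. If each independent sample from the fitted model lands in a high-performing region with probability p, then the smallest budget k that finds at least one such configuration with confidence c satisfies 1 - (1 - p)^k >= c. A minimal sketch under that independence assumption (the per-sample probability p would come from the fitted copula; this is not the paper's exact estimator):

    import math

    def few_shot_budget(p, confidence=0.95):
        # Smallest k with 1 - (1 - p)**k >= confidence, i.e. at least one
        # of k independent draws hits a high-performing configuration.
        return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p))

    # Example: a 30% per-sample hit rate needs 9 evaluations at 95% confidence.
    print(few_shot_budget(0.30))  # -> 9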