FinGraV: Methodology for Fine-Grain GPU Power Visibility and Insights

December 17, 2024 · Declared Dead · 🏛 IEEE International Symposium on Performance Analysis of Systems and Software

"No code URL or promise found in abstract"

Evidence collected by the PWNC Scanner

Authors: Varsha Singhania, Shaizeen Aga, Mohamed Assem Ibrahim
arXiv ID: 2412.12426
Category: cs.AR (Hardware Architecture)
Cross-listed: cs.DC
Citations: 1
Venue: IEEE International Symposium on Performance Analysis of Systems and Software
Last Checked: 3 months ago

Abstract

The ubiquity of AI makes optimizing GPU power a priority, as large GPU-based clusters are often employed to train and serve AI models. An important first step in optimizing GPU power consumption is high-fidelity, fine-grain power measurement of key AI computations on GPUs. To this end, we observe that as GPUs get more powerful, the resulting sub-millisecond to millisecond executions make fine-grain power analysis challenging. In this work, we first carefully identify the challenges in obtaining fine-grain GPU power profiles. To address these challenges, we devise the FinGraV methodology, in which we employ execution time binning, careful CPU-GPU time synchronization, and power profile differentiation to collect fine-grain GPU power profiles across prominent AI computations and a spectrum of scenarios. Using these FinGraV power profiles, we provide both guidance on accurate power measurement and an in-depth view of power consumption on the state-of-the-art AMD Instinct MI300X. For the former, we highlight a methodology for power differentiation across executions. For the latter, we make several observations pertaining to GPU sub-component power consumption and GPU power proportionality across different scenarios. We believe that FinGraV unlocks both an accurate and a deeper view of GPU power consumption and opens up avenues for power optimization of these ubiquitous accelerators.
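The abstract names execution time binning and CPU-GPU time synchronization but does not describe how they are implemented. The Python sketch below only illustrates the general idea and is not the authors' method. It assumes a fixed, already-known clock offset between CPU and GPU timelines, power samples stored as (timestamp, watts) pairs, and runs stored as dictionaries with hypothetical `start_us` and `end_us` keys. All function names and the bin width are likewise hypothetical.

```python
from collections import defaultdict
from statistics import mean


def to_gpu_time(cpu_ts_us, offset_us):
    """Shift a CPU timestamp onto the GPU timeline.

    Assumption: the offset is constant and was measured beforehand.
    """
    return cpu_ts_us + offset_us


def bin_runs_by_duration(runs, bin_width_us):
    """Group runs whose execution times fall into the same duration bin.

    Each run is a dict with hypothetical 'start_us' and 'end_us' keys.
    """
    bins = defaultdict(list)
    for run in runs:
        duration_us = run["end_us"] - run["start_us"]
        bins[int(duration_us // bin_width_us)].append(run)
    return dict(bins)


def mean_power_in_window(samples, start_us, end_us):
    """Average the (timestamp_us, watts) samples that fall inside a window.

    Returns None when no sample lands in the window, which is common for
    sub-millisecond executions and coarse power sampling.
    """
    inside = [watts for ts, watts in samples if start_us <= ts <= end_us]
    return mean(inside) if inside else None
```

A possible use would be to convert CPU-logged power samples with `to_gpu_time`, bin repeated runs of one kernel with `bin_runs_by_duration`, and then compare `mean_power_in_window` only across runs that share a bin, so that runs with unusual execution times do not distort the profile.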