Analysis and Visualization of Communication/Computation Patterns of High-Performance Applications
Energy consumption is an increasingly important concern in high-performance computing (HPC) data centers. Today, up to half of the energy in a computing cluster is consumed by the cooling infrastructure. While most efforts to improve power efficiency target the electronics, software can also contribute to such improvements. Thermally aware job allocation policies are being researched to reduce cooling costs by minimizing the peak inlet temperatures of the data center server nodes. However, job allocation decisions strongly affect the performance of HPC applications with significant inter-processor communication. To achieve a balanced tradeoff between thermal performance and application throughput, the actual communication requirements of the applications need to be considered.
This thesis describes the development of two new profiling tools for analyzing the communication and computation patterns of Charm++-based HPC applications. The first tool collects and plots the amount of communication between every pair of processing elements, which makes it possible to evaluate how communication is distributed across processors. The second generates a graph composed of communication and computation nodes and their dependencies. Both rely on compiler-introduced logging features, so any Charm++ application can be profiled without changes to its source code. The analysis the tools provide can be used to evaluate the impact that new job allocation policies have on the communication performance of a particular application. As part of the validation process, a synthetic benchmarking tool is also developed to demonstrate the accuracy of the two profiling tools.
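As an illustration of the pairwise analysis described above, the following minimal sketch aggregates logged messages into a communication matrix indexed by source and destination processing element. The record format `(src_pe, dst_pe, nbytes)` and the function names `build_comm_matrix` and `remote_fraction` are assumptions made for this example; they are not the log format or the interface of the tools developed in this thesis.

```python
def build_comm_matrix(messages, num_pes):
    """Sum the bytes sent between every ordered pair of processing elements.

    `messages` is an iterable of (src_pe, dst_pe, nbytes) tuples; this record
    layout is hypothetical and chosen only for illustration.
    """
    matrix = [[0] * num_pes for _ in range(num_pes)]
    for src, dst, nbytes in messages:
        matrix[src][dst] += nbytes
    return matrix


def remote_fraction(matrix):
    """Return the share of traffic that crosses processing elements."""
    total = sum(sum(row) for row in matrix)
    if total == 0:
        return 0.0
    # Diagonal entries are messages a processing element sends to itself.
    local = sum(matrix[i][i] for i in range(len(matrix)))
    return (total - local) / total


# Made-up sample records for four processing elements.
messages = [
    (0, 1, 1024),
    (1, 0, 512),
    (0, 1, 1024),
    (2, 3, 2048),
    (3, 3, 256),
]
matrix = build_comm_matrix(messages, num_pes=4)
```

A matrix of this kind can be rendered as a heat map to show how communication is distributed across processors, and a summary such as `remote_fraction(matrix)` gives a rough indication of how sensitive an application may be to where its processes are placed.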
Electrical and Computer Engineering, Northeastern