In an era of higher memory bandwidth, wider vector registers and increasing core counts, have you ever asked where your performance lies? What is the maximum speed-up achievable on the underlying architecture? How does your code implementation affect, and how can it help you reach, the goals of high performance computing?
In this tutorial, we will answer all these questions, and you will have the opportunity to learn techniques and methods for improving your code, enabling new hardware features, and using the roofline model to visualize the potential benefits of your optimization process.
We will start with an overview of the latest microprocessor architectures and how intrinsic parallelism has been implemented in hardware, mainly through SIMD instructions and multi-threading. We will then focus on how to define and measure processor and memory performance and how these relate to performance at the application level. In particular, we will describe the roofline model, a visual model for estimating an application's attainable performance and the limits imposed by the underlying hardware.
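As a rough illustration of the roofline idea, the sketch below computes the attainable performance of a kernel as the minimum of the compute peak and the product of arithmetic intensity and memory bandwidth. The function name and the peak figures (1000 GFLOP/s, 100 GB/s) are hypothetical values chosen for illustration, not measurements of any real processor, and the traffic estimate for the example kernel ignores cache effects such as write-allocate.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Roofline bound: performance is capped either by the compute peak
    or by how fast memory can feed the kernel (intensity * bandwidth)."""
    return min(peak_gflops, arithmetic_intensity * peak_bandwidth_gbs)


# Hypothetical machine parameters (assumptions, not real hardware figures).
PEAK_GFLOPS = 1000.0        # peak floating-point rate in GFLOP/s
PEAK_BANDWIDTH_GBS = 100.0  # peak memory bandwidth in GB/s

# Ridge point: the arithmetic intensity (flop/byte) at which a kernel
# stops being memory-bound and becomes compute-bound.
ridge_point = PEAK_GFLOPS / PEAK_BANDWIDTH_GBS

# Example kernel: a[i] = b[i] + s * c[i] on doubles.
# 2 flops per iteration; 3 arrays * 8 bytes = 24 bytes of traffic.
triad_intensity = 2.0 / 24.0
triad_bound = attainable_gflops(triad_intensity, PEAK_GFLOPS, PEAK_BANDWIDTH_GBS)

if __name__ == "__main__":
    print(f"ridge point: {ridge_point:.1f} flop/byte")
    print(f"triad intensity: {triad_intensity:.3f} flop/byte")
    print(f"triad bound: {triad_bound:.2f} GFLOP/s")
```

Under these assumed figures the example kernel sits well to the left of the ridge point, so its bound is set by memory bandwidth rather than by the compute peak; raising its arithmetic intensity (for example through cache blocking) would move it up the sloped part of the roof.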
// Fabio Baruffa
is a software technical consulting engineer in the Developer Products Division (DPD) at Intel. He works in the compiler team and provides customer support in the high performance computing (HPC) area. Before joining Intel, he worked as an HPC application specialist and developer at some of the largest supercomputing centers in Europe, mainly the Leibniz Supercomputing Center and the Max Planck Computing and Data Facility in Munich, as well as Cineca in Italy. He has been involved in software development, analysis of scientific code and optimization for HPC systems. He holds a PhD in Physics from the University of Regensburg for his research in the area of spintronic devices and quantum computing.