
Application Performance Analysis Overview

One of the most common questions about a scientific application is: how fast does it run?  This page is far from a full reference on how to study performance - it just raises a few questions you might want to ask when starting the process.

First Steps

Understand these components of your application.  The better you understand them, the better we can help you.  The point is to find out where the application spends its time.  Only then can you focus on improving, say, the one function that takes 60% of the runtime.  With that in mind, the first two points are the most important, followed by the third.
  1. How are you measuring time?
  2. Where are you measuring times?
  3. What is your communication pattern?

Single CPU Performance

Single CPU performance is how well each process performs on its processor, ignoring the effect of communication.

Single CPU performance is a great mystery.  Strive to understand some basics, since to truly understand the performance you would have to be the compiler writer and have a very good understanding of the hardware.  That being said, certain rules of thumb make it easier for the compiler to do good things and for the hardware to work to your advantage:

  1. Algorithm improvement
  2. Spatial and temporal locality of data
Certainly, in specific cases there is more that can be done.  In some cases you might unroll loops.  If you are on an Intel box you might look into their limited vector directives.  A particular implementation of a math library function might be slow, etc.  But keep in mind that these alterations might not be portable, even between versions of the same compiler.  Also keep in mind that, because of the mystery of what is happening under the hood with a lot of compilers and chips, even changes you are sure are for the better might result in strange behavior in other parts of the code.
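
To see why spatial locality matters, here is a toy model rather than a measurement. It assumes a direct-mapped cache with 64-byte lines and 512 lines (32 KB) and an array of 8-byte elements stored row by row; these parameters and all function names are assumptions made for the illustration, not a description of any particular chip. The sketch counts cache misses when the same array is walked with unit stride and with a large stride.

```python
def count_misses(addresses, line_bytes=64, num_lines=512):
    """Count misses in a toy direct-mapped cache for a stream of byte addresses."""
    cache = [None] * num_lines        # which memory line each slot currently holds
    misses = 0
    for addr in addresses:
        line = addr // line_bytes     # memory line containing this byte
        slot = line % num_lines       # direct-mapped: exactly one candidate slot
        if cache[slot] != line:
            cache[slot] = line        # evict whatever was there
            misses += 1
    return misses


def row_order_addresses(rows, cols, elem_bytes=8):
    """Walk a row-major array row by row (unit stride)."""
    for i in range(rows):
        for j in range(cols):
            yield (i * cols + j) * elem_bytes


def column_order_addresses(rows, cols, elem_bytes=8):
    """Walk the same row-major array column by column (stride of `cols` elements)."""
    for j in range(cols):
        for i in range(rows):
            yield (i * cols + j) * elem_bytes


rows, cols = 256, 256
unit_stride = count_misses(row_order_addresses(rows, cols))
large_stride = count_misses(column_order_addresses(rows, cols))
print(f"unit stride:  {unit_stride} misses")
print(f"large stride: {large_stride} misses")
```

With these toy parameters the unit-stride walk misses once per cache line (8192 times), while the strided walk misses on every one of its 65536 accesses. Real caches are more forgiving than this model, but the direction of the effect is the same.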

Should you pay attention to manufacturers' 'peak' performance numbers?


The story is much longer than what I write here, but, basically, manufacturers' 'peak' numbers are measured using applications that are almost perfect for maximizing FLOPS rates and bandwidth.  It is common to compare performance against these 'peak' values, but keep your expectations modest: for example, 30% of peak FLOPS is considered incredibly good and uncommon for a real application.
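
As a rough sketch of how such a comparison is usually computed: peak is commonly taken as cores times clock rate times floating-point operations per cycle, and the achieved rate is the operation count divided by the runtime. The machine parameters and operation count below are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def peak_flops(cores, clock_hz, flops_per_cycle):
    """Theoretical peak: every core completes its maximum FLOPs every cycle."""
    return cores * clock_hz * flops_per_cycle


def achieved_flops(flop_count, seconds):
    """Sustained rate: floating-point operations actually performed per second."""
    return flop_count / seconds


def percent_of_peak(flop_count, seconds, cores, clock_hz, flops_per_cycle):
    """Achieved rate expressed as a percentage of theoretical peak."""
    peak = peak_flops(cores, clock_hz, flops_per_cycle)
    return 100.0 * achieved_flops(flop_count, seconds) / peak


# Hypothetical run: 4.8e10 operations in 100 s on one 2.4 GHz core
# assumed to complete 2 floating-point operations per cycle.
pct = percent_of_peak(flop_count=4.8e10, seconds=100.0,
                      cores=1, clock_hz=2.4e9, flops_per_cycle=2)
print(f"{pct:.1f}% of peak")
```

The hard part in practice is obtaining a trustworthy operation count, either by counting by hand from the algorithm or from hardware counters.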

Parallel Performance

Many people start evaluating an application's performance by looking at the efficiency of its communication.  Here are a few of the basics you need in order to understand parallel performance:
  1. What is the communication pattern/algorithm?
  2. What are the sizes of messages being sent in the different patterns?
  3. What is the expected performance of the system you are using?
Tools like Jumpshot and FPMPI can be extremely useful in evaluating these points.  They let you get a good deal of information with no changes to your code if you are using MPICH.  All you do is link in specific libraries.
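
To connect message sizes with the expected performance of the system, a common first-order model (an assumption introduced here, not something the tools above report) charges each message a fixed latency plus its size divided by the link bandwidth. The latency and bandwidth values in this sketch are hypothetical placeholders; substitute the measured numbers for your own system.

```python
def message_time(nbytes, latency_s, bandwidth_bytes_per_s):
    """Estimated time to send one message: fixed latency plus transfer time."""
    return latency_s + nbytes / bandwidth_bytes_per_s


def effective_bandwidth(nbytes, latency_s, bandwidth_bytes_per_s):
    """Bytes per second actually delivered once latency is included."""
    return nbytes / message_time(nbytes, latency_s, bandwidth_bytes_per_s)


def half_performance_size(latency_s, bandwidth_bytes_per_s):
    """Message size at which effective bandwidth is half of the link bandwidth."""
    return latency_s * bandwidth_bytes_per_s


# Hypothetical interconnect: 10 microseconds latency, 100 MB/s bandwidth.
LATENCY_S = 10e-6
BANDWIDTH = 100e6

for nbytes in (8, 1024, 1024 * 1024):
    t = message_time(nbytes, LATENCY_S, BANDWIDTH)
    bw = effective_bandwidth(nbytes, LATENCY_S, BANDWIDTH)
    print(f"{nbytes:8d} bytes  {t * 1e6:10.2f} us  {bw / 1e6:7.2f} MB/s")

n_half = half_performance_size(LATENCY_S, BANDWIDTH)
print(f"half of link bandwidth is reached near {n_half:.0f} bytes")
```

Under this model, a code that sends many messages well below the half-performance size is dominated by latency, so combining small messages into larger ones tends to help more than a faster link would.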



riley@mcs.anl.gov
