CHPC Software: Parallel Application Development Tools
- Debuggers: Totalview, available at Arches
- Profilers: Vampir/TAU, available at Arches
TotalView is a full-featured, source-level, multi-process, multi-thread graphical debugger for C, C++, Fortran (77 and 90), PGI, HPF, assembler, and mixed source/assembler codes. Totalview provides industry standard support for parallel and distributed computing, e.g. MPI, PVM, and IBM's Parallel Operating Environment (POE).
More information:
- TotalviewTech, manufacturer of Totalview
- Online Documentation from TotalviewTech
- Student Express information. Student Express is a program that provides a free Totalview license to qualified students. Please contact CHPC for more information.
- CHPC's Debugging with Totalview tutorial presentation.
Notes on installation on the CHPC machines
Arches Metacluster
Totalview on the Arches is installed in
/uufs/arches/sys/pkg/totalview/std. Current version is
8.6. Both serial and parallel debugging (MPI, OpenMP) are supported.
Totalview supports most available compilers, including GNU, PGI, Pathscale and Intel.
In order to compile your code and run it in Totalview, follow these steps:
Serial code:
Serial code can be debugged on the interactive nodes, provided it does not generate high load that can affect responsiveness to other users.
- Compile with -g:
- For GNU compilers:
g77 -g source_name.f -o executable_name
- For PGI compilers:
pgf77 -g source_name.f -o executable_name
- For Pathscale compilers:
pathf90 -g source_name.f -o executable_name
- For Intel compilers:
ifort -g source_name.f -o executable_name
- Run the program in Totalview:
totalview ./executable_name
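The compile step above can be wrapped in a small helper. This is a minimal sketch, not a CHPC-provided script: the function name debug_compile_cmd and the family keywords (gnu, pgi, pathscale, intel) are hypothetical, and the function only prints the command so that it can be inspected before running it.

```sh
#!/bin/sh
# Hypothetical helper: print the debug compile line for a compiler family.
# Compiler names and the -g flag are taken from the list above.
debug_compile_cmd() {
    family=$1; src=$2; exe=$3
    case "$family" in
        gnu)       fc=g77 ;;
        pgi)       fc=pgf77 ;;
        pathscale) fc=pathf90 ;;
        intel)     fc=ifort ;;
        *) echo "unknown compiler family: $family" >&2; return 1 ;;
    esac
    echo "$fc -g $src -o $exe"
}

# Example: show the PGI compile line, then the Totalview launch line.
debug_compile_cmd pgi source_name.f executable_name
echo "totalview ./executable_name"
```

Printing rather than executing keeps the sketch usable on a machine where only some of the compilers are installed.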
Parallel MPI/OpenMP code:
Parallel code can be debugged on the interactive nodes, or on the compute nodes. For smaller problems, we recommend using interactive nodes, as debugging on compute nodes has several limitations.
Parallel debugging on interactive nodes:
In order to debug in parallel on interactive nodes, one has to use the MPICH2 MPI implementation. This is because MPICH2 supports single-node parallel program launch without the need for the rsh command, which is disabled on interactive nodes for security reasons.
Also, since interactive nodes have a limited number of processors (2 or 4) and limited memory (2 to 4 GB), only parallel programs that fit within these limits should be debugged on interactive nodes. Note that you can debug a parallel program with more than 4 processes on 4 physical CPUs, since debugging in general puts less load on the CPUs, so more than one process can share a single CPU. Still, debugging with, say, 4 processes should be sufficient for most problems, as most parallel programming bugs show up on any number of processors.
The debugging process on interactive nodes is as follows:
- Check that MPICH2 is the default MPI compiler:
which mpicc
should return one of:
/uufs/arches/sys/pkg/mpich2/1.0.7/bin/mpicc for the GNU/Pathscale MPICH2 build,
/uufs/arches/sys/pkg/mpich2/1.0.7p/bin/mpicc for the PGI MPICH2 build, or
/uufs/arches/sys/pkg/mpich2/1.0.7i/bin/mpicc for the Intel MPICH2 build.
For more details on MPI implementations on Arches, see CHPC's MPI Documentation. Follow this documentation if you are setting up MPICH2 for the first time (a file called ~/.mpd.conf must be created before running MPICH2 programs).
- Compile with -g:
mpif77 -g source_name.f -o executable_name
- Start MPICH2's MPD daemon on the given node
mpdboot -n 1
- Start Totalview:
totalview &
- Totalview opens the New Program window. In this window, fill in the following:
- In tab Program, field Program, put in the executable program name
- In tab Parallel, field Parallel System, choose MPICH2, and for Tasks set the number of processors you want to run on
- Click OK to load the program into Totalview. After a little while, you will get a dialog box saying: Process executable.exe is a parallel job. Do you want to stop now? Click Yes, set some breakpoints and start debugging.
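The first step of the procedure above (checking which mpicc is the default) can be sketched as a short script. The function name mpich2_build is hypothetical; the three paths are the ones listed in the procedure, and any other path is treated as not being an MPICH2 build.

```sh
#!/bin/sh
# Hypothetical helper: classify an mpicc path against the three MPICH2
# builds listed above. Prints the build name, or fails if the path is
# not one of them.
mpich2_build() {
    case "$1" in
        /uufs/arches/sys/pkg/mpich2/1.0.7/bin/mpicc)  echo "GNU/Pathscale" ;;
        /uufs/arches/sys/pkg/mpich2/1.0.7p/bin/mpicc) echo "PGI" ;;
        /uufs/arches/sys/pkg/mpich2/1.0.7i/bin/mpicc) echo "Intel" ;;
        *) return 1 ;;
    esac
}

# Check whichever mpicc is first in PATH (empty if none is installed).
mpicc_path=$(command -v mpicc || true)
if build=$(mpich2_build "$mpicc_path"); then
    echo "MPICH2 ($build build) is the default: $mpicc_path"
else
    echo "mpicc is not one of the MPICH2 builds: ${mpicc_path:-not found}"
fi
```

If the check fails, adjust your environment as described in CHPC's MPI Documentation before compiling and starting mpdboot.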
Parallel debugging on compute nodes:
In order to debug on compute nodes, one has to run a job through the queue. This complicates the process slightly, so debug on compute nodes only if debugging on an interactive node does not reproduce the error you are looking for.
- Compile with -g:
/uufs/arches/sys/pkg/mpich/std/bin/mpicc -g source_name.c -o executable_name
/uufs/arches/sys/pkg/mpich/std/bin/mpif90 -g source_name.f -o executable_name
- For parallel OpenMP applications, also include -mp for PGI compilers, -omp for Pathscale compilers or -openmp for Intel compilers to invoke OpenMP directives support.
- For debugging parallel codes running over Myrinet, compile with MPICH for Myrinet:
/uufs/$UUFSCELL/sys/pkg/mpich-mx/std/bin/mpicc -g source_name.c -o executable_name
/uufs/$UUFSCELL/sys/pkg/mpich-mx/std/bin/mpif90 -g source_name.f -o executable_name
- For debugging parallel codes running over InfiniBand, compile with MVAPICH:
/uufs/sanddunearch.arches/sys/pkg/mvapich/std/bin/mpicc -g source_name.c -o executable_name
/uufs/sanddunearch.arches/sys/pkg/mvapich/std/bin/mpif90 -g source_name.f -o executable_name
- ssh to the interactive node with X forwarding (flag -Y)
ssh -Y u0123456@delicatearch.chpc.utah.edu
- Start interactive PBS session specifying X forwarding flag (-X), e.g.:
qsub -I -X -l nodes=2,walltime=2:00:00
- When debugging a parallel code over Myrinet, also set the TOTALVIEW environment variable:
setenv TOTALVIEW "/uufs/arches/sys/totalview/bin/totalview"
- Run mpirun with the -tv flag to invoke Totalview. Use $PBS_NODEFILE as your machinefile, the same way as you would when running with a PBS script:
/uufs/arches/sys/pkg/mpich/std/bin/mpirun -np 2 -tv -machinefile $PBS_NODEFILE executable_name
MPICH-MX with Myrinet uses a slightly different startup mechanism; use -totalview instead:
/uufs/delicatearch.arches/sys/pkg/mpich-mx/std/bin/mpirun.ch_mx -np 2 -totalview -machinefile $PBS_NODEFILE executable_name
MVAPICH for InfiniBand uses the -tv flag, but some other parameters are different:
/uufs/sanddunearch.arches/sys/pkg/mvapich/std/bin/mpirun_rsh -rsh -np 2 -tv -hostfile $PBS_NODEFILE executable_name
InfiniPath MPI, which is installed on Updraft, does not support Totalview. If you need to debug on Updraft, please build your code with MPICH2. We are trying to persuade the vendor to include Totalview support. InfiniPath MPI supports the text-based debuggers gdb and pathdb; see mpirun --help for details on how to use these debuggers on Updraft.
- If you are running Totalview for the first time, you also have to specify the full path to the TV Server, tvdsvr. To do that, open the Launch Strings dialog in Menu - Preferences and replace tvdsvr with its full path:
/uufs/arches/sys/totalview/bin/tvdsvr
- Click go to run the code. After a little while, you will get a dialog box saying: Process executable.exe is a parallel job. Do you want to stop now? Click Yes, set some breakpoints and start debugging.
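The three launch variants above differ only in the launcher path and a few flags, which the following sketch collects in one place. The function name tv_launch_cmd and the stack keywords (mpich, mpich-mx, mvapich) are made up for illustration; the paths and flags are copied from this page. The function prints the command instead of running it, and falls back to the literal text $PBS_NODEFILE when called outside a PBS session.

```sh
#!/bin/sh
# Hypothetical helper: print the Totalview-enabled launch line for one of
# the three MPI stacks described above.
tv_launch_cmd() {
    stack=$1; np=$2; exe=$3
    # Inside a PBS job the node file is set; otherwise keep the placeholder.
    if [ -n "${PBS_NODEFILE:-}" ]; then
        nodefile=$PBS_NODEFILE
    else
        nodefile='$PBS_NODEFILE'
    fi
    case "$stack" in
        mpich)
            echo "/uufs/arches/sys/pkg/mpich/std/bin/mpirun -np $np -tv -machinefile $nodefile $exe" ;;
        mpich-mx)
            echo "/uufs/delicatearch.arches/sys/pkg/mpich-mx/std/bin/mpirun.ch_mx -np $np -totalview -machinefile $nodefile $exe" ;;
        mvapich)
            echo "/uufs/sanddunearch.arches/sys/pkg/mvapich/std/bin/mpirun_rsh -rsh -np $np -tv -hostfile $nodefile $exe" ;;
        *) echo "unknown MPI stack: $stack" >&2; return 1 ;;
    esac
}

# Example: the plain MPICH launch line for 2 processes.
tv_launch_cmd mpich 2 executable_name
```

Remember that the executable must have been compiled with the MPI build matching the launcher, as described in the compile step.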
Intel Trace Analyzer (ITA, formerly known as Vampir) is a graphical profiling tool that lets the user analyze how much time the program spends in calculation, I/O, communication, etc. It enables the user to quickly focus on the appropriate level of detail by zooming into an arbitrary part of the trace and by selecting interesting processes, events, and communication operations.
The analyzed program must generate a trace file (.vtf).
This can be done with the open source package TAU (Tuning and Analysis
Utilities). The user has to link the code with TAU, run it to produce
TAU trace files and then convert them to the Vampir format. These files
are then read by ITA for performance analysis.
Please note that the trace files can become quite large (gigabytes). Therefore, we recommend running only a small section of the code with tracing enabled, e.g. a single iteration or time step. If that is not possible in your code, please contact CHPC for options on how to enable/disable tracing via code instrumentation.
Location of TAU and the Intel Trace Analyzer on CHPC machines and instructions how to use them are below.
Further references:
- TAU homepage at University of Oregon
- Intel Trace Collector webpage at Intel
- CHPC's MPI profiling with TAU/Vampir tutorial presentation.
MPI profiling involves three steps. First, one must produce an instrumented binary that allows collection of timing information. On our systems, we use the TAU package for this purpose. Then one runs the instrumented executable, which produces trace files that contain the timing information. Finally, these trace files are viewed in a program that lets the user analyze the timing information. At CHPC, we use Intel Trace Analyzer for this purpose.
The binary is instrumented by using special TAU compiler wrappers instead of the standard MPI compilers. In order to use these wrappers, one first has to source the TAU environment:
source /uufs/arches/sys/pkg/tau/std/etc/tau.csh (for csh/tcsh)
source /uufs/arches/sys/pkg/tau/std/etc/tau.sh (for sh/ksh/bash)
Then one can modify the program's Makefile, replacing the default compilers with the TAU compiler wrappers. Note that TAU's Makefile is included to define all the TAU make variables.
TAUROOTDIR = /uufs/arches/sys/pkg/tau/2.15
include $(TAUROOTDIR)/include/Makefile
F90 = $(TAU_COMPILER) pathf90
CC = $(TAU_COMPILER) gcc
Alternatively, one can compile directly using the TAU compiler
wrapper scripts tau_f90.sh, tau_cc.sh and
tau_cxx.sh.
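As a rough illustration of compiling directly with the wrapper scripts, the sketch below picks a wrapper from the source file extension. The function name tau_wrapper_for and the extension-to-wrapper mapping are assumptions made for this example; only the three wrapper names come from the text above. The compile line is printed, not executed.

```sh
#!/bin/sh
# Hypothetical helper: choose a TAU wrapper script from the source file
# extension. The mapping below is an assumption for illustration.
tau_wrapper_for() {
    case "$1" in
        *.f|*.f90|*.F|*.F90)  echo tau_f90.sh ;;
        *.c)                  echo tau_cc.sh ;;
        *.cpp|*.cc|*.cxx|*.C) echo tau_cxx.sh ;;
        *) echo "no TAU wrapper known for: $1" >&2; return 1 ;;
    esac
}

# Print a compile line that uses the wrapper in place of the compiler.
src=source_name.f
echo "$(tau_wrapper_for "$src") $src -o executable_name"
```

The TAU environment has to be sourced first so that the wrapper scripts are found in the PATH.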
Once the executable is compiled, run it
to produce the trace files. Note that since this is an MPI program,
it must be run with the mpirun command, e.g.:
/uufs/arches/sys/pkg/mpich/std/bin/mpirun -np 4 -machinefile $PBS_NODEFILE ./executable
Upon finishing, there should be numerous files named
tautrace.* and events.* in the run directory.
These are the trace files in TAU format.
We have to convert these files to the Vampir trace file (vtf) format. This is done in two steps.
- tau_merge tautrace.*.trc myprogram.trc
(optionally add -n to break a stuck session)
- tau2vtf myprogram.trc tau.edf myprogram.vtf
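The two conversion steps can be chained in a small script so that the second step runs only if the merge succeeds. This is a sketch under assumptions: the function name convert_tau_trace and the DRY_RUN variable are hypothetical, and by default the commands are only printed so the sketch can be tried without TAU installed.

```sh
#!/bin/sh
# Hypothetical helper: run the two conversion steps above for a given
# base name. With DRY_RUN=1 (the default here) the commands are printed
# instead of executed.
convert_tau_trace() {
    base=$1
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "tau_merge tautrace.*.trc $base.trc"
        echo "tau2vtf $base.trc tau.edf $base.vtf"
    else
        # The glob is unquoted on purpose so tau_merge receives every
        # per-process trace file; tau2vtf runs only if the merge succeeds.
        tau_merge tautrace.*.trc "$base.trc" &&
        tau2vtf "$base.trc" tau.edf "$base.vtf"
    fi
}

convert_tau_trace myprogram
```

Set DRY_RUN=0 in the run directory that holds the tautrace.* and events.* files to perform the actual conversion.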
Trace files are viewed with the Trace Analyzer. In order to use ITA,
source a script in the .cshrc, .tcshrc or
.bashrc that sets paths and license information:
source /uufs/arches/sys/pkg/ita/std/etc/ita.csh (for csh/tcsh)
source /uufs/arches/sys/pkg/ita/std/etc/ita.sh (for ksh/bash)
Then open the Trace Analyzer with the trace file:
traceanalyzer executable.vtf
Finally, here is an example of a Makefile entry for compiling MPEVB extension to DLPOLY molecular dynamics package with TAU.
Initial definitions remain unchanged
STRESS=STRESS
TYPE=3pt
TAUROOTDIR = /uufs/arches/sys/pkg/tau/std
include $(TAUROOTDIR)/include/Makefile
F90 = $(TAU_COMPILER) pathf90
.........
lots of other stuff
.........
arches-pa-tau : dpp
cp $(MPI_DIR)/include/mpif.h mpif.h
$(MAKE) \
LDFLAGS="-O3 -OPT:Ofast -OPT:Olimit=0 -L$(FFTW_LIBRARY)/lib -lfftw3 " \
FFLAGS="-c -O3 -OPT:Ofast -OPT:Olimit=0 " \
CPFLAGS="-D$(STRESS) -DMPI -DFFTW -P -D'pointer=integer*8' " \
TIMER="" EX=$(EX) BINROOT=$(BINROOT) $(TYPE)

