CHPC Software: Totalview Debugger

TotalView is a full-featured, source-level, multi-process, multi-thread graphical debugger for C, C++, Fortran (77 and 90), PGI, HPF, assembler, and mixed source/assembler codes. Totalview provides industry standard support for parallel and distributed computing, e.g. MPI, PVM, and IBM's Parallel Operating Environment (POE).


Notes on installation on the CHPC machines

Arches Metacluster

On Arches, Totalview is installed in /uufs/arches/sys/pkg/totalview/std. The current version is 8.6. Both serial and parallel debugging (MPI, OpenMP) are supported.

Totalview supports most available compilers, including GNU, PGI, Pathscale and Intel.

In order to compile your code and run it in Totalview, follow these steps:

Serial code:

Serial code can be debugged on the interactive nodes, provided it does not generate high load that can affect responsiveness to other users.

  1. Compile with -g:
    • For GNU compilers:
      • g77 -g source_name.f -o executable_name
    • For PGI compilers:
      • pgf77 -g source_name.f -o executable_name
    • For Pathscale compilers:
      • pathf90 -g source_name.f -o executable_name
    • For Intel compilers:
      • ifort -g source_name.f -o executable_name
  2. Run the program in Totalview:
    • totalview ./executable_name
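
The two steps above can be wrapped in a small shell helper. This is only a sketch, written for a POSIX shell such as bash (tcsh users would need to adapt it). The function name tv_serial_cmds and its arguments are hypothetical; the compiler names are the ones listed above. The helper only prints the commands so that they can be reviewed before they are run.

```shell
# Print the compile and launch commands for a serial Fortran code.
# Usage: tv_serial_cmds <gnu|pgi|pathscale|intel> <source.f> <executable>
# (tv_serial_cmds is a hypothetical helper, not part of Totalview or CHPC tools.)
tv_serial_cmds() {
    family=$1; src=$2; exe=$3
    case "$family" in
        gnu)       fc=g77 ;;
        pgi)       fc=pgf77 ;;
        pathscale) fc=pathf90 ;;
        intel)     fc=ifort ;;
        *) echo "unknown compiler family: $family" >&2; return 1 ;;
    esac
    # -g adds the debugging symbols needed for source-level debugging
    echo "$fc -g $src -o $exe"
    echo "totalview ./$exe"
}
```

For example, tv_serial_cmds pgi source_name.f executable_name prints the PGI compile line followed by the totalview launch line; piping the output into sh would execute both.
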
Parallel MPI/OpenMP code:

Parallel code can be debugged on the interactive nodes, or on the compute nodes. For smaller problems, we recommend using interactive nodes, as debugging on compute nodes has several limitations.

Parallel debugging on interactive nodes:

In order to debug in parallel on interactive nodes, one has to use the MPICH2 MPI implementation. This is because MPICH2 supports single-node parallel program launch without the need for the rsh command, which is disabled on interactive nodes for security reasons.
Also, since interactive nodes have a limited number of processors (2 or 4) and limited memory (2 to 4 GB), only parallel programs that fit within these limits should be debugged on interactive nodes. Note that you can debug a parallel program with more than 4 processes on 4 physical CPUs, since debugging generally puts less load on the CPUs, so more than one process can run on a single CPU at a time. Still, debugging with, say, 4 processes should be sufficient for most problems, as most parallel programming bugs show up on any number of processors.

The debugging process on interactive nodes is as follows:

  1. Check that MPICH2 is the default MPI implementation
    • which mpicc
    • The command should return /uufs/arches/sys/pkg/mpich2/1.0.7/bin/mpicc for the GNU/Pathscale MPICH2 build, /uufs/arches/sys/pkg/mpich2/1.0.7p/bin/mpicc for the PGI MPICH2 build, or /uufs/arches/sys/pkg/mpich2/1.0.7i/bin/mpicc for the Intel MPICH2 build. For more details on MPI implementations on Arches, see CHPC's MPI Documentation. Follow this documentation if you are setting up MPICH2 for the first time (a file called ~/.mpd.conf needs to be created before running MPICH2 programs).
  2. Compile with -g:
    • mpif77 -g source_name.f -o executable_name
  3. Start MPICH2's MPD daemon on the given node
    • mpdboot -n 1
  4. Start Totalview:
    • totalview &
  5. Totalview opens the New Program window. In this window, fill in the following:
    • On the Program tab, enter the executable program name in the Program field
    • On the Parallel tab, choose MPICH2 in the Parallel System field, and set Tasks to the number of processors you want to run on
    • Click OK to load the program into Totalview. After a little while, you will get a dialog box saying: Process executable.exe is a parallel job. Do you want to stop now? Click Yes, set some breakpoints and start debugging.
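
The check in step 1 can be scripted. The sketch below, for a POSIX shell such as bash, classifies the path reported by which mpicc against the three MPICH2 builds listed in that step. The name mpich2_build is a hypothetical helper, not an MPICH2 or CHPC command.

```shell
# Report which MPICH2 build a given mpicc path belongs to.
# Usage: mpich2_build "$(which mpicc)"
# (mpich2_build is a hypothetical helper; the paths are those listed in step 1.)
mpich2_build() {
    case "$1" in
        /uufs/arches/sys/pkg/mpich2/1.0.7/bin/mpicc)  echo "GNU/Pathscale" ;;
        /uufs/arches/sys/pkg/mpich2/1.0.7p/bin/mpicc) echo "PGI" ;;
        /uufs/arches/sys/pkg/mpich2/1.0.7i/bin/mpicc) echo "Intel" ;;
        # Any other path means a different MPI is first in the search path
        *) echo "not MPICH2: $1" >&2; return 1 ;;
    esac
}
```

One possible use is mpich2_build "$(which mpicc)" && mpdboot -n 1, which starts the MPD daemon only when the MPICH2 wrapper is the one found in the path.
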

Parallel debugging on compute nodes:

In order to debug on compute nodes, one has to run a job through the queue. This complicates the process slightly, so debug on compute nodes only if debugging on an interactive node does not reproduce the error you are looking for.

  1. Compile with -g:
    • /uufs/arches/sys/pkg/mpich/std/bin/mpicc -g source_name.c -o executable_name
    • /uufs/arches/sys/pkg/mpich/std/bin/mpif90 -g source_name.f -o executable_name
    • For parallel OpenMP applications, also include the flag that enables OpenMP directive support: -mp for PGI compilers, -omp for Pathscale compilers, or -openmp for Intel compilers.
    • For debugging parallel codes running over Myrinet, compile with MPICH for Myrinet:
      • /uufs/$UUFSCELL/sys/pkg/mpich-mx/std/bin/mpicc -g source_name.c -o executable_name
      • /uufs/$UUFSCELL/sys/pkg/mpich-mx/std/bin/mpif90 -g source_name.f -o executable_name
    • For debugging parallel codes running over InfiniBand, compile with MVAPICH:
      • /uufs/sanddunearch.arches/sys/pkg/mvapich/std/bin/mpicc -g source_name.c -o executable_name
      • /uufs/sanddunearch.arches/sys/pkg/mvapich/std/bin/mpif90 -g source_name.f -o executable_name
  2. ssh to the interactive node with X forwarding (flag -Y)
    • ssh -Y
  3. Start an interactive PBS session with the X forwarding flag (-X), e.g.:
    • qsub -I -X -l nodes=2,walltime=2:00:00
  4. When debugging a parallel code over Myrinet, also set the TOTALVIEW environment variable:
    • setenv TOTALVIEW "/uufs/arches/sys/totalview/bin/totalview"
  5. Run mpirun with the -tv flag to invoke Totalview. Use $PBS_NODEFILE as your machinefile, the same way as you would in a PBS script:
    • /uufs/arches/sys/pkg/mpich/std/bin/mpirun -np 2 -tv -machinefile $PBS_NODEFILE executable_name
    Totalview will start with the source code of your executable in a new window.

    MPICH-MX with Myrinet uses a slightly different startup mechanism; use the -totalview flag instead:

    • /uufs/delicatearch.arches/sys/pkg/mpich-mx/std/bin/mpirun.ch_mx -np 2 -totalview -machinefile $PBS_NODEFILE executable_name

    MVAPICH for InfiniBand also uses the -tv flag, but some other parameters are different:

    • /uufs/sanddunearch.arches/sys/pkg/mvapich/std/bin/mpirun_rsh -rsh -np 2 -tv -hostfile $PBS_NODEFILE executable_name

    InfiniPath MPI, which is installed on Updraft, does not support Totalview. If you need to debug on Updraft, please build your code with MPICH2. We are trying to persuade the vendor to include Totalview support. InfiniPath MPI supports the text-based debuggers gdb and pathdb; see mpirun --help for details on how to use these debuggers on Updraft.

  6. If you are running Totalview for the first time, you also have to specify the full path to the Totalview server, tvdsvr. To do that, open the Launch Strings dialog in Menu - Preferences and change the server command to:
    • /uufs/arches/sys/totalview/bin/tvdsvr
    instead of just plain
    • tvdsvr
  7. Click Go to run the code. After a little while, you will get a dialog box saying: Process executable.exe is a parallel job. Do you want to stop now? Click Yes, set some breakpoints and start debugging.
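
The three launch variants in step 5 differ only in the launcher and its flags. As a sketch, a POSIX shell helper can print the matching command for each MPI implementation. The name tv_mpirun_cmd and its keywords (mpich, mpich-mx, mvapich) are hypothetical; the paths and flags are copied from step 5. The helper assumes it is called inside the interactive PBS session from step 3, where PBS_NODEFILE is set.

```shell
# Print the Totalview launch command for a given MPI implementation.
# Usage: tv_mpirun_cmd <mpich|mpich-mx|mvapich> <nprocs> <executable>
# (tv_mpirun_cmd is a hypothetical helper; paths and flags are those in step 5.)
tv_mpirun_cmd() {
    mpi=$1; np=$2; exe=$3
    # PBS sets PBS_NODEFILE inside the interactive session
    if [ -z "${PBS_NODEFILE:-}" ]; then
        echo "PBS_NODEFILE is not set; run inside a PBS session" >&2
        return 1
    fi
    case "$mpi" in
        mpich)
            echo "/uufs/arches/sys/pkg/mpich/std/bin/mpirun -np $np -tv -machinefile $PBS_NODEFILE $exe" ;;
        mpich-mx)
            echo "/uufs/delicatearch.arches/sys/pkg/mpich-mx/std/bin/mpirun.ch_mx -np $np -totalview -machinefile $PBS_NODEFILE $exe" ;;
        mvapich)
            echo "/uufs/sanddunearch.arches/sys/pkg/mvapich/std/bin/mpirun_rsh -rsh -np $np -tv -hostfile $PBS_NODEFILE $exe" ;;
        *) echo "unknown MPI implementation: $mpi" >&2; return 1 ;;
    esac
}
```

Inside the PBS session, eval "$(tv_mpirun_cmd mvapich 2 executable_name)" would run the printed command. For mpich-mx, the TOTALVIEW environment variable from step 4 still has to be set first.
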
Last Modified: October 06, 2008 @ 21:07:48