Sierra Cluster User Guide

CHPC resources are available to qualified faculty, students (under faculty supervision), and researchers from any Utah institution of higher education. Users can request accounts for CHPC computer systems by filling out an account request form. This can be found by following the link below or by coming into Room 405, INSCC Building. (Phone 581-5253)

Users requiring more than the default service units (SU) per quarter need to send a brief proposal using the allocation form, available either:

  • Web version: Allocation form.
  • Hardcopy: from our main office, 405 INSCC, 581-5253.

Users can access the Sierra interactive node via ssh (secure shell) at the following address:

rocky.chpc.utah.edu

Problems regarding network connectivity and other questions about Sierra's availability should be addressed to CHPC's consulting staff. Please refer to user services for assistance.

Interactive Work Policy: The CHPC systems are designed so that users doing interactive work have an acceptable response time, and users doing batch work have acceptable performance. Interactive work includes editing, compiling, debugging, and checking results. Batch processing includes any run that requires more than 15 minutes of CPU time. A question that provides a rule-of-thumb for interactive/batch processing is, "Will my job affect others' response time?" If the answer is yes, then it is a batch job.

Users who abuse CHPC policies will lose their accounts.

The batch implementation on all CHPC systems includes PBS as the resource manager together with a scheduler. The scheduler on the Compaq Sierra was written at the Pittsburgh Supercomputing Center.

Any MPI job or process that runs for more than 15 minutes must be run through the batch system.

There are three steps to running in batch:

  1. Create a batch script file.
  2. Submit the script to the batch system.
  3. Check on your job.

Example PBS Script for the Compaq Sierra

Note that shell programming is exactly like running commands in the shell: you simply write into the file the commands you would like to run, the same way you would type them interactively.

The following is an example script for running in PBS on the Compaq Sierra. The lines at the top of the file all begin with #PBS; the shell treats them as comments, but they give options to PBS.

In this example the job runs on 1 node with 4 processors and a 1-hour walltime. CHPC's policy is that you must use multiples of 4 processors for your jobs. Since we currently have 24 processors available in the production batch pool, the only valid specifications for the "-l" line are rmsnodes=1:4, rmsnodes=2:8, rmsnodes=3:12, rmsnodes=4:16, rmsnodes=5:20, and rmsnodes=6:24.

You will need to change the email address specification and the "path" in your home directory that files are copied from and to (2 places). You will also need to decide which "prun" command to use. The example script is set up for an MPI job. If your program uses shared memory (including Gaussian), comment out the MPI prun command and uncomment the shared memory prun line.

Example Sierra PBS Script.

#PBS -S /bin/csh 
#PBS -q sierra
#PBS -l rmsnodes=1:4,walltime=1:00:00
#PBS -m abe
#PBS -M youremail@address
#PBS -j oe
 
#Create execution directory
if (! -d /scratch/global/$PBS_JOBID) then
    echo "making /scratch/global/$PBS_JOBID"
    mkdir /scratch/global/$PBS_JOBID
endif
 
#Copy the necessary files to the execution dir.
cp ~/path/a.out /scratch/global/$PBS_JOBID
 
#Change to execution directory
cd /scratch/global/$PBS_JOBID
 
##Run your job. Note that $RMS_NODES and $RMS_PROCS are set by
##your specification in the #PBS -l line.
##Use one of the following prun commands

##For MPI jobs use:
prun -N $RMS_NODES -n $RMS_PROCS ./a.out > outputfile

##For Shared Memory (including gaussian) use:
#prun -N $RMS_NODES -n 1 ./a.out > outputfile
 
#Copy files back home and clean up.
rm ./a.out
cp * ~/path && cd .. && /bin/rm -rf /scratch/global/$PBS_JOBID

Queues on the Sierra Cluster

There are four queues available on the Sierra Cluster (as of 3/26/02):

  • sierra - for regular jobs. Will not run on sierra2 or sierra3, which are reserved for the "par" and "short" queues.
  • test - for testing scripts - 1 node, 4 procs, less than one hour.
  • par - for 2-way parallel jobs (rmsnodes=2:8); can run on any nodes, but has priority on nodes sierra2 and sierra3.
  • short - for 1-way or 2-way parallel jobs taking less than 12 hours. Can run on any nodes.

For example, if you wish to test your scripts and do short test runs, change the "-q" line in your script to "#PBS -q test", which will limit you to rmsnodes=1:4 and a walltime of less than one hour.
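
For instance, the top of a test-queue script might look like this (a sketch; the walltime value is illustrative):

    #PBS -q test
    #PBS -l rmsnodes=1:4,walltime=0:30:00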

See below for the prun batch options to run parallel programs on the Compaq Sierra.

Job Submission on the Compaq Sierra

Submit the job using the "qsub" PBS command. See PBS User Commands below for additional PBS commands.

For example, to submit a script file named "pbsjob", type

qsub pbsjob

PBS sets and expects a number of variables in a PBS script. For information on these variables and related requirements, enter:

man qsub

Checking Job Status

To check if your job is queued or running, use the "qstat" PBS command:

qstat

Another option is to use the RMS command "rinfo":

rinfo

Setting Up Your Environment for Batch

PBS will fail if you have tty-dependent commands in your .profile, .cshrc, .login, or .logout. One way to preserve your login defaults and avoid problems with PBS is to check whether you are running under PBS before executing a tty-dependent command. Simply test whether the environment variable PBS_ENVIRONMENT is set, and execute your commands accordingly.

Bourne and Korn shell users can accomplish this by modifying the .profile with the following:

if [ -z "$PBS_ENVIRONMENT" ]
then
   # do interactive commands
       stty erase '^H'
       TERM=xterm; export TERM
else
   # do batch specific commands
   :
fi

csh users can use the following in their .cshrc and/or .login:

if ( $?PBS_ENVIRONMENT ) then
   # do batch specific commands
else
   # do interactive commands
       stty erase '^H'      
       set term = xterm
endif

If you run csh and you have a ~/.logout script, you should place the following lines at the beginning and end of the file.

#First of csh ~/.logout
set EXITVAL = $status

#Last line of csh ~/.logout
exit $EXITVAL

PBS Batch Script Options

  • -a date_time.  Declares the time after which the job is eligible for execution. The date_time element is in the form: [[[[CC]YY]MM]DD]hhmm[.S].
  • -e path.  Defines the path to be used for the standard error stream of the batch job. The path is of the form: [hostname:]path_name.
  • -h.  Specifies that a user hold will be applied to the job at submission time.
  • -I.  Declares that the job is to be run "interactively". The job will be queued and scheduled as a PBS batch job, but when executed, the standard input, output, and error streams of the job will be connected through qsub to the terminal session in which qsub is running.
  • -j join.  Declares whether the standard error stream of the job will be merged with the standard output stream. The join argument is one of the following:
    • oe -  Directs both streams to standard output.
    • eo -  Directs both streams to standard error.
    • n -  The two streams remain separate (default).
  • -l resource_list.  Defines the resources that are required by the job and establishes a limit on the amount of resources that can be consumed. Users will want to specify the walltime resource, and if they wish to run a parallel job, the ncpus resource.
  • -m mail_options.  Conditions under which the server will send a mail message about the job. The options are:
    • n: No mail ever sent
    • a (default): When the job aborts
    • b: When the job begins
    • e: When the job ends
  • -M user_list.  Declares the list of e-mail addresses to whom mail is sent. If unspecified it defaults to userid@host from where the job was submitted. You will most likely want to set this option.
  • -N name.  Declares a name for the job.
  • -o path.  Defines the path to be used for the standard output. [hostname:]path_name.
  • -q destination.  Specifies the destination queue for the job.
  • -S path_list.  Declares the shell that interprets the job script. If not specified it will use the user's login shell.
  • -v variable_list.  Expands the list of environment variables which are exported to the job. The variable list is a comma-separated list of strings of the form variable or variable=value.
  • -V.  Declares that all environment variables in the qsub command's environment are to be exported to the batch job.
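
Most of these options can also be given on the qsub command line instead of in the script. As a sketch (the job name, resource values, and address are placeholders), the following is equivalent to setting the corresponding #PBS lines:

    qsub -N myjob -q sierra -l rmsnodes=2:8,walltime=4:00:00 -m be -M youremail@address -j oe pbsjob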

PBS User Commands

For any of the commands listed below you may do a "man command" for syntax and detailed information.

Frequently used PBS user commands:

  • qsub. Submits a job to the PBS queuing system. Please see qsub Options below.
  • qdel. Deletes a PBS job from the queue.
  • qstat. Shows status of PBS batch jobs.
  • xpbs. X interface for PBS users.

Less Frequently-Used PBS User Commands:

  • qalter. Modifies the attributes of a job.
  • qhold. Requests that the PBS server place a hold on a job.
  • qmove. Removes a job from the queue in which it resides and places the job in another queue.
  • qmsg. Sends a message to a PBS batch job. To send a message to a job is to write a message string into one or more of the job's output files.
  • qorder. Exchanges the order of two PBS batch jobs within a queue.
  • qrerun. Reruns a PBS batch job.
  • qrls. Releases a hold on a PBS batch job.
  • qselect. Lists the job identifier of those jobs which meet certain selection criteria.
  • qsig. Requests that a signal be sent to the session leader of a batch job.

RMS User Commands

For any of the commands listed below you may do a "man command" for syntax and detailed information.

The RMS user commands are:

  • prun. Loads and runs parallel programs. It can also run multiple copies of a sequential program.
  • rinfo. Displays information about the resources available and about the jobs which are running.

RMS Batch Options: prun Command:

  • -B  basenode  Specifies the number of the base node (the first to use) in the partition. Numbering within the partition starts at 0. By default the base node is unassigned, leaving the scheduler free to select nodes that are not in use.
  • -c  CPUs  Specifies the number of CPUs required per process (default 1).
  • -h   Display the list of options.
  • -i   Allocate CPUs immediately or fail. By default, prun blocks until resources become available.
  • -m  block | cyclic  Specifies whether to use block (the default) or cyclic distribution of processes over nodes.
  • -n  processes  Specifies the number of processes required. The -n and -N options can be combined to control how processes are distributed over nodes. If neither is specified prun starts two processes.
  • -N  nodes | all  Specifies the number of nodes required. You may also allocate all nodes in a partition using the all argument (i.e. prun -N all). If the number of nodes is not specified then the RMS scheduler will allocate one CPU per process on nodes with free CPUs.
  • -O   Allows resources to be overcommitted. Set this flag if you want to run more than one process per CPU.
  • -p  partition  Specifies the partition on which the program will be executed. By default, the partition specified in the attributes table is used.
  • -r   Run processes using rsh. Used for admin operations such as starting and stopping RMS.
  • -s   Print stats as job exits.
  • -t   Prefix output with the process number.
  • -v   Specifies verbose operation. Multiple -v options increase the level of output, -vv shows each stage in running a program and -vvv enables debug output from the rmsloader processes on each node.

RMS Environment Variables set for prun:

  • RMS_IMMEDIATE
  • RMS_MEMLIMIT
  • RMS_PARTITION
  • RMS_PROJECT
  • RMS_TIMELIMIT
  • RMS_DEBUG
  • RMS_EXITTIMEOUT

RMS Environment Variables set by prun:

  • RMS_JOBID
  • RMS_NNODES
  • RMS_NODEID
  • RMS_NPROCS
  • RMS_RANK
  • RMS_RESOURCEID
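
For example, a small csh wrapper started by prun could read these variables to label its output (a hypothetical sketch; wrapper.csh and a.out are placeholders):

    #!/bin/csh
    # each process started by prun sees its own RMS_RANK
    echo "process $RMS_RANK of $RMS_NPROCS on node $RMS_NODEID"
    ./a.out

It could be started with, for example, "prun -N 2 -n 8 ./wrapper.csh".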

RMS Batch Options: rinfo Command:

  • -a   List all resources and jobs (both the user's and those of others).
  • -c   List the configuration names.
  • -h   Display the list of options.
  • -j   List current jobs. This can be combined with the -a option to get a list of all jobs (both the user's and those of others).
  • -l   Give more detailed information.
  • -m   Show the machine name.
  • -n   Show the status of each node. Can be combined with -l.
  • -p   Identify each active partition by name and indicate the number of CPUs in each partition.
  • -q   Print information on the user's quotas and projects.
  • -r   Show the allocated resources.
  • -L partition statistic  Print the hostname of a lightly loaded node in the machine or the specified partition. RMS provides a load balancing service, accessible through rmsexec, that enables users to run their processes on lightly loaded nodes, where loading is evaluated according to a given statistic.
  • -s daemon | all [hostname]  Show the status of the daemon. When used with the argument all rinfo will show the status of all daemons running on the rmshost management node. For daemons that run on multiple nodes, such as rmsd, the optional hostname argument specifies the hostname of the node on which the daemon is running.
  • -t node | name  Where node is the network ID of a node, rinfo translates it into the hostname; where name is a hostname, rinfo translates it into the network ID.
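
For example (a sketch):

    rinfo -a -j    # list all jobs, both yours and those of others
    rinfo -n -l    # detailed status of each node
    rinfo -q       # your quotas and projects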

C/C++ Compilers

The Compaq C and C++ compilers, version 6.3, are installed on Sierra and are located in the default path.

  • Compaq C:

    • Command line: cc [option] file
    • Useful options:
      • -fast  turns on a collection of optimization flags
      • -omp  compiles OpenMP program
  • Compaq C++:

    • Command line: cxx [option] file
    • Useful options:
      • -fast  turns on a collection of optimization flags
      • -omp  compiles OpenMP program
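
For example, to build an optimized or an OpenMP C/C++ program (a sketch; the file and program names are placeholders):

    cc -fast -o myprog myprog.c      # optimized C build
    cxx -fast -o myprog myprog.cxx   # optimized C++ build
    cc -omp -o myprog myprog.c       # OpenMP C build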

Fortran

The Compaq Fortran for Tru64 UNIX compiler suite, version 5.4, is installed on Sierra. It includes support for Fortran 77, 90, and 95. The compilers (f77, f90, and f95) are in the default path.

  • Compaq Fortran:

    • Command line: f77, f90, f95 [option] file
    • Useful options:
      • -fast  turns on a collection of optimization flags
      • -omp  compiles OpenMP program
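
For example (a sketch; the file and program names are placeholders):

    f90 -fast -o myprog myprog.f90   # optimized Fortran 90 build
    f77 -omp -o myprog myprog.f      # OpenMP Fortran 77 build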

To use MPI

To link to the MPI libraries (located in /usr/lib) on the Compaq Sierra, do the following:

  • For C:

    1. Include MPI header file in your program:

      #include <mpi.h>

    2. Compile linking MPI and ELAN libraries:

      cc -o progname progname.c -lmpi -lelan

  • For Fortran:

    1. Include MPI header file in your program:

      INCLUDE "mpif.h"

    2. Compile linking MPI and ELAN libraries:

      f90 -o progname progname.f -lmpi -lfmpi -lelan
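
Putting the pieces together, a minimal compile-and-run sequence might look like the following (a sketch; myprog is a placeholder, and the prun line belongs in a PBS script as shown earlier):

    cc -o myprog myprog.c -lmpi -lelan                       # compile and link
    prun -N $RMS_NODES -n $RMS_PROCS ./myprog > outputfile   # run under PBS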

To use Shared Memory

  1. Set the environment variable that controls the number of processors (e.g., the DXML library requires the variable PARALLEL to be set to the number of processors). For example, csh users would add:

    setenv PARALLEL N

    to their PBS script, where N is the number of processors. For more information, run "man dxml" on rocky.
  2. In your PBS script, specify 1 node and 1 processor in the prun command:

    prun -N 1 -n 1 a.out

Linear algebra subroutines

Updated June 28, 2004

The Compaq Extended Math Library (CXML) is a set of computationally intensive mathematical subroutines that are optimized for the Alpha platform. It is installed on the Compaq Sierra cluster.

The library subroutines cover areas of:

  • BLAS Level 1,2,3 subroutines
  • complete LAPACK routines
  • sparse linear system solvers (direct sparse and iterative solvers)
  • signal processing routines (FFT, cos/sin transforms, convolution, correlation and digital filters)


Examples of how to link the CXML library on the Compaq Sierra

Single processor version

  • For C:
    • Command line: cc source_name.c -o executable_name -lcxml
  • For Fortran:
    • Command line: f90 (or f77) source_name.f -o executable_name -lcxml

Parallel version (for Shared Memory Processing - SMP)

  • For C:
    • Command line: cc source_name.c -o executable_name -lcxmlp
  • For Fortran:
    • Command line: f90 (or f77) source_name.f -o executable_name -lcxmlp
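
For example, to build a Fortran program against the parallel CXML library and run it on four processors, combining the steps above with the shared memory instructions (a sketch; myprog is a placeholder):

    f90 myprog.f -o myprog -lcxmlp   # link the SMP version of CXML
    setenv PARALLEL 4                # csh: let the library use 4 processors
    prun -N 1 -n 1 ./myprog          # one process; the library threads use the CPUs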