Sierra Cluster User Guide
CHPC resources are available to qualified faculty, students (under faculty supervision), and researchers from any Utah institution of higher education. Users can request accounts for CHPC computer systems by filling out an account request form. This can be found by following the link below or by coming into Room 405, INSCC Building. (Phone 581-5253)
- Web version: account request form.
- Hardcopy: from our main office, 405 INSCC, 581-5253.
Users requiring more than the default service units (SU) per quarter need to send a brief proposal, using the allocation form, available as either:
- Web version: Allocation form.
- Hardcopy: from our main office, 405 INSCC, 581-5253.
Users can access the Sierra interactive node via ssh (secure shell) at the following address:
rocky.chpc.utah.edu
Problems regarding network connectivity and other questions about the Sierra's availability should be addressed to the CHPC's consulting staff. Please refer to user services for assistance.
Interactive Work Policy: The CHPC systems are designed so that users doing interactive work have an acceptable response time, and users doing batch work have acceptable performance. Interactive work includes editing, compiling, debugging, and checking results. Batch processing includes any run that requires more than 15 minutes of CPU time. A question that provides a rule-of-thumb for interactive/batch processing is, "Will my job affect others' response time?" If the answer is yes, then it is a batch job.
Users who abuse CHPC policies will lose their accounts.
The batch implementation on all CHPC systems includes PBS, a resource manager, and a scheduler. The scheduler on the Compaq Sierra was written at the Pittsburgh Supercomputing Center.
Any MPI job or process which runs for more than 15 minutes will need to be run through the batch system.
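There are three steps to running in batch: write a PBS script, submit it with qsub, and check the status of the job. Each step is described in the sections that follow.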
Example PBS Script for the Compaq Sierra
Note that shell programming is exactly like running commands in the shell. You simply write into the file the commands you would like to run the same way you would write them interactively.
The following is an example script for running in PBS on
the Compaq Sierra. The lines at top of the file all begin with
#PBS which are seen as comments to the shell, but
give options to PBS.
In this example the job would be run on 1 node with 4
processors and 1 hour walltime. CHPC's policy on this is that
you must use multiples of 4 processors on your jobs. Since we
currently have 24 processors available in the production batch
pool, the only valid specifications for the "-l"
line are rmsnodes=1:4, rmsnodes=2:8, rmsnodes=3:12,
rmsnodes=4:16, rmsnodes=5:20 and
rmsnodes=6:24.
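The multiple-of-4 rule above can be checked before a script is submitted. The following is a minimal sketch in Bourne shell. The function name rmsnodes_spec is hypothetical (it is not a PBS or RMS command), and the limit of 24 processors is simply the pool size quoted above, so it would need adjusting if the pool changes.

```shell
#!/bin/sh
# Hypothetical helper, not part of PBS or RMS: print the value for the
# "-l" line that matches a requested processor count, or fail if the
# count breaks the multiple-of-4 rule described above.
rmsnodes_spec() {
    procs=$1
    max_procs=24   # assumption: production batch pool size quoted above
    case $procs in
        ''|0*|*[!0-9]*)
            echo "processor count must be a positive integer" >&2
            return 1 ;;
    esac
    if [ "$procs" -gt "$max_procs" ] || [ $((procs % 4)) -ne 0 ]; then
        echo "processor count must be a multiple of 4, at most $max_procs" >&2
        return 1
    fi
    # 4 processors per node, so the node count is procs / 4
    echo "rmsnodes=$((procs / 4)):$procs"
}

rmsnodes_spec 8    # prints rmsnodes=2:8
```

The printed value can be pasted after "#PBS -l" together with a walltime.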
You will need to change the email address specification and
the "path" in your home directory to copy files from and to (2
places). You will also need to decide which
"prun" command to use. The example script is set up for
an MPI job. If your program uses shared memory (including
Gaussian) you will want to comment out the MPI prun command
and uncomment the shared memory
prun line.
#PBS -S /bin/csh
#PBS -q sierra
#PBS -l rmsnodes=1:4,walltime=1:00:00
#PBS -m abe
#PBS -M youremail@address
#PBS -j oe
#Create execution directory
if (! -d /scratch/global/$PBS_JOBID) then
echo "making /scratch/global/$PBS_JOBID"
mkdir /scratch/global/$PBS_JOBID
endif
#Copy the necessary files to the execution dir.
cp ~/path/a.out /scratch/global/$PBS_JOBID
#Change to execution directory
cd /scratch/global/$PBS_JOBID
##Run your job. Note $RMS_NODES and $RMS_PROCS are set by
##your specification in the #PBS -l line.
##Use one of the following prun commands
##For MPI jobs use:
prun -N $RMS_NODES -n $RMS_PROCS ./a.out > outputfile
##For Shared Memory (including gaussian) use:
#prun -N $RMS_NODES -n 1 ./a.out > outputfile
#Copy files back home and clean up.
rm ./a.out
cp * ~/path && cd .. && /bin/rm -rf /scratch/global/$PBS_JOBID
Queues on the Sierra Cluster
There are four queues available on the Sierra Cluster (3/26/02):
- sierra - For regular jobs. Will not run on sierra2 or sierra3, which are reserved for the queues "par" and "short".
- test - For testing scripts: 1 node, 4 processors, less than one hour.
- par - For 2-way parallel jobs (rmsnodes=2:8). Can run on any nodes, but will have priority on nodes sierra2 and sierra3.
- short - For 1-way or 2-way parallel jobs taking less than 12 hours. Can run on any nodes.
For example, if you wish to test your scripts and do short
test runs you change the "-q" line to "#PBS
-q test" in your script which will limit you to
rmsnodes=1:4 and a walltime less than 1 hour.
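Rather than editing the script by hand, a test-queue copy can be generated from an existing script. This is only a sketch: the function name to_test_queue is hypothetical, the 30 minute walltime is an arbitrary value under the one hour limit, and it assumes the script has one "#PBS -q" line and one "#PBS -l" line, as in the example script.

```shell
#!/bin/sh
# Hypothetical filter: read a PBS script on standard input and write a
# copy that targets the "test" queue with rmsnodes=1:4.
# Assumption: walltime=0:30:00 is just one value below the 1 hour limit.
to_test_queue() {
    sed -e 's/^#PBS -q .*/#PBS -q test/' \
        -e 's/^#PBS -l .*/#PBS -l rmsnodes=1:4,walltime=0:30:00/'
}

# Usage (pbsjob is the script name used in this guide):
#   to_test_queue < pbsjob > pbsjob.test
```

All other lines of the script pass through unchanged.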
See below for the prun batch options
to run parallel programs on the Compaq Sierra.
Job Submission on the Compaq Sierra
Submit the job using the "qsub" PBS command.
See the PBS commands below for additional
PBS commands.
For example, to submit a script file named
"pbsjob", type
qsub pbsjob
PBS sets and expects a number of variables in a PBS script. For information on these variables and necessities, enter:
man qsub
Checking Job Status
To check if your job is queued or running, use the "qstat"
PBS command:
qstat
Another option is to use the RMS command rinfo:
rinfo
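Submission and the first status check can be combined. The sketch below relies on qsub printing the job identifier on standard output and on qstat accepting a job identifier as its argument; both are standard PBS behavior, but the man pages on the Sierra are the authority. The function name submit_and_check is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: submit a PBS script, then show the status of
# the job that was just created.
submit_and_check() {
    # qsub prints the job identifier on success
    jobid=$(qsub "$1") || return 1
    echo "submitted $jobid"
    qstat "$jobid"
}

# Usage, on the interactive node:
#   submit_and_check pbsjob
```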
Setting Up Your Environment for Batch
PBS will fail if you have tty-dependent commands in your
.profile, .cshrc, .login or .logout. One means of preserving
your login defaults and avoiding problems with PBS is by
checking to see if you are in PBS before you execute a
tty-dependent command. It is easy to check to see if you are
in PBS by seeing if the environment variable PBS_ENVIRONMENT
is set, and execute your commands appropriately.
Bourne and Korn shell users can accomplish this by
modifying the .profile with the following:
if [ -z "$PBS_ENVIRONMENT" ]
then
# do interactive commands
stty erase '^H'
TERM=xterm
export TERM
else
# do batch specific commands
:
fi
csh users can use the following in their
.cshrc and/or .login:
if ( $?PBS_ENVIRONMENT ) then
# do batch specific commands
else
# do interactive commands
stty erase '^H'
set term = xterm
endif
If you run csh and you have a ~/.logout script, you should
place the following at the beginning and end of the file.
#First line of csh ~/.logout
set EXITVAL = $status
#Last line of csh ~/.logout
exit $EXITVAL
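The PBS_ENVIRONMENT check in this section can be made a little finer. In standard PBS the variable is set to PBS_BATCH for ordinary batch jobs and to PBS_INTERACTIVE for jobs started with "qsub -I", which do have a terminal; confirm this with "man qsub" on the Sierra. The following Bourne shell sketch, with the hypothetical function name session_kind, separates the three cases and only touches the terminal when one is attached.

```shell
#!/bin/sh
# Hypothetical helper for a .profile: classify the current session
# from PBS_ENVIRONMENT (unset or empty means an ordinary login).
session_kind() {
    case "${PBS_ENVIRONMENT:-}" in
        PBS_BATCH)       echo batch ;;
        PBS_INTERACTIVE) echo pbs-interactive ;;
        "")              echo login ;;
        *)               echo unknown ;;
    esac
}

# Run tty-dependent commands only outside batch jobs, and only when
# standard input really is a terminal.
if [ "$(session_kind)" != batch ] && [ -t 0 ]; then
    stty erase '^H'
fi
```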
PBS Batch Script Options
- -a date_time. Declares the time after which the job is eligible for execution. The date_time element is in the form: [[[[CC]YY]MM]DD]hhmm[.S].
- -e path. Defines the path to be used for the standard error stream of the batch job. The path is of the form: [hostname:]path_name.
- -h. Specifies that a user hold will be applied to the job at submission time.
- -I. Declares that the job is to be run "interactively". The job will be queued and scheduled as a PBS batch job, but when executed the standard input, output, and error streams of the job will be connected through qsub to the terminal session in which qsub is running.
- -j join. Declares if the standard error stream of the job will be merged with the standard output stream. The join argument is one of the following:
  - oe - Merges the two streams into standard output.
  - eo - Merges the two streams into standard error.
  - n - The two streams will be separate (default).
- -l resource_list. Defines the resources that are required by the job and establishes a limit on the amount of resources that can be consumed. Users will want to specify the walltime resource, and if they wish to run a parallel job, the ncpus resource.
- -m mail_options.
Conditions under which the server will send a mail
message about the job. The options are:
- n: No mail ever sent
- a (default): When the job aborts
- b: When the job begins
- e: When the job ends
- -M user_list. Declares the list of e-mail addresses to whom mail is sent. If unspecified it defaults to userid@host from where the job was submitted. You will most likely want to set this option.
- -N name. Declares a name for the job.
- -o path. Defines the path to be used for the standard output stream. The path is of the form: [hostname:]path_name.
- -q destination. The destination is the queue.
- -S path_list. Declares the shell that interprets the job script. If not specified it will use the user's login shell.
- -v variable_list. Expands the list of environment variables which are exported to the job. The variable list is a comma-separated list of strings of the form variable or variable=value.
- -V. Declares that all environment variables in the qsub command's environment are to be exported to the batch job.
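Several of the options above usually appear together at the top of a script. As an illustration only, the Bourne shell function below prints such a header; the name pbs_header and the address user@example.edu are placeholders, and the fixed choices (csh as the job shell, mail on abort, begin and end, merged output) simply mirror the example script earlier in this guide.

```shell
#!/bin/sh
# Hypothetical generator: print a block of #PBS lines.
# $1 = job name (-N), $2 = queue (-q), $3 = resource list (-l),
# $4 = mail address (-M)
pbs_header() {
    cat <<EOF
#PBS -S /bin/csh
#PBS -N $1
#PBS -q $2
#PBS -l $3
#PBS -m abe
#PBS -M $4
#PBS -j oe
EOF
}

pbs_header myjob sierra rmsnodes=1:4,walltime=1:00:00 user@example.edu
```

The output can be redirected into a new file and the job commands appended below it.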
PBS User Commands
For any of the commands listed below you may do a
"man command" for syntax and detailed
information.
Frequently used PBS user commands:
- qsub. Submits a job to the PBS queuing system. Please see PBS Batch Script Options above.
- qdel. Deletes a PBS job from the queue.
- qstat. Shows status of PBS batch jobs.
- xpbs. X interface for PBS users.
Less Frequently-Used PBS User Commands:
- qalter. Modifies the attributes of a job.
- qhold. Requests that the PBS server place a hold on a job.
- qmove. Removes a job from the queue in which it resides and places the job in another queue.
- qmsg. Sends a message to a PBS batch job. To send a message to a job is to write a message string into one or more of the job's output files.
- qorder. Exchanges the order of two PBS batch jobs within a queue.
- qrerun. Reruns a PBS batch job.
- qrls. Releases a hold on a PBS batch job.
- qselect. Lists the job identifier of those jobs which meet certain selection criteria.
- qsig. Requests that a signal be sent to the session leader of a batch job.
RMS User Commands
For any of the commands listed below you may do a "man
command" for syntax and detailed information.
The RMS user commands are:
- prun. Loads and runs parallel programs. It can also run multiple copies of a sequential program.
- rinfo. Displays information about the resources available and about the jobs which are running.
RMS Batch Options: prun Command:
- -B basenode Specifies the number of the base node (the first to use) in the partition. Numbering within the partition starts at 0. By default the base node is unassigned, leaving the scheduler free to select nodes that are not in use.
- -c CPUs Specifies the number of CPUs required per process (default 1).
- -h Display the list of options.
- -i Allocate CPUs immediately or fail. By default, prun blocks until resources become available.
- -m block | cyclic Specifies whether to use block (the default) or cyclic distribution of processes over nodes.
- -n processes Specifies the number of processes required. The -n and -N options can be combined to control how processes are distributed over nodes. If neither is specified prun starts two processes.
- -N nodes | all Specifies the number of nodes required. You may also allocate all nodes in a partition using the all argument (i.e. prun -N all). If the number of nodes is not specified then the RMS scheduler will allocate one CPU per process on nodes with free CPUs.
- -O Allows resources to be overcommitted. Set this flag if you want to run more than one process per CPU.
- -p partition Specifies the partition on which the program will be executed. By default, the partition specified in the attributes table is used.
- -r Run processes using rsh. Used for admin operations such as starting and stopping RMS.
- -s Print stats as job exits.
- -t Prefix output with the process number.
- -v Specifies verbose operation. Multiple -v options increase the level of output, -vv shows each stage in running a program and -vvv enables debug output from the rmsloader processes on each node.
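The difference between the block and cyclic settings of -m can be pictured with a small calculation. This sketch assumes the usual meaning of the two terms (block fills one node with consecutive processes before moving on, cyclic deals processes out round-robin); the placement RMS actually makes may differ in detail, for example when the process count does not divide evenly. The function name rank_to_node is hypothetical.

```shell
#!/bin/sh
# Hypothetical illustration: which node (numbered from 0) would hold a
# given process under block or cyclic distribution.
# $1 = block|cyclic, $2 = number of nodes, $3 = number of processes,
# $4 = process number (from 0)
rank_to_node() {
    case $1 in
        block)
            # processes per node, rounded up
            per_node=$(( ($3 + $2 - 1) / $2 ))
            echo $(( $4 / per_node )) ;;
        cyclic)
            echo $(( $4 % $2 )) ;;
        *)
            echo "mode must be block or cyclic" >&2
            return 1 ;;
    esac
}

# 8 processes on 2 nodes, as in rmsnodes=2:8
for i in 0 1 2 3 4 5 6 7; do
    echo "process $i: block node $(rank_to_node block 2 8 $i), cyclic node $(rank_to_node cyclic 2 8 $i)"
done
```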
RMS Environment Variables set for prun:
- RMS_IMMEDIATE
- RMS_MEMLIMIT
- RMS_PARTITION
- RMS_PROJECT
- RMS_TIMELIMIT
- RMS_DEBUG
- RMS_EXITTIMEOUT
RMS Environment Variables set by prun:
- RMS_JOBID
- RMS_NNODES
- RMS_NODEID
- RMS_NPROCS
- RMS_RANK
- RMS_RESOURCEID
RMS Batch Options: rinfo Command:
- -a List all resources and jobs (both the user's and those of others).
- -c List the configuration names.
- -h Display the list of options.
- -j List current jobs. This can be combined with the -a option to get a list of all jobs (both the user's and those of others).
- -l Give more detailed information.
- -m Show the machine name.
- -n Show the status of each node. Can be combined with -l.
- -p Identify each active partition by name and indicate the number of CPUs in each partition.
- -q Print information on the user's quotas and projects.
- -r Show the allocated resources.
- -L partition statistic Print the hostname of a lightly loaded node in the machine or the specified partition. RMS provides a load balancing service, accessible through rmsexec, that enables users to run their processes on lightly loaded nodes, where loading is evaluated according to a given statistic.
- -s daemon | all [hostname] Show the status of the daemon. When used with the argument all rinfo will show the status of all daemons running on the rmshost management node. For daemons that run on multiple nodes, such as rmsd, the optional hostname argument specifies the hostname of the node on which the daemon is running.
- -t node | name Where node is the network ID of a node, rinfo translates it into the hostname; where name is a hostname, rinfo translates it into the network ID.
C/C++ Compilers
Compaq C and C++ compilers, version 6.3, are installed on Sierra and are located in the default path.
Compaq C:
- Command line:
  cc [option] file
- Useful options:
  - -fast turns on a collection of optimization flags
  - -omp compiles OpenMP program
- More information:
  - man cc
  - Manuals (Compaq's webpage)
Compaq C++:
- Command line:
  cxx [option] file
- Useful options:
  - -fast turns on a collection of optimization flags
  - -omp compiles OpenMP program
- More information:
  - man cxx
  - Manuals (Compaq's webpage)
Fortran
The Compaq Fortran for Tru64 UNIX compiler suite, version 5.4, is installed on Sierra. This includes support for Fortran 77, 90 and 95. The compilers f77, f90 and f95 are in the default path.
Compaq Fortran:
- Command line:
  f77, f90, f95 [option] file
- Useful options:
  - -fast turns on a collection of optimization flags
  - -omp compiles OpenMP program
- More information:
  - man f77, f90, f95
  - Manuals (Compaq's webpage)
To use MPI
To link to the MPI libraries (located in /usr/lib) on the Compaq Sierra, do the following:
For C:
- Include MPI header file in your program:
  #include <mpi.h>
- Compile linking MPI and ELAN libraries:
  cc -o progname progname.c -lmpi -lelan
For Fortran:
- Include MPI header file in your program:
  INCLUDE "mpif.h"
- Compile linking MPI and ELAN libraries:
  f90 -o progname progname.f -lmpi -lfmpi -lelan
To use Shared Memory
- Set the environment variable that controls the number of processors (e.g. the DXML library requires "PARALLEL X", where X is the number of processors). For example, csh users would want to add:
  setenv PARALLEL N
  to their PBS script, where N is the number of processes. For more information, on rocky do "man dxml".
- In your PBS script specify 1 node and 1 processor in the prun command:
  prun -N 1 -n 1 a.out
Linear algebra subroutines
Updated June 28, 2004
The Compaq Extended Math Library (CXML) is a set of computationally intensive mathematical subroutines that are optimized for the Alpha platform. It is installed on the Compaq Sierra cluster.
The library subroutines cover areas of:
- BLAS Level 1,2,3 subroutines
- complete LAPACK routines
- sparse linear system solvers (direct sparse and iterative solvers)
- signal processing routines (FFT, cos/sin transforms, convolution, correlation and digital filters)
Further references
- Serial CXML routine reference and user's manual (Compaq's webpage)
- Parallel CXML documentation (Compaq's webpage)
Examples of how to link the CXML library on the Compaq Sierra
Single processor version
For C:
- Command line:
  cc source_name.c -o executable_name -lcxml
For Fortran:
- Command line:
  f90 (or f77) source_name.f -o executable_name -lcxml
Parallel version (for Shared Memory Processing - SMP)
For C:
- Command line:
  cc source_name.c -o executable_name -lcxmlp
For Fortran:
- Command line:
  f90 (or f77) source_name.f -o executable_name -lcxmlp
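The four link lines above differ only in the compiler and in the library name (-lcxml for the serial library, -lcxmlp for the SMP one). As a rough aid, the Bourne shell sketch below prints the matching command line. The function name cxml_link_cmd is hypothetical, it always uses f90 for Fortran, and it only prints the command rather than running the compiler.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) the CXML link command.
# $1 = c|fortran, $2 = serial|smp, $3 = source file, $4 = executable
cxml_link_cmd() {
    case $1 in
        c)       compiler=cc ;;
        fortran) compiler=f90 ;;
        *) echo "language must be c or fortran" >&2; return 1 ;;
    esac
    case $2 in
        serial) lib=-lcxml ;;
        smp)    lib=-lcxmlp ;;
        *) echo "mode must be serial or smp" >&2; return 1 ;;
    esac
    echo "$compiler $3 -o $4 $lib"
}

cxml_link_cmd fortran smp solver.f solver   # prints: f90 solver.f -o solver -lcxmlp
```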

