Tangent User Guide
The tangent cluster was retired March 31, 2021.
Background on the Apt Project
In Fall 2013, NSF funded a collaboration led by Rob Ricci of the School of Computing’s Flux group, in conjunction with the Center for High Performance Computing, to develop an “adaptable profile-driven testbed”. This testbed will allow computer research teams to use the same resources for different missions, e.g., network experiments, high performance computing experiments, and security experiments. The Flux group’s emphasis is to create a low-barrier way of creating highly reproducible experiments. CHPC’s emphasis is to create an environment to roll out HPC images on demand, to scale the images dynamically, and to support multiple images with different HPC and security contexts.
The Apt project is a three-year project which consists of a hardware foundation, the Apt cluster, and a testbed control system built upon systems developed previously by the Flux group for the Emulab and GENI projects. The testbed control system will allow researchers to use either established images or to create new ones for their experiments. Researchers can use one or more of these images simultaneously, along with respective network characterizations, to create a “profile”, which can be saved and used to repeat experiments or to share with other researchers.
The Flux Research Group started building Emulab in 1999 as a testbed for their own research in operating systems and distributed systems, and subsequently made this tool available to others. Emulab has grown to have about 5,000 users worldwide, and there are also about 50 other sites that have built testbeds using the open-source software. As part of NSF's GENI project, the Flux group focused on expanding the scope of Emulab to federate with other types of testbeds. Apt represents a different type of widening of the scope: expanding the environment to HPC as well as to other areas of computer science (CS) through Apt's "on demand" profiles.
Software is being developed to allow this hardware to be dynamically provisioned to meet the needs of the researchers. The user defines a profile which includes all the information needed to run an experiment, including the description of the resources, both hardware and software, that will be used in the experiment, providing a mechanism to enable repeatable research.
The hardware specification of the profile includes information on the properties of the nodes, the storage, and the network. The software environment of the profile consists of the operation system, and can include additional software packages, data files, etc., needed for the experiment. The Apt project includes a number of standard profiles; others will be defined and shared by users of the resource.
Researchers use either an application programming interface (API) or a web interface to configure the profile, then create an image of this profile, and define the experiment. The experiment belongs to the experimenter for the specified duration. When the experiment is complete, the Apt software de-provisions the hardware, and makes it available for future requests.
For more information on the project see http://www.flux.utah.edu/project/apt.
CHPC is establishing a traditional HPC profile as a cluster called Tangent, to launch jobs on the C6220 nodes, which are the same hardware as the 16-core nodes of CHPC’s Kingspeak cluster. This profile will have CHPC applications available, as well as mount current CHPC file systems. From the user perspective, access to this resource is obtained via a login to an interactive node for the Tangent cluster. The Tangent interactive nodes are local to CHPC and allow users to submit batch jobs that will spin up dynamic HPC images on the Apt hardware.
Apt Cluster Hardware (General) Overview
- 128 Dell PowerEdge r320 nodes, with a single Intel Xeon E5-2450 processor (8 cores, 2.1 GHz), 16 GB memory, and four 500 GB hard drives
- 64 Dell PowerEdge c6220 nodes, with dual Intel Xeon E5-2650v2 processors (8 cores each, 2.6 GHz) for a total of 16 cores, 64 GB memory, and two 1 TB hard drives
NOTE: Currently, Tangent is only using the 64 c6220 nodes.
NFS Home Directory
Your home directory, which is an NFS-mounted file system, is one choice for I/O. Of the available file systems, it has the slowest I/O performance. This space is visible to all nodes on the clusters through an auto-mounting system.
NFS Scratch (/scratch/kingspeak/serial)
Tangent has access to another NFS file system, /scratch/kingspeak/serial, which has 175 TB of disk capacity. It is attached to the InfiniBand network to obtain a larger potential network bandwidth. This space is visible (read and write access) on all Tangent interactive and compute nodes. However, it is still a shared resource and may therefore perform slower when it is subjected to significant user load. Users should test their applications' performance to see if they experience any unexpected performance issues within this space. This file system has a scrub policy under which files older than 60 days are deleted.
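Because of the scrub policy, it is worth checking periodically which of your files are at risk. A minimal sketch (the per-user subdirectory under the scratch path is an assumption; adjust it to wherever your files actually live):

```shell
# List files in a scratch area older than 60 days, i.e. candidates for
# the scrub policy (the $USER subdirectory is an illustrative assumption):
find /scratch/kingspeak/serial/$USER -type f -mtime +60
```

Copy anything you still need back to your home directory before it is removed.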
Parallel Scratch (/scratch/general/lustre)
Tangent also has access to a second scratch file system, /scratch/general/lustre, which has a capacity of 700 TB. This space is visible (read and write access) on all Tangent interactive and compute nodes. However, it is still a shared resource and may therefore perform slower when it is subjected to significant user load. Users should test their applications' performance to see if they experience any unexpected performance issues within this space. This file system has a scrub policy under which files older than 60 days are deleted.
Local Disk
The local scratch space is a storage space unique to each individual node. As the hardware is deprovisioned after every job, no data left in this space can be retrieved after the experiment has completed.
On Tangent, the number of nodes available to run jobs at any given time varies, depending on the other users of the Apt hardware. To see the available nodes, run sinfo. Below is typical output of this command:
$ sinfo -l
Mon Nov 17 11:17:01 2014
PARTITION AVAIL TIMELIMIT JOB_SIZE ROOT SHARE GROUPS NODES STATE NODELIST
tangent* up 3-00:00:00 1-64 no NO all 2 draining* tp[048-049]
tangent* up 3-00:00:00 1-64 no NO all 3 allocated# tp[010,024-025]
tangent* up 3-00:00:00 1-64 no NO all 21 drained tp[008,012,018-020,023,027-030,033-037,045-047,050,057,059]
tangent* up 3-00:00:00 1-64 no NO all 33 idle~ tp[001-007,011,013-017,026,038-044,051-056,058,060-064]
tangent* up 3-00:00:00 1-64 no NO all 5 allocated tp[009,021-022,031-032]
Possible states of importance for Tangent are allocated, completing, down, drained, draining, fail, failing, future, idle, maint, mixed, perfctrs, power_down, power_up, reserved, and unknown, plus their abbreviated forms: alloc, comp, down, drain, drng, fail, failg, futr, idle, maint, mix, npc, pow_dn, pow_up, resv, and unk, respectively. Note that the suffix "*" identifies nodes that are presently not responding, the suffix "#" indicates that the node is being powered up and provisioned, and the suffix "~" indicates that the node is in the powered-down state. See the end of the "man sinfo" page for the meaning of each state.
In this particular example, we see that 33 nodes are idle, i.e., powered down, and thus available for a new job. The drained nodes are usually those used by other Apt experiments outside of Tangent. Allocated, and possibly also drained, generally describes nodes used by Tangent jobs.
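For a quick per-state node count without reading the full listing, the sinfo output can be summarized with a short pipeline; a sketch, assuming the partition is named tangent as in the listing above (the awk program simply sums the NODES column, grouped by the STATE column):

```shell
# Sum the NODES column (field 8) per STATE (field 9) of `sinfo -l`,
# skipping the first two lines (date stamp and column header).
sinfo -l -p tangent | awk 'NR > 2 { nodes[$9] += $8 } END { for (s in nodes) print s, nodes[s] }'
```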
The squeue command lists the active jobs on Tangent (the subset of Apt currently used for HPC jobs). For the sinfo example listed above, the corresponding squeue output is:
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
373 tangent 74836 u0101881 PD 0:00 52 (Resources)
396 tangent g09.slur u0028729 R 1:43 2 tp[024-025]
397 tangent tcsh u0101881 CF 0:02 2 tp[046-047]
395 tangent amber14. u0028729 CG 0:00 2 tp[048-049]
394 tangent nwchem.s u0028729 R 8:03 2 tp[009-010]
393 tangent g09.slur u0028729 R 8:13 2 tp[031-032]
392 tangent tcsh u0101881 R 14:55 2 tp[021-022]
The ST column describes the state the job is in. See man squeue for details on each state; in our case, R stands for a running job, CF for configuring (provisioning, startup) of the job, PD means that the job is pending due to insufficient available resources, and CG means completing (finalizing) the job.
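A similar one-liner gives a per-state count of the queued jobs; a sketch using squeue's -h flag (suppress the header) and the %T format specifier (print the full job state name):

```shell
# Count jobs per state on the tangent partition: print only the state
# column, then sort and count the duplicates.
squeue -p tangent -h -o "%T" | sort | uniq -c
```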
NOTE – we will add to this section as we get questions from users
- My job is taking a long time to start.
Because the nodes have to be requested from the Apt framework, booted, and configured at each job start, job startup takes longer than users may be used to from other clusters. It is not unusual for larger jobs (> 12 nodes) to take 15-30 minutes to start. Occasionally there may also be a problem with the nodes made available to the job by Apt, in which case the whole job needs to restart, which further increases the startup time. We recommend monitoring the job startup with the squeue command, along with monitoring the I/O output of the calculation in whatever directory it is being run. If the job is in the running state (R) but no output has been written for a while, it is possible that one of the nodes is in a bad state and is blocking the whole job. If this happens, you will have to delete the job (scancel) and start it again.
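The "no output written for a while" check can be scripted. A minimal sketch, assuming the job writes to a known output file (the file name and the 30-minute threshold in the usage line are illustration-only assumptions):

```shell
# Print "stale" if the given file has not been modified within the last
# N minutes, "fresh" otherwise. A stale output file on a job in state R
# may indicate that a bad node is blocking the job.
stale_check() {
    local file=$1 minutes=$2
    if [ -n "$(find "$file" -mmin +"$minutes" 2>/dev/null)" ]; then
        echo stale
    else
        echo fresh
    fi
}
```

Example usage while squeue shows the job in state R: `stale_check my_job_output 30`; if it reports "stale", consider cancelling the job with scancel and resubmitting.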
In order to set up an HPC job on the Apt hardware, a user accesses it via the Tangent cluster interactive node.
All CHPC machines mount the same user home directories. This means that the user files on Tangent will be exactly the same as the ones on other CHPC clusters. The advantage is obvious: users do not need to copy files between machines. However, users must make sure that they run the correct executables. CHPC-maintained applications with executables suitable for use on all clusters are kept in /uufs/chpc.utah.edu/sys/pkg, whereas cluster-specific executables (MPI-based applications built using the MPI optimized for the specific cluster's InfiniBand) for Tangent are found in /uufs/tangent.peaks/sys/pkg.
The batch implementation on this system is Slurm – Simple Linux Utility for Resource Management
Information about Slurm is available at: http://slurm.schedmd.com/documentation.html . To assist users in making the transition to Slurm, CHPC has developed a Slurm documentation page, which includes common Slurm commands and variables, a sample Slurm batch script, and a translation guide for the common commands and environment variables of the two batch systems (Slurm and Moab).
There is a hard limit of 72 hours of wall time for Tangent jobs.
Runs in the batch system generally pass through the following steps:
1. The creation of a batch script on the Tangent cluster
2. Submission of the job. In order to submit a job on Tangent, one has to first log in to the Tangent interactive node. Then the job submission is done with the sbatch command in Slurm. For example, to submit a script named pbsjob, just type:
sbatch pbsjob
3. Checking the status of your job
Typical job options, such as the requested number of nodes/processors, walltime, etc., are specified via Slurm directives. Below is a sample job script:
#!/bin/bash
#SBATCH -N 4                # number of nodes
#SBATCH -n 8                # total number of tasks
#SBATCH -J my_job           # job name
#SBATCH -o my_job_output    # file for standard output
# $DATADIR, $SCRDIR and $EXE are assumed to be set to the input data
# directory, the scratch directory and the executable, respectively
cp -r $DATADIR/* $SCRDIR
# a way to get the list of hosts allocated to the job
srun hostname -s | sort -u > nodefile.$SLURM_JOBID
mpirun -genv OMP_NUM_THREADS 8 -np 8 $EXE
The #SBATCH lines denote the Slurm flags. Note that the node/process request differs from Moab in the sense that we request the number of nodes with the -N flag and the number of tasks with the -n flag. The system will allocate -N nodes with -n/-N tasks per node. In this example, we are requesting 4 nodes and 8 tasks, so we'll be running 2 MPI processes per node. To fully utilize the 16 cores per node, we'll be running 8 OpenMP threads per process.
Also note that we are using the MPICH2 distribution from the generic CHPC program branch; version 3.1.2 has been built with the InfiniBand network support that is appropriate for the Tangent cluster. In case you need to run over the Ethernet network, add -genv MPICH_NEMESIS_NETMOD tcp to the mpirun flags.
Finally, MPICH2 supports Slurm process-to-task mapping, which allows us to run without the -machinefile flag that tells mpirun to run on the hosts listed in a host file. If the host file is needed, the srun hostname line in the script above produces it.
Interactive jobs are best started with the srun command. For example, to get 2 nodes with 4 tasks:
srun --pty -t 1:00:00 --partition=tangent --account=youraccount -n 4 -N 2 /bin/tcsh -l
The important flags are --pty, which requests an interactive terminal, and /bin/tcsh -l, which is the shell to run. If you prefer bash, replace it with /bin/bash -l.
Slurm User Commands
Please see the slurm documentation page for more information.
For information on application development and compiling on the clusters at CHPC, please see our Programming Guide.