CHPC now has the usage accounting structure in place to allow multiple batch jobs to share a single node. We have been using the node-sharing feature of Slurm since the addition of the GPU nodes to kingspeak, as it is typically most efficient to run one job per GPU on nodes with multiple GPUs.
More recently, we have offered node sharing to select owner groups for testing, and based on that experience we are making node sharing available to any group that owns nodes. Note that at this time we do not anticipate adding node sharing to either the general resources or to guest access to the owner nodes.
If your group owns nodes and would like to be set up to use node sharing, please contact CHPC via email@example.com.
NOTE: July 1, 2019: Node sharing has been enabled on all compute nodes in the general CHPC computing environment (clusters of notchpeak, kingspeak, ember, lonepeak, tangent and ash).
How to specify requested resources:
For node sharing on the GPU nodes, see the GPU page.
On the non-GPU nodes, node sharing requires that users submit to the shared partition for a given set of nodes and that the job explicitly request the number of cores and the amount of memory to be allocated to it. The remaining cores and memory are then available for other jobs. The requested number of cores and amount of memory are used to set up the "cgroup" for the job, which is the mechanism used to enforce these limits.
The node sharing partitions are:
- For general nodes of a cluster
- For guest access on owner nodes of a cluster
- For owner nodes
In addition, on notchpeak there are two nodes (AMD Epyc processors, 64 cores, 512 GB memory) reserved for short jobs, which can only be used in a shared manner. To use these nodes, both the account and partition should be set to notchpeak-shared-short. See the Notchpeak Cluster Guide for more information.
The number of cores requested must be specified using the ntasks sbatch directive; for example, #SBATCH --ntasks=2 will request 2 cores.
The amount of memory requested can be specified with the --mem sbatch directive, using a unit suffix such as G for gigabytes.
It can also be specified in MB, which is the assumed unit if none is given.
If no memory directive is used, the default is that 2 GB per core will be allocated to the job.
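As an illustration, the directives described above can be combined into a batch script header along the following lines. This is only a sketch: the account and partition are the notchpeak-shared-short values mentioned earlier, while the memory value, the wall time, and the placeholder command at the end are assumptions chosen for the example, not CHPC recommendations.

```shell
#!/bin/bash
# Account and partition for the shared short-job nodes on notchpeak.
#SBATCH --account=notchpeak-shared-short
#SBATCH --partition=notchpeak-shared-short
# Request 2 cores on the shared node.
#SBATCH --ntasks=2
# Request 8 GB of memory (assumed value for this sketch).
# If this line were omitted, the 2 GB per core default would give 4 GB.
#SBATCH --mem=8G
# Wall time limit (assumed value for this sketch).
#SBATCH --time=01:00:00

# Placeholder workload: report how many tasks Slurm granted.
echo "Tasks allocated: ${SLURM_NTASKS:-not running under Slurm}"
```

Saved under a hypothetical name such as shared_job.sh, a script like this would be submitted with sbatch shared_job.sh.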
With node sharing, when running sinfo, you will notice that there is an additional state, mix, for nodes that are partially allocated.
Node sharing automatically sets task-to-core affinity, so the job runs only on as many cores as there are requested tasks. To find out which cores the job was pinned to, run
Note that since we have CPU hyperthreading on (allowing two logical cores per
physical core), this command will report a pair of logical cores for each physical
core; e.g., (2,30) corresponds to the third physical core (numbering starts from 0)
on a 28-core node, and its associated hypercore. Node core numbering from the system
perspective is obtained by running
Potential implication on performance:
Despite the task affinity, the performance of any job run in a shared manner can be affected by other jobs running on the same node, because they use shared infrastructure, in particular the I/O to storage and the communication paths between the memory and the CPUs. If you are benchmarking on only a portion of a node, you should not use node sharing but instead request the entire node.
Implication on amount of allocation used:
The usage charged to a shared job is based on the percentage of the node's cores or of its memory used, whichever is higher. For owner resources, where the quarterly allocation listed at https://www.chpc.utah.edu/usage/cluster/current-project-general.php is based on the number of cores and the number of hours in the quarter, this can lead to scenarios where more core hours are used than are available for a given quarter. This will not cause any issues, but the usage can exceed the allocated amount, in which case a negative balance will show.
As an example, consider a node with 24 cores and 128 GB. Without node sharing, during any 1 hour period the maximum usage would be 24 core hours. However, with node sharing one could envision that a serial job which requires a lot of memory requests 1 core and 96 GB of memory along with a second job requesting 20 cores and the remaining 32 GB of memory. As all of the memory has been allocated, the remaining 3 cores will stay idle. If both jobs run simultaneously for 1 hour, a total of 38 core hours of allocation will be used in this hour (18 core hours for the 96 GB memory job, based on 3/4 of the memory of the node being used, along with 20 core hours for the 20 core job).
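The arithmetic in this example can be reproduced with a small shell sketch. The function name charged_cores and the choice to round the memory share up to a whole core are assumptions made for illustration; the actual accounting is done by the scheduler and may round differently.

```shell
#!/bin/sh
# charged_cores CORES MEM_GB NODE_CORES NODE_MEM_GB
# Prints the core-equivalents charged per hour: the larger of the cores
# requested and the requested memory share expressed in cores.
# (Hypothetical helper; rounding up is an assumption.)
charged_cores() {
    cores=$1; mem_gb=$2; node_cores=$3; node_mem_gb=$4
    # Memory fraction of the node converted to cores, rounded up.
    mem_cores=$(( (mem_gb * node_cores + node_mem_gb - 1) / node_mem_gb ))
    if [ "$mem_cores" -gt "$cores" ]; then
        echo "$mem_cores"
    else
        echo "$cores"
    fi
}

# The two jobs from the example, on a node with 24 cores and 128 GB.
high_mem=$(charged_cores 1 96 24 128)
many_core=$(charged_cores 20 32 24 128)
echo "high-memory job: $high_mem core hours"
echo "20-core job: $many_core core hours"
echo "total: $(( high_mem + many_core )) core hours"
```

Run for a one-hour period, this reports 18 core hours for the high-memory job, 20 for the 20-core job, and 38 in total, matching the figures above.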