Node Sharing

CHPC now has the usage accounting structure in place to allow multiple batch jobs to share a single node. We have been using the node-sharing feature of Slurm since the addition of the GPU nodes to kingspeak, as it is typically most efficient to run one job per GPU on nodes with multiple GPUs.

More recently, we have offered node sharing to select owner groups for testing, and based on that experience we are making node sharing available to any group that owns nodes. Note that at this time we do not anticipate adding node sharing to either the general resources or to guest access on the owner nodes.

If your group owns nodes and would like to be set up to use node sharing, please contact CHPC via issues@chpc.utah.edu.

How to specify requested resources:

For node sharing on the GPU nodes, see the GPU page.

For node sharing on the non-GPU nodes, jobs must use the shared partition for a given set of nodes and must explicitly request the number of cores and the amount of memory to be allocated to the job. The remaining cores and memory are then available for other jobs. The requested number of cores and amount of memory are used to set up the "cgroup" for the job, which is the mechanism used to enforce these limits.
The node sharing partitions follow the naming pattern:

#SBATCH --partition=partitionname-shared-kp

The number of cores requested must be specified with the --ntasks sbatch directive:

#SBATCH --ntasks=2

which requests 2 cores.

The amount of memory requested is specified with the --mem sbatch directive:

#SBATCH --mem=32G

This can also be specified in MB, which is the assumed unit if none is given:

#SBATCH --mem=32000

If no memory directive is given, a default of 2 GB per core will be allocated to the job.
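Putting these directives together, a shared-node job script might look like the following sketch. The partition name "mygroup-shared-kp" and the program being run are hypothetical placeholders; substitute your group's actual shared partition name.

```shell
#!/bin/bash
# Sketch of a shared-node job script; "mygroup-shared-kp" is a
# placeholder for your group's actual shared partition name.
#SBATCH --partition=mygroup-shared-kp
#SBATCH --ntasks=2        # request 2 cores
#SBATCH --mem=32G         # request 32 GB of memory (MB assumed if no unit)
#SBATCH --time=1:00:00

# The cgroup created for this job restricts it to the cores and
# memory requested above; the rest of the node stays available
# for other jobs.
srun ./my_program
```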

Task affinity

Node sharing automatically sets task-to-CPU-core affinity, allowing the job to run only on as many cores as there are requested tasks. To find out which cores the job was pinned to, run:

cat /cgroup/cpuset/slurm/uid_$SLURM_JOB_UID/job_$SLURM_JOB_ID/cpuset.cpus

Note that since we have CPU hyperthreading on (allowing two logical cores per physical core), this command will report a pair of logical cores for each physical core, e.g. (2,30) corresponds to the third physical core (core 2, since numbering starts from 0) on a 28-core node, plus its associated hyperthread. The node's core numbering from the system perspective is obtained by running numactl -H.
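The pairing above can be sketched as follows, assuming the common layout where logical cores N and N+28 are the two hyperthreads of physical core N on a 28-core node; the authoritative numbering for a given node comes from numactl -H.

```shell
#!/bin/bash
# Sketch: map logical cores reported by the cpuset to physical cores,
# assuming logical cores N and N+28 share physical core N on a
# 28-core node. Verify the actual layout with `numactl -H`.
physical_count=28
for logical in 2 30; do
  echo "logical core $logical -> physical core $(( logical % physical_count ))"
done
```

Both logical cores 2 and 30 map to physical core 2, matching the (2,30) pair in the example above.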

Potential implications for performance:

Despite the task affinity, the performance of any job run in a shared manner can be impacted by other jobs on the same node because of the shared infrastructure in use, in particular the I/O to storage and the shared communication paths between the memory and CPUs. If you are benchmarking using only a portion of a node, you should not use node sharing but instead request the entire node.

Implications for the amount of allocation used:

The usage charged for a shared job is based on the percentage of the cores or of the memory used, whichever is higher. For owner resources, where the quarterly allocation listed at https://www.chpc.utah.edu/usage/cluster/current-project-general.php is based on the number of cores and the number of hours in the quarter, this can lead to scenarios where more core hours are used than are nominally available for a given quarter. This will not cause any issues, but usage can exceed the allocated amount, and therefore a negative balance may show.

As an example, consider a node with 24 cores and 128 GB of memory. Without node sharing, during any 1-hour period the maximum usage would be 24 core hours. With node sharing, however, one could envision a serial job that requires a lot of memory requesting 1 core and 64 GB, alongside 23 other single-core jobs whose memory requests together fit within the remaining 64 GB (say each takes the default 2 GB/core). If all jobs run simultaneously for 1 hour, a total of 35 core hours of allocation is used in that hour: 1 core hour for each of the 23 small-memory jobs, and 12 core hours for the 64 GB job, since it uses half the memory of the node.
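The charging rule above can be sketched with a small shell helper. The function name `charge` and the node parameters are illustrative only; the node figures match the 24-core / 128 GB example.

```shell
#!/bin/bash
# Sketch: estimate core hours charged per wall-clock hour for a shared
# job, taking the larger of the core fraction and the memory fraction
# of the node. Node parameters follow the 24-core / 128 GB example.
node_cores=24
node_mem_gb=128

charge() {  # args: requested_cores requested_mem_gb
  awk -v c="$1" -v m="$2" -v nc="$node_cores" -v nm="$node_mem_gb" \
    'BEGIN { f = c/nc; g = m/nm; if (g > f) f = g; print f*nc }'
}

charge 1 64   # big-memory job: half the node's memory -> 12 core hours
charge 1 2    # default-memory job: core fraction dominates -> 1 core hour
```

With 23 of the default-memory jobs at 1 core hour each plus the big-memory job at 12, the hour totals 35 core hours, as in the example above.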

Last Updated: 10/18/18