How to specify requested resources:
For node sharing on the GPU nodes, see the GPU page.
For node sharing on the non-GPU nodes, users must submit to the shared
partition for a given set of nodes, and the job must explicitly request the number
of cores and the amount of memory to be allocated to it. The remaining
cores and memory are then available for other jobs. The requested number of cores
and amount of memory are used to set up the “cgroup” for the job, which is the
mechanism used to enforce these limits.
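As a quick way to see this mechanism in action from inside a running job (the command is generic Linux, not specific to this documentation), you can print which cgroup the process belongs to:

```shell
# Show the cgroup(s) the current process is assigned to; inside a
# shared Slurm job this includes a job-specific cgroup path
cat /proc/self/cgroup
```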
The node sharing partitions are:
The number of cores requested must be specified using the ntasks sbatch directive; for example, #SBATCH --ntasks=2
will request 2 cores.
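In a job script, the directive looks like the following (the partition name here is illustrative; substitute the shared partition for your target nodes):

```shell
#!/bin/bash
#SBATCH --partition=notchpeak-shared  # illustrative shared partition name
#SBATCH --ntasks=2                    # request 2 cores
#SBATCH --time=1:00:00                # walltime request
```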
The amount of memory requested can be specified with the mem sbatch directive.
The amount can also be given in MB, which is the assumed unit if none is specified.
If no memory directive is used, the default of 2 GB per core is allocated to the job.
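For example (the values are illustrative), the following two directives make the same request:

```shell
#SBATCH --mem=2G      # request 2 GB of memory
#SBATCH --mem=2048    # the same request in MB (the default unit)
```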
Node sharing automatically sets task-to-CPU-core affinity, allowing the job to run only on as many cores as there are requested tasks. To find out which cores the job was pinned to, run
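The original command is not preserved here; one common way to inspect the affinity from inside a job on Linux is:

```shell
# Print the list of logical CPU cores the current process may run on
grep Cpus_allowed_list /proc/self/status
# Equivalent, using util-linux:
taskset -cp $$
```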
Note that since CPU hyperthreading is enabled (giving two logical cores per
physical core), this command will report a pair of logical cores for each physical
core. For example, (2,30) corresponds to the third physical core (core number 2,
since numbering starts from 0) on a 28-core node, together with its associated
hyperthread (logical core 30 = 2 + 28). Node core numbering from the system perspective
is obtained by running
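The specific command is not preserved here; a standard way to see the system's core numbering (assuming util-linux is available) is:

```shell
# Map logical CPUs to physical cores and sockets; hyperthread
# siblings share the same CORE value
lscpu --extended
# Alternatively, inspect the logical processors directly:
grep -c ^processor /proc/cpuinfo
```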
Potential implications for performance:
Despite the task affinity, the performance of any job run in a shared manner can be impacted by other jobs on the same node, because some infrastructure is shared, in particular the I/O path to storage and the communication paths between memory and CPUs. If you are benchmarking with only a portion of a node, do not use node sharing; request the entire node instead.
Implications for the amount of allocation used:
The usage charged to a shared job is based on the percentage of the cores or of the memory of the node used, whichever is higher. For owner resources, where the quarterly allocation listed at https://www.chpc.utah.edu/usage/cluster/current-project-general.php is based on the number of cores and the number of hours in the quarter, this can lead to scenarios where more core hours are used than are available for a given quarter. Note that this will not cause any issues, but it can lead to instances where the usage is greater than the allocated amount, so a negative balance will show.
As an example, consider a node with 24 cores and 128 GB of memory. Without node sharing, the maximum usage during any 1-hour period is 24 core hours. With node sharing, however, one could envision a serial job that requires a lot of memory requesting 1 core and 64 GB, alongside 23 other single-core jobs whose memory requests fit within the remaining 64 GB (say each takes the default 2 GB per core). If all of these jobs run simultaneously for 1 hour, a total of 35 core hours of allocation is used in that hour: 1 core hour for each of the 23 small jobs, and 12 core hours for the 64 GB job, since it uses half of the node's memory (half of 24 cores = 12).
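The charging rule above can be sketched as a small calculation (node and job sizes are taken from the example; the max-of-fractions formula is as described in the text):

```shell
# Core-hour charge per hour for a shared job: the larger of the job's
# core count and its memory share expressed as an equivalent core count.
# Node size assumed from the example: 24 cores, 128 GB of memory.
charge() {
  awk -v cores="$1" -v mem="$2" -v ncores=24 -v nmem=128 \
      'BEGIN { m = mem / nmem * ncores; print (cores > m ? cores : m) }'
}
charge 1 64   # the 1-core, 64 GB job is charged 12 core hours per hour
charge 1 2    # each small 1-core, 2 GB job is charged 1 core hour per hour
```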