Node Sharing Live on All General Environment Clusters

Posted: July 1, 2019

We have enabled node sharing on all CHPC compute nodes in the general environment (all compute nodes on notchpeak, kingspeak, ember, lonepeak, tangent, and ash). Prior to this change, node sharing was available only for select owner node partitions (upon request of that group), and was already in place as the only option on both the GPU nodes and the two AMD notchpeak nodes (notch081 and notch082).

Node sharing lets a user submit a job that uses only a portion of a node, which is useful if your application cannot make use of all of a node's processors. Note that when running a job in shared mode, you MUST specify the number of tasks as well as the amount of memory your job requires. Documentation on Slurm, node sharing, and methods to run multiple jobs inside a single batch script can be found at:

https://www.chpc.utah.edu/documentation/software/slurm.php

https://www.chpc.utah.edu/documentation/software/node-sharing.php

https://www.chpc.utah.edu/documentation/software/serial-jobs.php
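As an illustration of the directives required in shared mode, here is a minimal batch-script sketch. The partition, account, and program names are hypothetical; substitute the values appropriate for your group and cluster (see the documentation pages above for details):

```shell
#!/bin/bash
#SBATCH --partition=kingspeak-shared   # a shared partition (hypothetical choice)
#SBATCH --account=myaccount            # hypothetical account name
#SBATCH --ntasks=4                     # REQUIRED in shared mode: number of tasks
#SBATCH --mem=16G                      # REQUIRED in shared mode: memory for the job
#SBATCH --time=02:00:00                # wall time

# ./myprogram is a hypothetical application that uses 4 cores
./myprogram
```

Because --ntasks and --mem are specified, the scheduler can place this job on a portion of a node and leave the remaining cores and memory available for other jobs.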

There is also a summary of node sharing in the Summer 2019 newsletter, available at:

https://www.chpc.utah.edu/news/newsletters/summer2019_newsletter.pdf

When running sinfo (or si, if you have set up the alias provided on the CHPC Slurm documentation page), you will now see each set of nodes listed multiple times. For the general nodes of a cluster under allocation, the partitions are cluster, cluster-shared, cluster-freecycle, and cluster-shared-freecycle; on clusters not under allocation, the freecycle partitions do not appear. In addition, there is a new node state, "mix", which refers to a node that is partially allocated, whereas a node that is completely allocated shows the state "alloc", regardless of whether it is running a single job or multiple jobs via node sharing. An example of this is (only the first owner partition is shown):

$ sinfo
PARTITION                   AVAIL  TIMELIMIT   NODES  STATE   NODELIST
kingspeak*                  up     3-00:00:00     48  alloc   kp[001-032,110-111,158-167,196-199]
kingspeak-shared            up     3-00:00:00     48  alloc   kp[001-032,110-111,158-167,196-199]
kingspeak-gpu               up     3-00:00:00      4  mix     kp[297-300]
kingspeak-freecycle         up     3-00:00:00      4  mix     kp[297-300]
kingspeak-freecycle         up     3-00:00:00     48  alloc   kp[001-032,110-111,158-167,196-199]
kingspeak-shared-freecycle  up     3-00:00:00      4  mix     kp[297-300]
kingspeak-shared-freecycle  up     3-00:00:00     48  alloc   kp[001-032,110-111,158-167,196-199]
kingspeak-guest             up     3-00:00:00      1  drain$  kp145
kingspeak-guest             up     3-00:00:00      3  down$   kp[144,146-147]
kingspeak-guest             up     3-00:00:00      1  comp    kp334
kingspeak-guest             up     3-00:00:00      1  mix     kp257
kingspeak-guest             up     3-00:00:00    193  alloc   kp[033-035,037-099,106-108,112-115,117-120,122-143,148-157,228-237,246-256,258-259,261-264,266-274,276,278-280,293-296,301-305,308-309,311,318-323,327-332,345-347,353-356,358,363-367,378,380-381,384-387]
kingspeak-guest             up     3-00:00:00     66  idle    kp[036,101-105,116,121,260,265,275,277,281-292,306-307,310,312-317,324-326,333,335-344,348-352,357,368-377,379,382-383]
kingspeak-shared-guest      up     3-00:00:00      1  drain$  kp145
kingspeak-shared-guest      up     3-00:00:00      3  down$   kp[144,146-147]
kingspeak-shared-guest      up     3-00:00:00      1  comp    kp334
kingspeak-shared-guest      up     3-00:00:00      1  mix     kp257
kingspeak-shared-guest      up     3-00:00:00    193  alloc   kp[033-035,037-099,106-108,112-115,117-120,122-143,148-157,228-237,246-256,258-259,261-264,266-274,276,278-280,293-296,301-305,308-309,311,318-323,327-332,345-347,353-356,358,363-367,378,380-381,384-387]
kingspeak-shared-guest      up     3-00:00:00     66  idle    kp[036,101-105,116,121,260,265,275,277,281-292,306-307,310,312-317,324-326,333,335-344,348-352,357,368-377,379,382-383]
kingspeak-gpu-guest         up     3-00:00:00      4  mix     kp[359-362]
lin-kp                      up     14-00:00:0     14  alloc   kp[033-035,037-047]
lin-kp                      up     14-00:00:0      1  idle    kp036
lin-shared-kp               up     14-00:00:0     14  alloc   kp[033-035,037-047]
lin-shared-kp               up     14-00:00:0      1  idle    kp036

If you have any questions, please contact us at helpdesk@chpc.utah.edu.

Last Updated: 7/1/19