January 23rd, 2:27 p.m.
January 22nd 10:37 p.m.
CHPC sysadmins determined the cause of the connection failures and put a "band aid" fix in place. A permanent fix will need to be more thoroughly evaluated in the next few days.
As of now, clusters should be functional with the above caveat. If you notice something out of order, please, let us know.
We will notify once the permanent fix is decided on.
January 22nd 5:45 p.m.
The afternoon of January 22nd, we observed an issue with the Kingspeak resource manager and since then we have observed transient connectivity issues on all general environment cluster nodes. These are occurring often enough that users will see issues with the use of the clusters, both with sessions on the interactive nodes and with jobs on the compute nodes.
We are continuing to diagnose the problem, and will post additional updates when we know more or are able to resolve the issue.