DOWNTIME: September 22, 2015, beginning at 6:30 a.m.

Date Posted: September 8, 2015

CHPC Downtime

(9/22 5:53 p.m.) Final Downtime Update: The reservations on the clusters have been removed. Please feel free to log in to the interactive nodes and to submit jobs. If you encounter any problems, please let us know by sending a report to issues@chpc.utah.edu.

Please note:

  1. Users need to remove their ~/.lmod.d/.cache directories because the Lmod version was updated.
  2. Batch queues on kingspeak and lonepeak were flushed of all pending jobs due to a scheduler configuration change.  The other clusters (ash, tangent and ember) still have the jobs that were pending before the downtime.
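
For users who would rather script the cache cleanup in item 1, the following is a minimal sketch, not an official CHPC tool. The function name `remove_lmod_cache` and its optional `home` parameter are assumptions introduced here for illustration; the only detail taken from this notice is the path ~/.lmod.d/.cache.

```python
import shutil
from pathlib import Path


def remove_lmod_cache(home=None):
    """Remove <home>/.lmod.d/.cache if it exists.

    `home` is a hypothetical parameter (defaults to the current user's
    home directory). Returns True if something was removed, else False.
    """
    home_dir = Path(home) if home is not None else Path.home()
    cache_dir = home_dir / ".lmod.d" / ".cache"

    if cache_dir.is_symlink():
        # Remove only the link itself, never the directory it points to.
        cache_dir.unlink()
        return True
    if cache_dir.is_dir():
        # Delete the cache directory and everything inside it.
        shutil.rmtree(cache_dir)
        return True
    # Nothing to clean up.
    return False
```

Calling `remove_lmod_cache()` with no arguments acts on your own home directory. Only the .cache subdirectory is deleted; anything else under ~/.lmod.d is left in place.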

 Also note that there will still be work done on the network at the campus level later tonight, specifically:

  • At 8pm, there will be interruptions along with degraded performance in network traffic on the Science DMZ only.  This will impact the Data Transfer Nodes (DTNs) only. This is expected to take about an hour.
  • At 11pm, the work by UEN will start.  There will be noticeable degradation in performance but no complete outages. This is expected to take about 2 hours.

(9/22 1:58 p.m.)  Update #2: Kachina and Swasey are now available for use. 

(9/22 8:52 a.m.) Update #1:  Network access within the INSCC building has been restored.

Sept 22, 2015 starting at 6:30AM

Batch queues on all clusters have a reservation in place to drain queues of running jobs by this time.

All users should close sessions (including FastX and Remote Desktop sessions) on CHPC resources and log out of desktops before this time.

SCOPE: 

During this downtime, CHPC will perform maintenance on the networking switch in INSCC and will perform maintenance and apply updates on the clusters and on the Windows servers kachina and swasey.  In addition, there will be work on the Protected Environment VM farm and on the campus-level network.

IMPACT:

Network:

  • In the morning, starting at about 7am, there will be no wired network access from anywhere in the INSCC building for about an hour as CHPC works on the switch. 
  • At 8pm, there will be interruptions along with degraded performance in network traffic on the Science DMZ only.  This will impact the Data Transfer Nodes (DTNs) only. This is expected to take about an hour.
  • At 11pm, the work by UEN will start.  There will be noticeable degradation in performance but no complete outages. This is expected to take about 2 hours.

Kachina and Swasey:  

  • Will be unavailable for most of the day.

VMs in the Protected Environment:

  • Expect rolling, intermittent disruptions of individual VMs in the protected environment between the hours of 8-11am as network configuration changes are being made.  Outages of any given VM should be less than 1 hour.

Non-Protected Environment VMs: 

  • No interruption in service.

Clusters: 

  • Will be unavailable for most of the day. This includes the compute and interactive nodes of kingspeak, ember, ash, apexarch, lonepeak (this includes the atmos and meteo nodes), and tangent.
  • Idle jobs will be purged due to a configuration change being made in Slurm.
  • Because kernel, driver, and compiler updates are being applied to the cluster nodes, users should test their applications after the downtime.  We may need to rebuild the Message Passing Interface (MPI) versions, and if so, applications may need to be rebuilt as well.  We will let users know if the MPI versions have been rebuilt.

Access to CHPC file systems:

  • There will be no interruptions to file system access, except for occupants of the INSCC building while the work on the INSCC switch is being completed.

We will send announcements when wired network access within the INSCC building is restored, as well as when the clusters and the Windows servers are available for use.

Please let us know if you have any questions or concerns by sending email to issues@chpc.utah.edu.

Last Updated: 6/11/21