CHPC DOWNTIME: Two targeted outages on February 17, 2021 starting at 8am
Posted January 28th, 2021
Updated February 17th, 2021
Update #2 - February 17th @11:41am
The second half of today's CHPC downtime has been completed.
The Nvidia driver update on the notchpeak gpu compute nodes has been completed and the reservation has been removed. The nodes are now running jobs.
If you have any issues with these nodes, please send a report to email@example.com.
Update #1 - February 17th @9:38am
The replacement of the infiniband switch serving the listed kingspeak interactive nodes has been completed. The nodes are back in service. If you have any issues with these nodes, please send a report to firstname.lastname@example.org.
Work is now starting on the notchpeak gpu compute nodes.
On Wednesday, February 17, 2021, CHPC will have two targeted outages, described below.
The first outage is for select kingspeak interactive nodes. This outage will start at 8am and is expected to take 1-2 hours. During this downtime, we will replace the infiniband switch whose problems have led to multiple outages of these nodes, the latest of which occurred on 1/15.
The kingspeak interactive nodes impacted by this outage are:
kingspeak[5-10,12,14-18,21-24] and elmo
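For anyone who wants to check whether a particular node is covered by the bracketed range above, the notation can be expanded into individual host names. The following is only a minimal sketch: the helper name expand_hostlist is hypothetical and not a CHPC tool, and it assumes the simple prefix[ranges] form shown in this notice.

```python
import re

def expand_hostlist(expr: str) -> list[str]:
    """Expand a bracketed host range such as 'node[1-3,5]' into host names.

    Hypothetical helper for illustration; handles only the simple
    prefix[a-b,c,...] form used in this notice.
    """
    match = re.fullmatch(r"(\w+)\[([\d,\-]+)\]", expr)
    if match is None:
        return [expr]  # plain host name, nothing to expand
    prefix, ranges = match.groups()
    hosts = []
    for part in ranges.split(","):
        start, _, end = part.partition("-")
        # A single number such as "12" has no end, so treat it as a one-element range.
        for n in range(int(start), int(end or start) + 1):
            hosts.append(f"{prefix}{n}")
    return hosts

# The node list from this notice; elmo is listed separately by name.
nodes = expand_hostlist("kingspeak[5-10,12,14-18,21-24]") + ["elmo"]
print(nodes)
```

A check such as "kingspeak11" in nodes then shows whether a given interactive node is part of the outage.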
The second outage is for notchpeak compute nodes with gpus. This downtime is needed to update the Nvidia drivers to support the new RTX30x0 gpus. The downtime will start at 9am and is expected to take several hours. A reservation is in place to drain the notchpeak gpu compute nodes of batch jobs before the start of the downtime. The non-gpu nodes will continue to run jobs as normal.
The impacted notchpeak compute nodes are: