CHPC DOWNTIME: General Environment Clusters - May 25 & 26, 2022
Date Posted: April 21, 2022
Reminder and Additional Impact
Posted: May 11th, 2022
CHPC will have a two-day downtime of the general environment HPC clusters on Wednesday and Thursday, May 25 & 26, 2022, starting at 7 am.
This is a reminder that CHPC will have a two-day downtime affecting the clusters in the general environment, due to work on the DDC power infrastructure starting at 8 am on May 25, 2022. During this outage, CHPC will move the general environment clusters ash, kingspeak, and notchpeak from CentOS7 to RockyLinux8, as mentioned in the initial announcement (below).
CHPC staff is adding three tasks to this downtime window:
- OS updates for both the Beehive and Narwhal Windows servers. These two servers will be unavailable starting at 8 am on May 25. Once the OS updates are complete and the servers have been returned to service, an announcement will be sent to the CHPC mailing list.
- Migration of the sys branch to the new home directory solution. The sys branch includes the CHPC application tree, which contains all CHPC-installed end-user applications. Note that this will impact jobs on redwood, the protected environment HPC cluster, as well as any stand-alone Linux systems and virtual machines that make use of CHPC-installed applications. This migration will start at 8 am on May 25, and a notice will be sent out when it has been completed.
- Migration of the home directories in the CHPC general environment HPC space. This includes the home directories of all users who have the default 50 GB home directories, i.e., users in groups that have not purchased home directory space. The home directories of users in groups that have purchased home directory space will be migrated at a later date. This migration will also start at 8 am on May 25, and a notice will be sent out when it has been completed.
Posted: April 21, 2022
Due to the need for the replacement of breakers in the Downtown Data Center (DDC) power infrastructure that supports the general environment clusters -- ash, kingspeak, lonepeak, and notchpeak, including the frisco nodes and the cryosparc nodes -- CHPC will have a two-day outage at the end of May. The outage will only be for the general environment clusters, including both the compute and the interactive nodes.
Note that the breaker replacement is part of the maintenance of the DDC power infrastructure, and is the first time that these breakers have been replaced since CHPC has been housed at the DDC.
As this is a longer downtime than is typical, we are providing advance notice to allow CHPC users time to plan accordingly.
Reservations are in place to drain the clusters mentioned above of all running jobs by 7 am on May 25, to allow CHPC staff time to power down the nodes before the work on the breakers starts at 8 am.
The work on the breakers requires two windows where there will be no power -- the first will be the morning of May 25 and the second the morning of May 26.
CHPC will take advantage of this power maintenance to update the OS from CentOS7 to RockyLinux8 on ash, kingspeak, and notchpeak. CHPC will start the update process in the time between the two power outage windows. After the second outage window, we will first bring the frisco nodes and the lonepeak cluster back into service, then complete the OS update on the remaining clusters before returning them to service. If you have not already started testing your workflows on the new OS, we recommend that you do so before this downtime.
Please let us know, via email@example.com, if you have any questions or concerns.