
CHPC DOWNTIME: General Environment Clusters - May 25 & 26, 2022

Date Posted: April 21, 2022

CHPC clusters OS update information
Posted: May 28, 2022
We would like to remind everyone that all CHPC clusters were upgraded to a new operating system, Rocky Linux 8, during the downtime. As a result of this, some programs do not function as they did before, and users need to use new versions, or new module names, to achieve the previous functionality. These changes are listed at:
Before opening a support ticket, please go over this document, focusing in particular on the changes in compilers, MPIs, Python, R, and libraries (especially NetCDF).
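A quick way to discover the new module names under Rocky Linux 8 is Lmod's search commands. This is a minimal sketch, assuming Lmod (which provides the `module` command on CHPC systems) is available; the guard simply skips the commands on machines without it, and the package names searched for are only examples:

```shell
# Sketch: finding the new module names after the Rocky Linux 8 upgrade.
# `module` is provided by Lmod on CHPC systems; the guard below skips
# these commands gracefully on machines without Lmod.
if command -v module >/dev/null 2>&1; then
    module avail            # list modules visible in the current hierarchy
    module spider netcdf    # search every NetCDF build across all hierarchies
    module spider gcc       # likewise for compilers
fi
checked="yes"
```

`module spider` is useful here because it also reports builds that only become visible after loading a particular compiler/MPI combination, which is where renamed modules often hide.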

CHPC - End of Downtime
Posted: May 27, 2022

The downtime has ended.
All clusters and frisco nodes have been released.
If you have any questions, please let us know.

Posted: May 26, 2022
The migration of the home directories of the general HPC space was completed. These home directories are available for access via samba mounts and on the general environment HPC clusters (once they are back in service).
Lonepeak and the frisco nodes have been returned to service, and the reservation on the batch system on lonepeak has been removed. Work on the OS update of the remaining clusters - ash, kingspeak and notchpeak - is continuing.
Reminder:  As mentioned in the 5/20/2022 announcement about the new /scratch/general/vast file system, the /scratch/kingspeak/serial space has been made read-only as a first step in the retirement of this scratch file system.  Please copy any content that is needed elsewhere by the end of June.  In early July the /scratch/kingspeak/serial space will be unmounted.

Posted May 25, 2022
The updates on Narwhal and Beehive have been completed, and both have been opened for user access. Support for path lengths longer than the default 260 characters has been enabled, in preparation for mounting the linux home and group spaces more easily.
In addition, the migration of the sys branch, which houses the CHPC installed applications, to the new VAST file system has been completed. There should be no issues using these applications on the redwood cluster, as well as on any VMs and stand-alone servers.
Note that the migration of the home directories in the hpc-home space has not yet been completed. This will be finished tomorrow.
Also note that the clusters in the general environment, including the frisco nodes and the cryosparc servers, remain offline. Once the work on the power infrastructure is completed in the morning, the plan is to bring lonepeak, the frisco nodes and the cryosparc servers back online.  After that work will continue on the OS upgrade on ash, kingspeak and notchpeak.

Reminder and Additional Impact
Posted: May 11th, 2022

CHPC will have a two-day downtime of the general environment HPC clusters  on Wednesday & Thursday, May 25 & 26, 2022 starting at 7 am. 
A reminder that CHPC will have a two-day downtime that will impact the clusters in the general environment due to work on the DDC power infrastructure starting at 8 am May 25, 2022. During this outage, CHPC will be moving the OS of the general environment clusters of ash, kingspeak and notchpeak from CentOS7 to RockyLinux8 as mentioned in the initial announcement (below).

CHPC staff is adding three additional tasks to this downtime window:
  1. OS updates for both the Beehive and Narwhal Windows servers. These two servers will be unavailable starting at 8 am May 25th. Once the OS updates are complete and the servers have been returned to service, an announcement will be sent to the CHPC mailing list.
  2. Migration of the sys branch to the new home directory solution. The sys branch includes the CHPC application tree, containing all CHPC installed end-user applications. Note that this will impact jobs on the protected environment HPC cluster redwood, as well as on any stand-alone linux systems and virtual machines that make use of CHPC installed applications. This migration will start at 8 am May 25th, and a notice will be sent out when it has been completed.
  3. Migration of the home directories in the CHPC general environment HPC space. This includes the home directories of all users who have the default 50 GB home directories, i.e., users in groups that have not purchased home directory space. The home directories of users in groups that have purchased home directory space will be migrated at a later date. This migration will also start at 8 am May 25th, with a notice sent out when it has been completed.

Posted April 21, 2022
Due to the need for the replacement of breakers in the Downtown Data Center (DDC) power infrastructure that supports the general environment clusters -- ash, kingspeak, lonepeak, and notchpeak, including the frisco nodes and the cryosparc nodes -- CHPC will have a two-day outage at the end of May.   The outage will only be for the general environment clusters, including both the compute and the interactive nodes. 
Note that the breaker replacement is part of the maintenance of the DDC power infrastructure, and is the first time that these breakers have been replaced since CHPC has been housed at the DDC.
As this is a longer downtime than is typical, we are providing advance notice to allow CHPC users time to plan accordingly.
Reservations are in place to drain the clusters mentioned above of all running jobs by 7 am on May 25, to allow CHPC staff time to power down the nodes before the work on the breakers starts at 8 am.
The work on the breakers requires two windows where there will be no power -- the first will be the morning of May 25 and the second the morning of May 26. 
CHPC will take advantage of this power maintenance to update the OS from CentOS7 to RockyLinux8 on ash, kingspeak, and notchpeak.  In the time between the two power outage windows, CHPC will start the update process. After the second outage window, we will first bring the frisco nodes and the lonepeak cluster back to service, followed by completing the OS update on the remaining clusters before returning them to service. If you have not already started to test your workflows on the new OS, we recommend you do so before this downtime. 
Please let us know if you have any questions or concerns.
Last Updated: 5/31/22