CHPC Summer Downtimes and Data Center Move Schedule

Posted: May 9, 2013

Many of you have heard about the modern, off-campus data center that the University has developed in downtown Salt Lake City (the Downtown Data Center, or DDC). Over the past year, CHPC has been planning its move to the new facility, which will bring our community many benefits, including more stable electric power and significantly more room to expand rack space and power. Nevertheless, the move will require significant disruptions to CHPC services at times over the summer. We ask for your patience and flexibility as we go through this process; with that flexibility, we believe we can minimize the duration of the downtimes. We will provide frequent updates by email and through our new Twitter feed (@CHPCUpdates).

Here is the anticipated general timeline of the significant steps and milestones in the DDC move process:

May 2013:

  • Configure and test new switch in DDC
  • Receive new "Kingspeak" cluster (see below for a description) hardware and begin provisioning (with the upgraded Red Hat Enterprise Linux 6 operating system – RH6) in DDC
  • Receive and install new CI-WATER storage in DDC
  • Receive and install new Sloan Sky Survey storage in DDC
  • Prepare for June equipment moves
  • May 31: Allocation proposals are due for Ember and Updraft (Updraft will be allocated only through December 31, 2013)

June 2013:

  • Continue receiving and provisioning Kingspeak; begin staff testing, software builds on RH6 (including new batch system software), and early user access
  • June 4th: Regular CHPC Major downtime: Ember, Updraft, and Sanddunearch down for Komas machine room maintenance as usual
    • Move Atmospheric Sciences cluster (atmos, meteo and wx, and nodes, except gl nodes) - expect an extended downtime for these servers of approximately 2 days beginning June 4th
    • Move kachina.chpc.utah.edu and swasey.chpc.utah.edu - Expect extended downtime of 2 days
    • Move phase I of VM Farm - No downtime expected
    • Move Apexarch cluster and homerfs - Expect extended downtime of 2 days
    • Move UCS nodes and attached storage - Expect extended downtime of 2 days

July 2013:

  • Batch system up - Kingspeak cluster will run in freecycle mode through October 1

August 2013:

  • All users will be given access to the Kingspeak cluster in freecycle mode.
  • Move Ember cluster - current downtime estimate is 3 +/- 1 weeks. This window will be more tightly specified based on move experience over the summer and more detailed work scheduling as this window approaches.
  • August 31, 2013: Allocation requests are due for Kingspeak and Ember. No further allocations will be awarded on Updraft.

September 2013:

  • Ember will be brought up under RH6 and the new batch system and will run in freecycle mode through October 1.

Please note that we will not be moving the Sanddunearch and Updraft clusters to the DDC; instead, we will run them in place until approximately December 31, 2013. These clusters are nearly at end of life and will be retired then, when remodeling of the former Komas data center is scheduled to begin. The /scratch/serial, /scratch/uintah, and /scratch/general file systems are also slated for retirement; they will not be mounted on Kingspeak, or on Ember after it has been moved to the DDC.

Please relay any concerns about this planned work, particularly its impact on conference and grant proposal deadlines.

Kingspeak cluster details (general nodes):

  • 32 nodes (16 cores each) - 512 cores total
  • 2 interactive nodes
  • 2.6 GHz clock speed with AVX support: 10.6 TFlops max (5.3 TFlops without AVX)
    • Note that not all codes will be able to take advantage of the AVX support as this feature is dependent upon how well the codes vectorize.
    • Also note that the general nodes on Ember run at a max speed of 9 TFlops
  • InfiniBand interconnect
  • New /scratch space of approximately 150 TBytes
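
For readers who want to see where the TFlops figures above come from, here is a minimal sketch of the usual theoretical-peak calculation (cores × clock rate × floating-point operations per cycle). The per-cycle values are assumptions not stated in this announcement: 8 double-precision operations per core per cycle with AVX and 4 without, which is typical of processors of this generation. The function and constant names are illustrative only.

```python
def peak_tflops(nodes, cores_per_node, clock_ghz, flops_per_cycle):
    """Theoretical peak in TFlops: total cores * GHz * flops per cycle / 1000."""
    total_cores = nodes * cores_per_node
    gflops = total_cores * clock_ghz * flops_per_cycle
    return gflops / 1000.0


# Assumed per-core, per-cycle double-precision throughput (not from the announcement).
AVX_FLOPS_PER_CYCLE = 8      # 256-bit vectors: 4-wide add + 4-wide multiply
NON_AVX_FLOPS_PER_CYCLE = 4  # 128-bit vectors: 2-wide add + 2-wide multiply

# Kingspeak general nodes as listed above: 32 nodes, 16 cores each, 2.6 GHz.
with_avx = peak_tflops(32, 16, 2.6, AVX_FLOPS_PER_CYCLE)
without_avx = peak_tflops(32, 16, 2.6, NON_AVX_FLOPS_PER_CYCLE)

print(f"With AVX:    {with_avx:.1f} TFlops")
print(f"Without AVX: {without_avx:.1f} TFlops")
```

Under these assumptions the sketch reproduces the 10.6 and 5.3 TFlops figures quoted for the general nodes. These are theoretical maxima; as noted above, real codes reach the AVX figure only to the extent that they vectorize well.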