CHPC News Announcements

CHPC to Retire /scratch/parallel

Posted: April 30, 2008

On May 1st, CHPC will begin retiring the filesystem mounted at /scratch/parallel. As previously announced, we will start removing the mount of this filesystem from the compute nodes as jobs complete over the next week. Beginning May 1st, DO NOT submit jobs that use /scratch/parallel.

On May 9th, 2008 any remaining mounts of /scratch/parallel will be removed from all compute nodes. The filesystem will remain available on the interactive nodes until June 1st.

On June 1st, 2008 the filesystem will be retired. Please move data you wish to keep from the /scratch/parallel filesystem prior to June 1st.

CHPC will be exploring options for parallel filesystems and may deploy something for this purpose in the future.


MPICH2 (for Ethernet) upgraded on Arches to version 1.0.7

Posted: April 15, 2008

We have upgraded MPICH2 (for Ethernet) on Arches to version 1.0.7. This is a minor upgrade, so old executables should work with the new version. If a problem occurs, please rebuild your program (no changes to Makefiles, etc. are required). If that does not help, please let us know.

Arches Clusters back after failed hardware in network router

Posted: April 7, 2008

We had a supervisor card fail in our router for the clusters this morning. We were able to replace the failed hardware and get the router healthy again at about 1 p.m., after which we reviewed the health of the nodes. Please let us know if you see any problems by sending email to issues@chpc.utah.edu.


Spring 2008 CHPC Presentation Series begins this Thursday, April 10th

Posted: April 7, 2008

All presentations are held in the INSCC Auditorium.

Overview of CHPC

Date: April 10th, 2008
Time: 10:30 a.m.

Introduction to Parallel Computing

Date: April 10th, 2008
Time: 1:00 p.m.

Chemistry Packages at CHPC

Date: April 17th, 2008
Time: 1:00 p.m.

Using Gaussian03 and Gaussview

Date: April 24th, 2008
Time: 1:00 p.m.

For more information please go to CHPC Presentations


Unscheduled Network Outage of arches clusters and telluride - 4/7/2008

Posted: April 7, 2008

Duration: Unknown

Systems Affected/Downtime Timelines: Connectivity of all of the arches clusters and telluride.

We've had a switch go down which affects connectivity to all of the arches clusters and telluride. Our staff are working on the problem. We'll send an update when we know more.

Power Restored to INSCC 4/6/2008 about 10:21 a.m.

Posted: April 6, 2008

Full power was restored to the INSCC Building about 10:21 a.m. on April 6th after the planned outage. All CHPC services are online or in the process of being verified. Some INSCC departmental services may be off-line until individual administrators are able to finish bringing up various servers.

The INSCC Rm 275 UPS (used during the outage to maintain file server connectivity to the HPC clusters) did not come back off battery at that time. CHPC staff working with electricians were able to determine and remedy the problem. As of 10:56am, the UPS was fully functional.

Campus electricians report that their work with energizing a new substation was successful.


New scratch space available

Posted: April 4, 2008

A new scratch space is available for computations on the arches clusters. The space is mounted at /scratch/serial and is 16 TB.

Please note that the initial performance of this new space may not be representative of what you'll see in the future. The first, early users may see exceptionally fast performance, but if many users move data and start running jobs, it may perform exceptionally slowly for a time. The performance should stabilize over the next few weeks.

**Important**: We will discontinue mounting the /scratch/parallel space on the compute nodes beginning May 1st, 2008, but access via the interactive nodes will remain for an additional few weeks. Users should migrate all their data off /scratch/parallel as soon as possible.
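
As a minimal sketch of one way to migrate a directory off /scratch/parallel, the helper below copies a directory tree and compares file counts before you remove anything from the source. It is written for bash or sh; csh/tcsh users would need to adapt it. The function name `migrate_dir` and the per-user directory layout in the usage note are hypothetical and not part of CHPC's instructions.

```shell
#!/bin/sh
# Hypothetical helper (not a CHPC tool): copy one directory tree to a new
# location and compare file counts before anything is removed from the source.
migrate_dir() {
    src=$1
    dest=$2
    if [ ! -d "$src" ]; then
        echo "source directory not found: $src" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    # -R copies recursively, -p preserves timestamps and permissions;
    # the trailing "/." copies the contents of src, hidden files included.
    cp -Rp "$src"/. "$dest"/ || return 1
    # Count regular files on both sides (tr strips padding some wc versions add).
    src_count=$(find "$src" -type f | wc -l | tr -d ' ')
    dest_count=$(find "$dest" -type f | wc -l | tr -d ' ')
    if [ "$src_count" -ne "$dest_count" ]; then
        echo "file count mismatch: $src_count in source, $dest_count in destination" >&2
        return 1
    fi
    echo "copied $src_count files from $src to $dest"
}
```

For example, assuming your data sits in per-user directories (an assumption about the layout), you might run `migrate_dir /scratch/parallel/$USER/myrun /scratch/serial/$USER/myrun` from an interactive node. The count check only works as intended when the destination directory starts out empty.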


Power outage this weekend affecting much of lower campus: April 5-6th

Posted: April 1, 2008

Beginning Saturday, April 5th at 11:00 p.m., the power to much of lower campus will be down in order for the electrical shop and contractors to prepare the power grid for a co-generation plant. The estimate for this outage is 24 hours.

The impacts to the INSCC building and INSCC data center (Rm 275):

Desktops and personal space: We recommend that tenants of the INSCC building shut down their desktops before leaving for the weekend. We further recommend that any perishables in refrigerators be removed prior to this outage.

System administrators: All equipment in the INSCC data center should be turned off prior to 11 p.m. Saturday. There will be no air conditioning and no power available for servers.

HPC systems: The HPC systems should be able to ride out this outage since they are on another power substation within Research Park. The fileservers and supporting equipment in the SSB data center will be on backup generator power. The two data centers will interconnect via a router on generator power.


The default version of Gaussian is now the new E.01 version (and Gaussview 4).

Posted: March 20, 2008

You may remember that I changed to this new version during the last downtime, but there were issues with the use of /scratch/parallel and I moved back to the D.01 version. These issues have been resolved, so during this downtime I again made the latest version the standard one.

If for any reason you still want to run the D.01 version, you will need to update the g03root environment variable in your batch script to point to it:

setenv g03root /uufs/arches/sys/pkg/gaussian03/D.01
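
The line above uses csh/tcsh syntax. As a minimal sketch for a batch script written in bash or sh instead (an assumption; the announcement only shows the csh form), the same setting could look like this. The directory check is an added convenience, not something the announcement requires.

```shell
#!/bin/sh
# bash/sh equivalent of the csh "setenv" line; the path is the one given
# in the announcement.
export g03root=/uufs/arches/sys/pkg/gaussian03/D.01

# Optional sanity check (an addition for illustration): warn if the
# directory is not visible from the node running this script.
if [ ! -d "$g03root" ]; then
    echo "warning: $g03root not found on this node" >&2
fi
```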

If you have any questions/problems, please contact Anita.


Batch jobs after downtime

Posted: March 20, 2008

We have just been made aware that several jobs that had completed before the downtime are showing up in the queue and being rerun. This situation is the result of moving to a new nfsroot server over the downtime, specifically the process of syncing information from the old server to the new one.

Based on the dates of the jobs, the problem does not seem to be widespread. However, users of the batch queues on arches should take a minute to check the jobs that they own. If any of the jobs are old ones and you do not want them to rerun, you should be able to delete them with the qdel command (qdel jobid).
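
As a rough sketch of that check, assuming the standard PBS/Torque `qstat` command is available alongside `qdel`, you could list your own jobs with `qstat -u $USER`, note the IDs of any old ones, and pass them to a small wrapper like the one below. The function `delete_jobs` and the `DRY_RUN` switch are hypothetical names introduced for illustration. By default the wrapper only prints the commands it would run, and it is written for bash or sh rather than csh.

```shell
#!/bin/sh
# Hypothetical helper: print, or run, a qdel command for each job ID given.
# With DRY_RUN=1 (the default here) nothing is deleted; the commands are
# only echoed so they can be reviewed first.
delete_jobs() {
    for jobid in "$@"; do
        if [ "${DRY_RUN:-1}" = "1" ]; then
            echo "qdel $jobid"
        else
            qdel "$jobid"
        fi
    done
}
```

Running `delete_jobs 1234 5678` would print the two `qdel` commands for review, and `DRY_RUN=0 delete_jobs 1234 5678` would actually issue them. The job IDs here are placeholders.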


Arches Update

Posted: March 20, 2008

Event date: March 19, 2008

Delicatearch, Tunnelarch, Marchingmen and Telluride queues available at 8:44 p.m. on March 19, 2008
Sanddunearch queues available 12:42 a.m. on March 20, 2008
Landscapearch queues available 11:57 a.m. on March 20, 2008

CHPC DOWNTIME: Starts Tuesday March 18, 2008 at 5PM

Posted: March 12, 2008

SCOPE: HPC(Arches), Network, desktop access to filesystems

DURATION:

Network access to INSCC should be restored at about 10pm on March 18th

Desktop access to fileservers (home directory access) will be restored during the morning of March 19th - a message will be sent to users when the systems are up and ready to be used

Arches will be back up sometime later in the day on March 19th. Again, a message will be sent when it is ready for use. Reservations have been set so that jobs that cannot finish before 5pm March 18th will not be started. Jobs waiting in the queue will be started once the downtime has finished.


CHPC Major Downtime: 3/18/2008

Posted: February 27, 2008

Duration: From 5 p.m. Tuesday 3/18 until 5 p.m. 3/19

Systems Affected/Downtime Timelines: All CHPC networks, arches, fileservers, desktops

Major CHPC Downtime

Core Infrastructure down at 5 p.m. on 3/18, back up by midnight, including: fileservers etc., SSB and INSCC dependencies.

Arches Cluster down 5 p.m. 3/18 until 5 p.m. 3/19


CHPC Migration to Jira

Posted: February 26, 2008

CHPC to migrate to new issue tracking system on Monday February 25th, 2008.

On Monday, February 25th, CHPC will be changing systems for tracking problems and questions. The new system will also change the terminology we use from "Problem Report" to "Issue".

You can currently get to the new system at http://jira.chpc.utah.edu and you can create your own account by going to that address. If you set up your account ahead of time (recommended), you can set your username to your uNID. We are planning to eventually have the Jira system authenticate to the campus servers, so doing this will make it easy to maintain your account.

When you set up your account, make note of the email address you specify. Jira associates you with your user account by matching the "from" email address. Please set up your account to use your primary email address, that is, the address from which you will be sending Issues.

***Please note: If you send in issues from another email account, Jira will create a new account for you.

Over the weekend, we will be closing Problem Reports in the current system and moving them to Jira. When we close them, we will reference the Jira KEY for you, so that you can find them in the new system. As your accounts get established, we'll be able to associate the Issues with your Jira account. Please let us know if you have any questions or concerns.

We are excited about the new system and its potential to improve our service.


Change in maximum wallclock time on delicatearch

Posted: February 4, 2008

In order to improve utilization of the Delicatearch cluster, we have increased the maximum wall clock time for jobs from 24 hours to 72 hours (3 days). Please take advantage of this change to offload the very busy Sanddunearch cluster. Despite its age and slower CPUs, Delicatearch has a fast Myrinet network, and performance on 4 Sanddunearch nodes (16 processors) should not be any better than on 16 Delicatearch nodes (32 processors).


CHPC Batch Systems Paused: Tuesday February 5, 2008

Posted: February 4, 2008

Duration: Downtime starts at 4pm and will last about an hour.

Systems affected:

All of Arches and any computation cluster under batch control

The clusters impacted by this are: sanddunearch; delicatearch; marchingmen; tunnelarch; landscapearch and telluride.

We will be pausing the moab schedulers on all CHPC computational clusters under batch control, tomorrow, Tuesday February 5th, 2008 at 4:00 p.m., for about an hour. This is to perform system maintenance on one of our administrative systems.

Scope: No new jobs will be started during this period. You may still queue jobs and look at the queues, and running jobs will continue to run.

Please let us know if you have any questions.


CHPC DOWNTIME: Thursday January 3, 2008

Posted: December 19, 2007

Event date: January 3, 2008

Duration: Downtime starts at 3pm and will last until sometime early morning on January 4, 2008

Systems affected:

All of Arches and CHPC/INSCC Network

After this downtime, all users will use the campus uNID and password for authentication on all HPC systems (and other Linux systems administered by CHPC). Windows users will use the uNID and current password for authentication.

Arches:

All clusters will be down from 3pm to allow for updates to the OS and for the other changes outlined below. The Batch Queues will be drained of all running jobs. Reservations are in place so that jobs will not be started if they cannot finish before the start of the downtime. Jobs that are queued but not running will be started after the downtime ends. The one exception is if you are being moved to using your uNID for authentication during this downtime (see below); in this case any queued jobs you have will need to be deleted. The clusters will be down until sometime the following morning.

**MIGRATION TO NEW FILESERVER: Some CHPC users, those on the CHPC-owned home directory filesystems (i.e., those with home directories at /uufs/inscc.utah.edu/common/home/USERID), will be migrated to a new, larger fileserver during this downtime. If you are one of these users, your new home directory path will be /uufs/chpc.utah.edu/common/home/UNID

**CHANGE TO UNID: All CHPC users who are not already using their UNID as their CHPC login will be changed to do so. If you do not have a UNID, you will need to get one BEFORE this downtime. All University of Utah students and employees automatically have a UNID. If you are not part of the University of Utah, you need to fill out a Person of Interest (PoI) form to be assigned a UNID. This form can be found at http://www.hr.utah.edu/forms/lib/u-affiliate-poi-form.pdf.

Network Outage:

All networking in CHPC/INSCC will be down from about 5-7pm