2008 CHPC News Announcements

New Cluster UPDRAFT in production January 1, 2009

Posted: December 31, 2008

The new cluster (updraft.chpc.utah.edu) is to be used primarily for large jobs which require huge amounts of compute power. The queues on the new system will be optimized to accommodate these large jobs. The new cluster has 2048 cores (256 dual quad-core nodes) for computation running over an InfiniBand interconnect. A large portion of this cluster will be dedicated to the research groups involved in the system acquisition (uintah), but approximately 1/3 of the available cycles will be available through the regular CHPC allocation process. When requesting an allocation on updraft, please detail how effectively your code(s) run in parallel.

There are a few differences in the new system from other CHPC clusters:

* The maximum run time is 24 hours for general users and 12 hours for uintah users.
* Jobs run freecycle (without an allocation) will be preemptable. This means that a job with an allocation may preempt, i.e. kill, your job.
* Users with an allocation may request that their jobs be preemptable by adding qos=preemptable to the PBS -l flag (see the sketch after this list). Jobs run with an allocation and the preemptable qos will be charged at 1/4 the regular Service Unit (SU) rate.
* The maximum runtime for freecycle and preemptable jobs is 24 hours.
* There is a quality of service (qos) of "bigrun" which takes priority over all other jobs and will be restricted to users who request access and detail their need to run on a large number of processors.
* There will be three dedicated access times (dats) each month where one user has access to the entire system. Dats will begin on the first, third and last Mondays of each month, beginning at Noon and running until the following Wednesday at Noon (48 hours). The first two are reserved for uintah users. The last one is for general CHPC users. The first dat begins Monday, January 5th at Noon. The first dat available to general CHPC users begins January 26th.
* To request access to bigrun or to reserve a dat, please send an email to issues@chpc.utah.edu with details of your code's parallelism and computational requirements, and any other justification for the request.
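
For reference, here is a minimal sketch of the PBS directives for a preemptable job, based on the qos=preemptable syntax described above; the shell choice, node count and walltime are illustrative only, so check the updraft documentation for the exact form:

#PBS -S /bin/tcsh
# Illustrative resource request: 8 nodes with 8 cores each, within the 24-hour preemptable limit
#PBS -l nodes=8:ppn=8,walltime=24:00:00
# Request the preemptable qos (charged at 1/4 the regular SU rate)
#PBS -l qos=preemptable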

We have been testing this environment for several weeks now, but we may still need to make some adjustments to the configuration or behavior of the queues. As always, if you have questions or problems regarding updraft or any other CHPC service, please contact us by sending email to issues@chpc.utah.edu.

Happy Holidays!


CHPC DOWNTIME: December 16 starting at 8am - COMPLETED by 5 p.m.

Posted: December 10, 2008

Event date: December 16, 2008

Duration: Most of the day

Systems Affected/Downtime Timelines:

  • 8:00 a.m. - 10:30 a.m. - intermittent networking outages in INSCC

  • 8:00 a.m. - back up about 3:20 p.m. - Desktops mounting CHPCFS filesystems

  • 8:00 a.m. - back up by 5 p.m. - All Clusters (arches, telluride, updraft, meteo nodes)

Instructions to User:

After the downtime, if you experience problems with your desktop, please try a reboot. If you still have a problem, contact the CHPC Service desk at issues@chpc.utah.edu.

This downtime is for maintenance of the cooling system in the Komas datacenter, and therefore requires all clusters housed in the data center to be down from 8 a.m. until about 5 p.m. CHPC will take advantage of this downtime to do a number of additional tasks, including work on the network in the morning and on file servers and clusters for most of the day.

All file systems served from CHPCFS will be unavailable for a good part of the day. This includes HPC home directory space as well as departmental file systems supported by CHPC. We will work to get things online as soon as possible.


CHPC presentation - Introduction to Programming in OpenMP - postponed to Dec. 11th

Posted: December 3, 2008

Event date: December 11, 2008

Due to a scheduling conflict in the INSCC Auditorium, the Introduction to Programming in OpenMP presentation will take place on December 11th at 1 p.m. in the INSCC Auditorium. This is one week later than the previously announced time.


CHPC presentation - Introduction to Programming in OpenMP

Posted: December 2, 2008

Event date: December 4, 2008

Duration: 1pm - 2pm

Introduction to Programming with OpenMP
Thursday, December 4th, 1 p.m.
INSCC Auditorium (Rm 110)

This talk introduces OpenMP, an increasingly popular and relatively simple shared-memory parallel programming model. Two parallelization schemes, parallel do loops and parallel sections, will be detailed using examples. Various clauses that allow the user to modify the parallel execution will also be presented, including sharing and privatizing of variables, scheduling, synchronization and mutual exclusion of parallel tasks. Finally, a few hints will be given on removing loop dependencies in order to obtain effective parallelization.

Note that this talk is given in place of the originally posted "Parallel performance analysis with TAU". We will present a new talk about parallel profiling tools in the spring, after we evaluate and deploy the Intel Cluster Toolkit, which contains more user-friendly MPI checking and profiling tools.


CHPC Presentation: Gaussian03 & GaussView, Thursday 11/20, 1-2pm, INSCC Auditorium (RM 110)

Posted: November 17, 2008

Event date: November 20, 2008

This presentation will focus on the use of Gaussian03 and GaussView on the CHPC arches clusters. Batch scripts and input file formats will be discussed. Parallel scaling and timings with the different scratch options will also be presented, along with a discussion of the scratch needs of Gaussian03. Finally, several demonstrations of using GaussView to build molecules and input structures, set up input files, and analyze output files will be presented.


New Updraft Cluster - CHPC Accepting Proposals - Due December 1st, 2008

Posted: November 14, 2008

CHPC has recently taken delivery of a new cluster for capability computing. Traditionally CHPC has tuned cluster usage for highest throughput. The new cluster, named UPDRAFT, will be used primarily for large jobs which require huge amounts of compute power. The queues on the new system will be optimized to accommodate these large jobs. The new cluster will have 2048 cores for computation running over an InfiniBand interconnect. Each of the 256 Sun Fire X2250 server nodes will have 2 quad-core Intel Xeon processors running at 2.8 GHz. In addition there will be 2 interactive nodes for job submission, short compiles and editing. Each interactive node has 2 quad-core 3.16 GHz Intel processors (8 cores per node). There will also be a large scratch space of 24 terabytes (raw), or 18 terabytes of usable space, attached and available to all of the nodes. We have achieved 17.5 TFlops on the HPL benchmark.

A large portion of this cluster will be dedicated to the Research Groups involved in the system acquisition, but approximately 1/3 of the available cycles will be available through the regular CHPC allocation process. The proposals are due December 1, 2008. Please see http://www.chpc.utah.edu/docs/policies/allocation.html for more details.


CHPC Presentation: Chemistry Packages at CHPC, Thurs 11/13, 1-2pm, INSCC Auditorium

Posted: November 11, 2008

Event date: November 13, 2008

Instructions to User: INSCC Auditorium, 1-2pm

This talk will focus on the computational chemistry software packages - Gaussian, Amber, NWChem, Molpro, Dalton, GAMESS, Babel, GaussView, ECCE - that are available on CHPC computer systems. The talk will be an overview of the packages and their capabilities, and will focus on details of how users can access the installations at CHPC. This talk is the precursor to a second talk scheduled for next week that will focus on the use of Gaussian 98/03 and GaussView.


Problem that started after our September downtime

Posted: October 27, 2008

Instructions to User: After the last downtime in September, we discovered that some users get an error like the following when trying to log in or to scp files:

wc: /uufs/chpc.utah.edu/common/home/u0123456/linuxips.csh: No such file or directory
No IP alias file found, most likely due to CHPC webserver being down

This problem is due to Linux updates we performed, which appear to have changed the order in which the default user environment is set up. It affects users whose default shell is tcsh or csh. To fix this problem, please edit your ~/.tcshrc file and, at the beginning of the file before the first executable statements, add the following:

# Test if UUFSCELL is defined; if not, see if uufs.csh exists and, if it
# does, source it - this also fixes problems with crontab
if (!($?UUFSCELL) && (-e /etc/profile.d/uufs.csh)) then
  source /etc/profile.d/uufs.csh
endif

We encourage all tcsh/csh users to check their ~/.tcshrc and make sure this fix is in there.

If you have any questions, please contact us at issues@chpc.utah.edu.


CHPC Presentation: Mathematical Libraries at CHPC, Thurs. 10/30, 1-2 p.m., INSCC Auditorium (RM110)

Posted: October 27, 2008

Presented by: Martin Cuma

In this talk we introduce users to the mathematical libraries that are installed on the CHPC systems, which are designed to ease programming and speed up scientific applications. First, we will talk about BLAS, a standardized library of Basic Linear Algebra Subroutines, and present a few examples. Then we will briefly cover other libraries that are in use, including the freely available LAPACK, ScaLAPACK, PETSc and FFTW.


CHPC Presentation: Debugging with Totalview

Posted: October 22, 2008

Event date: October 23, 2008

Duration: Thurs., 10/23, 1-2:30 p.m., INSCC Auditorium

This talk introduces Totalview, a debugger that has become a standard in the Unix code development community. After a short introduction to its major features, we will present three examples: serial, parallel OpenMP and parallel MPI codes. Using these examples, we will show common and specific features for debugging these codes. We will also spend a short time introducing Intel Thread Checker, which is a useful tool for OpenMP code checking. Finally, those interested can stay for an extra half hour, between 2 and 2:30 p.m., for an informal, practical presentation of various useful Totalview features.


CHPC Unplanned Outage: CHPCFS fileserver, October 9th, 2008 from 11:30 a.m. until approximately 6:00 p.m.

Posted: October 9, 2008

Duration: October 9th, 2008 from 11:30 a.m. until approximately 6:00 p.m.

Systems Affected/Downtime Timelines:

All systems mounting file systems from CHPCFS. Outage began about 11:30 a.m. and we plan to have most file systems mounted and stable by 6:00 p.m.

Arches Downtime Duration:

The Arches clusters will not go down, but the scheduler will be paused during this window, preventing jobs from starting. If you have running jobs which are writing to /scratch file systems, your jobs should continue to run.

We are continuing to see problems with the file server as we work to correct the hardware failure of yesterday (10/8) morning. In order to bring all file systems back to a stable state, we have decided to take down the file server temporarily. We sincerely apologize for the inconvenience.


CHPC Unplanned outage: CHPCFS file server, October 8th 7:20 a.m. until about Noon

Posted: October 9, 2008

Duration: October 8th 7:20 a.m. until about Noon

Systems Affected/Downtime Timelines: All systems mounting file systems from CHPCFS. Outage began about 7:30 a.m. and most file systems were available by Noon.

The CHPCFS file server is offline which affects all systems mounting file systems from this server.

During a planned service outage from 7:30-8 a.m. today we had a storage controller failure resulting in a delay in getting the fileserver back up. We are taking action to get the bulk of CHPCFS up as soon as possible but a few groups may take longer. We will send out more information as it becomes available. We will contact the individual groups of the affected file systems with more details and the recovery time frame as more is known about the failure.

We apologize for the inconvenience and are working hard to correct the problem.


CHPC Presentation: Telematic Collaboration with the Access Grid, Thurs., 10/9, 1-2 p.m., INSCC Auditorium (RM110)

Posted: October 6, 2008

CHPC Presentation: Telematic Collaboration with the Access Grid
Presented by: Jimmy Miklavcic and Beth Miklavcic
Date: Thursday October 9, 2008
Time: 1:00 - 2:00 p.m.
Place: INSCC Auditorium (RM110)

The Access Grid video conference communications software is an advancing set of elements combined into an amazing and powerful tool kit that can be employed to create distributed performances that involve a variety of sites throughout the world. This exciting, emergent and creative vehicle can bring about a host of fascinating challenges. Ironically, the first among them and the most important is communication. Sharing and coordinating ideas among a group of local collaborators is difficult enough. Distribute the same collaborators across the globe and share and coordinate ideas through a 320 by 240 pixel portal and the difficulty is compounded.

The development of the InterPlay performance series began in 1999 and was built upon the Access Grid infrastructure. The first public performance followed in 2003, and the series continues to date. It has created many unique challenges, and those challenges have matured and multiplied with each subsequent performance. This developmental process, the issues surmounted and those currently being addressed are discussed in this presentation.


CHPC Presentation: Introduction to programming with MPI, Thurs. 10/2, 1-2 p.m., INSCC Auditorium (RM110)

Posted: September 29, 2008

by Martin Cuma

This course discusses introductory and selected intermediate topics in MPI programming. We base this presentation on two simple examples and explain their parallel development with MPI. The first example encompasses MPI initialization and simple point-to-point communication (which takes place between two processes). The second example includes an introduction to collective communication calls (where all active processes are involved) and options for effective data communication strategies, such as derived data types and packing the data. Some ideas on more advanced MPI programming options are discussed at the end of the talk.


CHPC Systems back online after weekend downtime

Posted: September 28, 2008

This weekend's (9/27-28) downtime is finished. Arches has an updated OS, and we also updated the Myrinet MX drivers on Landscapearch and Delicatearch and the InfiniBand OFED drivers on Sanddunearch.

We have also updated the MPICH-MX and MPICH2-MX distributions to their latest version.

Since all MPI builds with either MX or OFED link dynamically, users should not need to rebuild their executables. If you do have problems running your programs after the downtime, please first recompile your program, as sketched below. If this does not work, contact us at issues@chpc.utah.edu.
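
If a rebuild is needed, it is usually just a matter of recompiling with the same MPI compiler wrapper you used before; the source file and program names in this sketch are purely illustrative:

# Recompile an MPI code against the updated MPI installation (illustrative names)
mpicc -O2 -o my_mpi_prog my_mpi_prog.c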


TotalView Express Student Edition Program

Posted: September 23, 2008

TotalviewTech, which develops the Totalview debugger, recently introduced a new Student Edition of the Totalview debugger.

The TotalView Express Student Edition Program enables students enrolled at the university to receive a complimentary version of the TotalView Debugger for use until they graduate. There is no cost associated with this program, either for the students or the university.

Martin Cuma will act as the coordinator for this program. Those interested, please contact him (mcuma@chpc.utah.edu) with the following information:
* name
* email
* student id
* year of graduation


We will send this information to TotalviewTech, which will then contact the interested students with a link to download the Totalview installation and with a license that expires in the year of graduation.

We think this is a great opportunity to get a good debugger for free (at least while one is a student), so we encourage those who develop codes in C/C++ or Fortran to apply.


CHPC Presentation: Statistical Resources at CHPC: Thurs. 9/25, 1-2 p.m. INSCC Auditorium

Posted: September 22, 2008

Title: Statistical Resources at CHPC
Date: Thursday September 25, 2008
Time: 1-2 p.m.
Where: INSCC Auditorium (RM 110)

This presentation by Byron Davis gives users (and potential users) of CHPC's statistical resources an overview of the equipment and software presently available. Additionally, a list of the specialized statistical software we have supported over the past 10 years or so will be presented.


CHPC Major downtime: from 9 a.m. 9/27 (Sat) until sometime 9/28, (Sun)

Posted: September 19, 2008

Duration: From 9 a.m. Saturday 9/27 until sometime Sunday 9/28

Systems Affected/Downtime Timelines: All equipment in the SSB machine room including all file servers supported by CHPC starting at 9 a.m. Saturday. The Arches clusters including telluride and the meteorology compute servers beginning at 9 a.m. Saturday. The network equipment serving INSCC and the clusters will be intermittently unavailable in the late evening on Saturday.

Arches Downtime Duration: From about 9 a.m. Saturday until sometime on Sunday.

Instructions to User: If you mount file systems supported by CHPC on your desktop, it would be best to shut down your machine before you leave for the weekend. Jobs running on the clusters will be drained from the queues.

There will be some major work on the electrical equipment serving the SSB machine room from Saturday 9/27 until Sunday 9/28. CHPC will take advantage of this downtime to do some system and network administration (originally planned for Oct. 14). At 9:00 a.m. on Saturday, September 27th, access to desktops mounting CHPC-supported file systems will be unavailable and the clusters will be taken down. Jobs running on the clusters will have been drained from the queues. The networks will be unavailable for periods in the late evening on Saturday. We expect everything to be back online sometime on Sunday 9/28.


CHPC Presentation: Introduction to Parallel Computing, Thurs. 9/18, 1-2 p.m., INSCC Auditorium (RM110)

Posted: September 15, 2008

Introduction to Parallel Computing

Thursday, September 18, 2008
1:00 - 2:00 p.m.
INSCC Auditorium

Presented by Martin Cuma

In this talk, we first discuss various parallel architectures and note which ones are represented at CHPC, in particular shared and distributed memory parallel computers. A very short introduction to two programming solutions for these machines, MPI and OpenMP, will then be given, followed by instructions on how to compile, run, debug and profile parallel applications on the CHPC parallel computers. Although this talk is directed more towards those starting to explore parallel programming, more experienced users can gain from the second half of the talk, which will provide details on the software development tools available at CHPC.


CHPC Presentation: Overview of CHPC, Thurs. 9/11, 1-2 p.m., INSCC Auditorium (RM 110)

Posted: September 10, 2008

Overview of CHPC

Thursday, September 11, 2008
1:00 - 2:00 p.m.
INSCC Auditorium

This presentation gives users new to CHPC, or those interested in High Performance Computing, an overview of the resources available at CHPC and the policies and procedures to access these resources.

Topics covered will include:

  • The platforms available
  • Filesystems
  • Access
  • An overview of the batch system and policies
  • Service Unit Allocations

Visit our full list of Fall 2008 Presentations


CHPC Presentation Schedule begins 9/11/2008

Posted: September 4, 2008

Mark your calendars!!

All presentations will be on Thursdays from 1:00-2:00 p.m. in the INSCC Auditorium

  • 09/11: Overview of CHPC (Julia Harrison)
  • 09/18: Intro to Parallel Computing (Martin Cuma)
  • 09/25: Statistical Resources at CHPC (Byron Davis)
  • 10/02: Intro to MPI (Martin Cuma)
  • 10/09: Telematic Collaboration with the Access Grid (Jimmy Miklavcic and Beth Miklavcic)
  • 10/16: **FALL BREAK**
  • 10/23: Debugging with Totalview (Martin Cuma)
  • 10/30: Mathematical Libraries (Martin Cuma)
  • 11/06: To be announced
  • 11/13: Chemistry Packages at CHPC (Anita Orendt)
  • 11/20: Using Gaussian03 and Gaussview (Anita Orendt)
  • 11/27: **HOLIDAY**
  • 12/04: Parallel performance analysis with TAU (Martin Cuma)
  • 12/11: Intro to Programming with Openmp (Martin Cuma)

Please see http://www.chpc.utah.edu/docs/presentations/ for more information.


CHPCFS outage: approximately 5:15 - 5:40 p.m. Tuesday, August 19, 2008

Posted: August 19, 2008

The fileserver which hosts many of the home directory filesystems supported by CHPC experienced a brief outage which began about 5:15 p.m. CHPC re-booted the system and it was back and functional by about 5:40 p.m. We apologize for any inconvenience.


/scratch/serial filesystem BACK ONLINE at 10:15 a.m. Wednesday 8/6/2008.

Posted: August 5, 2008

The /scratch/serial filesystem suffered a kernel panic around 3:15 p.m. on Tuesday, August 5th, 2008. We are performing an fsck of the filesystem now, but estimate it will take 16-20 hours. Jobs utilizing /scratch/serial will have problems. If you submit any jobs in the next 24 hours, please utilize another /scratch system. There may be corrupted/lost files on /scratch/serial. We apologize for the inconvenience, and will notify you when the filesystem is available. The /scratch/serial filesystem was brought back online Wednesday, August 6th, 2008 at about 10:15 a.m.


**MAJOR CHPC DOWNTIME** July 15th 2008 from 8:00 a.m. until approximately 9:00 p.m.

Posted: July 3, 2008

Duration: 8:00 a.m. until about 9:00 p.m.

Systems Affected/Downtime Timelines: INSCC Networks: 8:00 until 8:30 a.m. INSCC Desktops: 8:00 a.m. until approximately 2:00 p.m. HPC Clusters (Arches, Telluride): 8:00 a.m. until approximately 9:00 p.m.

Arches Downtime Duration: HPC Clusters (Arches, Telluride): 8:00 a.m. until approximately 9:00 p.m.

Instructions to User: Shut down desktops before 8:00 a.m. on July 15th, 2008.

Maintenance will be performed on the coolers in the Komas data center, requiring the clusters to be powered off. CHPC will also be performing system maintenance on some networking equipment. While the INSCC networks technically are not going down, because of core server changes (NIS, DNS) it may appear to users that the network in INSCC is not functional between 8 and 8:30 a.m. Desktops maintained by CHPC (mounting CHPC file servers) will be affected from about 8:00 a.m. until approximately 2:00 p.m., as maintenance will be performed on several service machines as well as home directory filesystems. Once the cooling maintenance is complete, CHPC will perform system maintenance on the Arches and Telluride clusters. We expect to have the HPC systems up and scheduling jobs by approximately 9:00 p.m.

Downtime Summary:
  • INSCC Networks: 8:00 until 8:30 a.m.
  • INSCC Desktops: 8:00 a.m. until approximately 2:00 p.m.
  • HPC Clusters (Arches, Telluride): 8:00 a.m. until approximately 9:00 p.m.

Upgraded compiler and mvapich software

Posted: June 30, 2008

We have upgraded the PGI and Pathscale compilers to versions 7.2 and 3.2, respectively. Both are minor upgrades, so they should continue to work with older codes.

Also, we updated MVAPICH and MVAPICH2 on Sanddunearch to versions 1.0.1 and 1.0.3, respectively. These are also minor updates, mainly bug fixes.

If you experience any problems with the updates, please let us know at issues@chpc.utah.edu.


Brief Fileserver Outage: Wednesday 6/25/2008 5 p.m.

Posted: June 25, 2008

At about 5 p.m. please expect brief (5-15 minutes) file system outages on CHPCFS (main CHPC file server). This will potentially affect the following filesystems/groups:

  • Default home directories for HPC users
  • Nearline - (voth)
  • BMI
  • Meteorology (new non-iGrid spaces)
  • BIO Cheatham
  • INSCC home directories
  • CHPC staff

All chemistry & MD code users: new gromacs software; gamess updated

Posted: June 10, 2008

CHPC recently added an installation of Gromacs on arches. If you are interested, you can learn more about accessing this installation on the gromacs page. You will also find a link to a sample PBS script on that page.

Also, the installation of Gamess has been updated to the latest revision. See the gamess page for more information.

As always, if you have any problems or questions about these or any other of the chemistry code installations please contact Anita or send an email to issues@chpc.utah.edu.


CHPC Fileserver downtime: June 10th, 2008 5-7 p.m.

Posted: June 5, 2008

All file systems served off the CHPCFS file server will be down for maintenance for 2 hours on Tuesday, June 10th, 2008 from 5 until 7 p.m.

These filesystems include:
  • Default home directories for HPC users
  • Nearline - (voth)
  • BMI
  • Meteorology (new non-iGrid spaces)
  • BIO Cheatham
  • INSCC home directories
  • CHPC staff

The arches clusters will continue to run jobs, but the schedulers and resource managers will be shut down during the outage. This means you will not be able to run commands such as showq, qstat, qsub etc.

If your home directory on your desktop is one of those affected, you may want to shut down your desktop prior to the downtime and/or reboot it after the downtime.


PVFS (/scratch/parallel) offline June 1st, 2008

Posted: May 21, 2008

***REMINDER*** On June 1st, 2008 CHPC will retire the /scratch/parallel PVFS space. This file system is currently available on the interactive nodes of the clusters only. Please take this last opportunity to move any files you may need to a permanent file system such as a departmental file server. Any files left on this space will not be available after this date.


CHPC to Retire /scratch/parallel

Posted: April 30, 2008

Beginning May 1st, CHPC will begin the process of retiring the filesystem mounted at /scratch/parallel. As previously announced, we will start removing the mount of this filesystem on the compute nodes as jobs complete over the next week. Beginning May 1st, DO NOT submit jobs utilizing /scratch/parallel.

On May 9th, 2008 any remaining mounts of /scratch/parallel will be removed from all compute nodes. The filesystem will remain available on the interactive nodes until June 1st.

On June 1st, 2008 the filesystem will be retired. Please move data you wish to keep from the /scratch/parallel filesystem prior to June 1st.
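
As an illustration, data can be copied off /scratch/parallel from an interactive node with a command along these lines; the source and destination paths are only examples, so substitute your own directories:

# Copy a directory from /scratch/parallel to a permanent file system (example paths)
rsync -av /scratch/parallel/mydata/ /uufs/chpc.utah.edu/common/home/u0123456/mydata/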

CHPC will be exploring options for parallel filesystems and may deploy something for this purpose in the future.


MPICH2 (for Ethernet) upgraded on Arches to version 1.0.7

Posted: April 15, 2008

We have upgraded MPICH2 (for Ethernet) on Arches to version 1.0.7. This is a minor upgrade, and as such you should be able to use old executables with the new version. If a problem occurs, please rebuild your program (no changes to Makefiles, etc. are required). If that does not help, please let us know.


Arches Clusters back after failed hardware in network router

Posted: April 7, 2008

We had a supervisor card fail in our router for the clusters this morning. We were able to replace the failed hardware and get the router healthy again by about 1 p.m., after which we reviewed the health of the nodes. Please let us know if you see any problems by sending email to issues@chpc.utah.edu.


Spring 2008 CHPC Presentations Series begin this Thursday April 10th

Posted: April 7, 2008

All presentations are held in the INSCC Auditorium.

Overview of CHPC

Date: April 10th, 2008
Time: 10:30 a.m.

Introduction to Parallel Computing

Date: April 10th, 2008
Time: 1:00 p.m.

Chemistry Packages at CHPC

Date: April 17th, 2008
Time: 1:00 p.m.

Using Gaussian03 and Gaussview

Date: April 24th, 2008
Time: 1:00 p.m.

For more information please go to CHPC Presentations


Unscheduled Network Outage of arches clusters and telluride - 4/7/2008

Posted: April 7, 2008

Duration: Unknown

Systems Affected/Downtime Timelines: Connectivity of all of the arches clusters and telluride.

We've had a switch go down which affects connectivity to all of the arches clusters and telluride. Our staff are working on the problem. We'll send an update when we know more.


Power Restored to INSCC 4/6/2008 about 10:21 a.m.

Posted: April 6, 2008

Full power was restored to the INSCC Building at about 10:21 a.m. on April 6th after the planned outage. All CHPC services are online or in the process of being verified. Some INSCC departmental services may be offline until individual administrators are able to finish bringing up various servers.

The INSCC Rm 275 UPS (used during the outage to maintain file server connectivity to the HPC clusters) did not come off battery power at that time. CHPC staff, working with electricians, were able to determine and remedy the problem. As of 10:56 a.m., the UPS was fully functional.

Campus electricians report that their work with energizing a new substation was successful.


New scratch space available

Posted: April 4, 2008

A new scratch space is available for computations on the arches clusters. The space is mounted at /scratch/serial and is 16 TB.

Please note that the initial performance of this new space may not be representative of what you'll see in the future. The earliest users may see exceptionally fast performance, but once many users move data and start running jobs, it may perform exceptionally slowly for a time. The performance should stabilize over the next few weeks.

**Important**: We will discontinue mounting the /scratch/parallel space on the compute nodes beginning May 1st, 2008, but access via the interactive nodes will remain for an additional few weeks. Users should migrate all their data off /scratch/parallel as soon as possible.


Power outage this weekend affecting much of lower campus: April 5-6th

Posted: April 1, 2008

Beginning Saturday, April 5th at 11:00 p.m., the power to much of lower campus will be down in order for the electrical shop and contractors to prepare the power grid for a co-generation plant. The estimate for this outage is 24 hours.

The impacts to the INSCC building and INSCC data center (Rm 275):

Desktops and personal space: We recommend tenants of the INSCC building shut down your desktops before you leave for the weekend. We further recommend that any perishables in refrigerators be removed prior to this outage.

System administrators: All equipment in the INSCC data center should be turned off prior to 11 p.m. Saturday. There will be no air conditioning and no power available for servers.

HPC systems: The HPC systems should be able to ride out this outage since they are on another power substation within Research Park. The fileservers and supporting equipment in the SSB data center will be on backup generator power. The two data centers will interconnect via a router on generator power.


The default version of Gaussian is now the new E.01 version (and Gaussview 4).

Posted: March 20, 2008

You may remember that I changed to this new version during the last downtime, but there were issues with the use of /scratch/parallel and I moved back to the D.01 version. These issues have been resolved, so during this downtime I again made this latest version the standard one.

If for any reason you still want to run the D.01 version, you will need to update the g03root environment variable in your batch script to point to it:

setenv g03root /uufs/arches/sys/pkg/gaussian03/D.01
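
In a typical csh batch script, the Gaussian environment is then initialized from that root. Assuming the standard Gaussian 03 directory layout, the sequence would look roughly like the following; check your existing script for the exact line used at CHPC:

setenv g03root /uufs/arches/sys/pkg/gaussian03/D.01
# Load the Gaussian 03 environment from the selected root (standard G03 layout assumed)
source $g03root/g03/bsd/g03.login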

If you have any questions/problems, please contact Anita.


Batch jobs after downtime

Posted: March 20, 2008

We have just been made aware of the fact that several jobs that had completed before the downtime are showing up in the queue and being rerun. This situation is the result of moving to a new nfsroot server over the downtime, specifically the process of syncing information from the old server to the new one.

Based on the dates of the jobs, this does not seem to be widespread; however, users of the batch queues on arches should take a minute and check the jobs that they own. If any of the jobs are old ones that you do not want rerun, you should be able to delete them using the qdel command (qdel jobid), as shown below.
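
A minimal sketch of checking for and removing a stale job (the job id below is only an example):

# List your own jobs in the queue
qstat -u $USER
# Delete a job that already completed before the downtime (example job id)
qdel 123456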


Arches Update

Posted: March 20, 2008

Event date: March 19, 2008

Delicatearch, Tunnelarch, Marchingmen and Telluride queues available at 8:44 p.m. on March 19, 2008
Sanddunearch queues available 12:42 a.m. on March 20, 2008
Landscapearch queues available 11:57 a.m. on March 20, 2008


CHPC DOWNTIME: Starts Tuesday March 18, 2008 at 5PM

Posted: March 12, 2008

SCOPE: HPC(Arches), Network, desktop access to filesystems

DURATION:

Network access to INSCC should be restored at about 10pm on March 18th

Desktop access to fileservers (home directory access) will be restored during the morning of March 19th - a message will be sent to users when the systems are up and ready to be used

Arches will be back up sometime later in the day on March 19th - again, a message will be sent when it is ready for use. Reservations have been set so that jobs that cannot finish before 5 p.m. on March 18th will not be started. Jobs waiting in the queue will be started once the downtime has finished.


CHPC Major Downtime: 3/18/2008

Posted: February 27, 2008

Duration: From 5 p.m. Tuesday 3/18 until 5 p.m. 3/19

Systems Affected/Downtime Timelines: All CHPC networks, arches, fileservers, desktops

Arches Downtime Duration: 5 p.m. 3/18 until 5 p.m. 3/19

Major CHPC Downtime

Core infrastructure (fileservers etc., SSB and INSCC dependencies) down at 5 p.m. on 3/18, back up by midnight.

Arches Cluster down 5 p.m. 3/18 until 5 p.m. 3/19


CHPC Migration to Jira

Posted: February 26, 2008

CHPC to migrate to new issue tracking system on Monday February 25th, 2008.

On Monday, February 25th, CHPC will be changing systems for tracking problems and questions. The new system will also change the terminology we use from "Problem Report" to "Issue".

You can currently get to the new system at http://jira.chpc.utah.edu and you can create your own account by going to that address. If you set up your account ahead of time (recommended), you can set your username to your uNID. We are planning to eventually have the Jira system authenticate against the campus servers, so by doing this you will make it easy to maintain your account.

When you set up your account, make note of the email address you specify. Jira associates you with your user account by matching the "from" email address. Please set up your account to use your primary email address, that is, the account from which you will be sending Issues.

***Please note: If you send in issues from another email account, Jira will create a new account for you.

Over the weekend, we will be closing Problem Reports in the current system and moving them to Jira. When we close them, we will reference the Jira KEY for you, so you can find them in the new system. As your accounts get established, we'll be able to associate the Issues with your Jira account. Please let us know if you have any questions or concerns.

We are excited about the new system and its potential to improve our service.


Change in maximum wallclock time on delicatearch

Posted: February 4, 2008

In order to improve utilization of the Delicatearch cluster, we have increased the maximum wall clock time for jobs from 24 hours to 72 hours (3 days); see the example below. Please take advantage of this change to offload the very busy Sanddunearch cluster. Despite its age and slower CPUs, Delicatearch has a fast Myrinet network, and performance on 4 Sanddunearch nodes (16 processors) should not be any better than on 16 Delicatearch nodes (32 processors).
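
For example, a Delicatearch job can now request up to the new limit in its PBS directives; the node count below is illustrative:

# Request 16 Delicatearch nodes (2 processors each) up to the new 72-hour maximum
#PBS -l nodes=16:ppn=2,walltime=72:00:00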


CHPC Batch Systems Paused: Tuesday February 5, 2008

Posted: February 4, 2008

Duration: Downtime starts at 4pm and will last about an hour.

Systems affected:

All of Arches and any computation cluster under batch control

The clusters impacted by this are: sanddunearch; delicatearch; marchingmen; tunnelarch; landscapearch and telluride.

We will be pausing the moab schedulers on all CHPC computational clusters under batch control, tomorrow, Tuesday February 5th, 2008 at 4:00 p.m., for about an hour. This is to perform system maintenance on one of our administrative systems.

Scope: This means that no new jobs will be started during this period of time. You may still queue jobs and look at the queues, and running jobs will continue to run.

Please let us know if you have any questions.


CHPC DOWNTIME: Thursday January 3, 2008

Posted: December 19, 2007

Event date: January 3, 2008

Duration: Downtime starts at 3pm and will last until sometime early morning on January 4, 2008

Systems affected:

All of Arches and CHPC/INSCC Network

After this downtime, all users will be using the campus uNID and password for authentication on all HPC systems (and other Linux systems administered by CHPC). Windows users will use the uNID and their current password for authentication.

Arches:

All clusters will be down from 3 p.m. to allow for updates to the OS and for the other changes outlined below. The batch queues will be drained of all running jobs. Reservations are in place so that jobs will not be started if they will not finish before the start of the downtime. Jobs that are queued but not running will be started after the downtime ends. The one exception is if you are being moved to using your uNID for authentication during this downtime (see below); in this case, any queued jobs you have will need to be deleted. The clusters will be down until sometime the following morning.

**MIGRATION TO NEW FILESERVER: Some CHPC users - those on the CHPC-owned home directory filesystems, i.e., those with home directories /uufs/inscc.utah.edu/common/home/USERID - will be migrated to a new, larger fileserver during this downtime. If you are one of these users, your new home directory path will be /uufs/chpc.utah.edu/common/home/UNID

**CHANGE TO UNID: All CHPC users who are not already using their uNID as their CHPC login will be changed to doing so. If you do not have a uNID, you will need to get one BEFORE this downtime. All University of Utah students and employees automatically have a uNID. If you are not a part of the University of Utah, you need to fill out a Person of Interest (PoI) form to be assigned a uNID. This form can be found at http://www.hr.utah.edu/forms/lib/u-affiliate-poi-form.pdf.

Network Outage:

All networking in CHPC/INSCC will be down from about 5-7pm