2005 CHPC News Announcements

Arches Clusters Down: Thursday December 29th about 2 p.m. (unexpected downtime)

Posted: December 29, 2005


updated 4 p.m. 12/29/05

Arches Clusters Down: Thursday December 29th about 2 p.m. (unexpected downtime)

Systems affected: Arches Clusters

Date: Thursday December 29th, 2005

Duration: Began in the afternoon

Scope: One of the central administration servers for the Arches clusters (NorthWindow) has gone down. CHPC staff are currently working on this problem, and until it is resolved the Arches clusters are not available. It is not clear at this point whether we will need to reboot all of the clusters, but as we learn more we will keep you posted.

About 4 p.m. CHPC staff reported that a controller had failed in our NFS root fileserver, NorthWindow, and that it was replaced with a spare we keep on hand. We were able to get the machine back up and on the network.

In an attempt to save running jobs, CHPC staff are working through each node, checking and restarting services.

PVFS2 /scratch/global servers had to be restarted, and we are currently working through the clusters.


Myrinet drivers upgraded on Arches (from GM to MX)

Posted: December 29, 2005

Myrinet drivers upgraded on Arches (from GM to MX)

As mentioned in a separate e-mail, we will upgrade the Myrinet drivers on the Arches clusters (Delicatearch and Landscapearch) from the GM to the MX library on Tuesday.

This is a somewhat more significant upgrade than the usual GM version upgrades we occasionally perform. MX is a completely new driver and also requires a new MPI distribution. This means that users will not only have to recompile their MPI codes that use Myrinet, but also change their Makefiles and similar to reflect the location of this new MPI distribution.

The MPI is called MPICH-MX. For use with GNU and Pathscale compilers, use

/uufs/$(CLUSTER)/sys/pkg/mpich-mx/std
where $(CLUSTER) is either delicatearch.arches or landscapearch.arches.

For PGI compilers, use

/uufs/$(CLUSTER)/sys/pkg/mpich-mx/std_pgi

So, for example, to use GNU gcc with MPICH-MX, call

/uufs/$(CLUSTER)/sys/pkg/mpich-mx/std/bin/mpicc

For those users who add their own library paths to their Makefiles, adding the following (as a single line) to LDFLAGS or similar will link MPICH-MX and the MX library:

-L/uufs/$(CLUSTER)/sys/pkg/mpich-mx/std/lib -lmpich
-L/uufs/$(CLUSTER)/sys/pkg/mx-2g/std/lib64 -lmyriexpress -Wl,-rpath=/uufs/$(CLUSTER)/sys/pkg/mx-2g/std/lib64 -lpthread

Similarly, add the header file search path to CFLAGS or FFLAGS as:

-I/uufs/$(CLUSTER)/sys/pkg/mpich-mx/std/include
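
As a hypothetical end-to-end illustration for those who manage their own flags rather than using the mpicc wrapper above, a manual compile-and-link line for a C code on delicatearch could look like the sketch below; the source file name, output name and optimization flag are made-up examples, while the include and library paths are the ones listed above:

CLUSTER=delicatearch.arches
gcc -O2 -I/uufs/$CLUSTER/sys/pkg/mpich-mx/std/include -o mycode.exe mycode.c \
    -L/uufs/$CLUSTER/sys/pkg/mpich-mx/std/lib -lmpich \
    -L/uufs/$CLUSTER/sys/pkg/mx-2g/std/lib64 -lmyriexpress -Wl,-rpath=/uufs/$CLUSTER/sys/pkg/mx-2g/std/lib64 -lpthread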

To run the code, use the command mpirun.ch_mx. The syntax is the same as GM's mpirun.ch_gm, although not all flags of mpirun.ch_gm are implemented yet. If you use some of the more exotic flags, you may want to consult the help by running "/uufs/$(CLUSTER)/sys/pkg/mpich-mx/std/bin/mpirun.ch_mx --help" to see whether the feature you used with GM is available. In general, I don't expect anybody to get stuck here. Bottom line: run as you did before, except change the "gm" to "mx" in the mpirun command.
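
For example, inside a PBS job script a run could look like the following sketch; the node count, walltime and executable name are placeholders, and we assume the same -np/-machinefile syntax as mpirun.ch_gm, as noted above:

#PBS -l nodes=4:ppn=2,walltime=24:00:00
cd $PBS_O_WORKDIR
/uufs/delicatearch.arches/sys/pkg/mpich-mx/std/bin/mpirun.ch_mx -np 8 -machinefile $PBS_NODEFILE ./mycode.exe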

We have already built the MX library and MPICH-MX and encourage users to start rebuilding their codes now. Unfortunately, there is no way to test the functionality of the rebuilt codes until the drivers are upgraded, but MPICH-MX itself has been tested.

We've built several commonly used programs with MPICH-MX, tested them, and found most of them to improve performance by 5-10%. This is mainly due to a roughly 50% reduction in message latency. Further improvements are possible from the higher bandwidth, so if you experience slower performance, contact us for suggestions on improvement.

The apps we tested include Amber, NAMD, DLPOLY, DLEVB and VASP. Users of those are encouraged to contact me for correct build flags.


CHPC Arches Downtime: Tuesday December 27th 8 a.m. until about Noon

Posted: December 16, 2005

CHPC Arches Downtime: Tuesday December 27th 8 a.m. until about Noon

Systems affected: Arches Clusters

Date: Tuesday December 27th, 2005

Duration: 8 a.m. until approximately Noon

Scope: Systems maintenance of the arches clusters. We will be draining the queues in anticipation of this downtime. The changes will include:

  1. Upgrading PBS/Torque to address some issues which surfaced after our 12/9/05 downtime.
  2. Update of the myrinet drivers on delicatearch and landscapearch. Please note that myrinet users will need to recompile. More details to follow.


Arches Cluster back up and scheduling jobs

Posted: December 9, 2005

Arches Cluster back up and scheduling jobs

Arches clusters are back up and scheduling jobs as of about 8:30 p.m. (December 9th, 2005).

We urge users to recompile their programs if they use Myrinet, as we have upgraded the Myrinet GM drivers. Also, PVFS2 is still finishing its check, so I/O to it may be slightly slower until it is done sometime later tonight.

As always, e-mail problems@chpc.utah.edu if you experience difficulties.


CHPC System/Network downtime: Friday December 9th, 2005

Posted: December 9, 2005

CHPC Network/System downtime: Friday December 9th, 2005

Systems affected: All Arches clusters and switches. Software updates.

Date: Friday December 9th, 2005 at 8:00 a.m.

Duration: Until sometime Saturday December 10th

Scope: Systems maintenance of the Arches clusters, including upgrading the firmware on the nests and upgrading PVFS. Maintenance on the Arches routers.

Important Notes:

  1. All /scratch space will be scrubbed. Files older than two weeks will be purged. Please move important data before this downtime.
  2. PVFS2 will be upgraded to 1.3.2; anyone using MPI-IO will want to recompile their programs.
  3. GM will be upgraded to 2.0.23; users of Myrinet will need to recompile their programs.

Upgrades on Arches

Posted: November 30, 2005

Upgrades on Arches

November and December are historically rich in upgrades and this year is no different.

First is the upgrade of the Pathscale compilers to ver. 2.3. This includes a new autoparallelization feature, better optimization selection with the pathopt2 feature, and a few other useful things.

PGI will release their upgrade in the middle of December, which should also be interesting since they claim improved support for dual-core CPUs.

We have also upgraded MPICH2 to version 1.0.3, so far just with the GNU/Pathscale compilers; we will do the PGI build when PGI releases its upgrade. This version has numerous improvements in performance and functionality.

Finally, we upgraded Totalview to ver. 7.1. The improvements here are in memory and MPI debugging.

All were relatively minor upgrades, so most users should not even notice them, but it may be a good idea to recompile your code if you use Pathscale or MPICH2.

As always, let us know if you have any problems.


CHPC System/Network downtime: Friday December 9th, 2005

Posted: November 30, 2005

CHPC Network/System downtime: Friday December 9th, 2005

Systems affected: All Arches clusters and switches. Software updates.

Date: Friday December 9th, 2005 at 8:00 a.m.

Duration: Until sometime Saturday December 10th

Scope: Systems maintenance of the Arches clusters, including upgrading the firmware on the nests and upgrading PVFS. Maintenance on the Arches routers.

Important Notes:

  1. All /scratch space will be scrubbed. Files older than two weeks will be purged. Please move important data before this downtime.
  2. PVFS2 will be upgraded to 1.3.2; anyone using MPI-IO will want to recompile their programs.
  3. GM will be upgraded to 2.0.23; users of Myrinet will need to recompile their programs.

/scratch/serial is nearly full

Posted: November 24, 2005

/scratch/serial is nearly full

/scratch/serial is full. Jobs are dying when they try to write to this space. Please check your usage of space on this filesystem and clean up as much space as you can. You should also consider using the /scratch/parallel space instead - there is over 13 TB of space, and all tests indicate that this scratch system is stable and has good performance.
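
For illustration, assuming your files sit in a directory named after your user name (a common but hypothetical layout; adjust the paths to wherever your files actually live), you could check and clean your usage with:

du -sh /scratch/serial/$USER          # show how much space your files occupy
rm -rf /scratch/serial/$USER/old_run  # remove data you no longer need (example directory)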


CHPC Presentation

Posted: November 15, 2005

Chemistry Packages at CHPC
Thursday, November 17th, 2005 at 1:30 p.m. in the INSCC Auditorium

Presenter: Anita Orendt

This talk will focus on the various computational software packages and tools that are available on CHPC computer systems – Gaussian, Gaussview, NWChem, ECCE, Amber, Molpro, Babel, Dock and Autodock. The talk is an overview and will present information on the capabilities of these packages, along with details on how users can access the various programs at CHPC and where they can get more information on these packages. This talk will serve as a precursor to the next talk in the series that focuses on Using Gaussian and GaussView.

For more information about this and other CHPC presentations, please see:

CHPC Presentations Series


/scratch/serial is nearly full (91%)

Posted: November 12, 2005

/scratch/serial is nearly full (91%)

The /scratch/serial filesystem is over 91% full. Please delete or copy off any files you can to help free up space.


/scratch/serial is nearly full

Posted: November 9, 2005

/scratch/serial is nearly full

The /scratch/serial filesystem is at 99% of its capacity. Please take a minute to look at your files on this system and clean up this space as soon as possible. Thank you.


Allocation Requests Due December 1st, 2005

Posted: November 8, 2005

Allocation Requests Due December 1st, 2005

Proposals and allocation requests for computer time on the Arches Opteron Cluster are due by December 1st, 2005. We must have this information if you wish to be considered for an allocation of time for the Winter 2006 calendar quarter and/or the subsequent three quarters. If you already have an award for Winter 2006, you do not need to re-apply unless you wish to request a different amount from what you were awarded.

  1. Information on the allocation process and relevant forms are located on Allocation Policies and Allocation Form

    ****** Please use this form when sending in your request, as only those requests following this format will be considered.

  2. You may request computer time for up to four quarters.
  3. Winter Quarter (Jan-Mar) allocations go into effect on January 1, 2006.
  4. Only faculty members can request additional computer time for themselves and those working with them. Please consolidate all projects onto one proposal to be listed under the requesting faculty member.
  5. Send your proposal and relevant information to the attention of:
    Janet Ellingson, 405 INSCC
    Fax: 585-5366, e-mail: chpc-admin@utah.edu, tel: 585-3791

CHPC Presentation

Posted: November 8, 2005

Fast Parallel I/O at the CHPC
Thursday, November 10th, 2005 at 1:30 p.m. in the INSCC Auditorium

Presenter: Martin Cuma

In this talk we explain how to perform fast parallel I/O operations on the CHPC computers. It should be beneficial for all users who are interested in speeding up their parallel applications via faster file operations. First, we describe in detail PVFS2 (Parallel Virtual File System 2), installed on the Arches. Then we go over several examples of how to perform parallel I/O on this file system, in particular the MPI-I/O extension to the MPI standard and native PVFS function calls. Subsequently we detail how to compile and run MPI-I/O applications on both PVFS (Arches) and the Compaq Sierra's AdvFS. We conclude the talk with an insight into some more advanced aspects of MPI-I/O.

For more information about this and other CHPC presentations, please see:

CHPC Presentations Series


Tunnelarch downtime Sunday 11/13-11/20/2005

Posted: November 8, 2005

Tunnelarch downtime Sunday 11/13-11/20/2005

Systems affected: Tunnelarch

Date: Beginning Noon, Sunday, November 13th, 2005.

Duration: Until Noon, Sunday November 20th, 2005

Scope: The tunnelarch downtime (from Sunday, November 13th at Noon until the following Sunday, November 20th at Noon) is not just a demonstration; it will be used to solve a very significant bioinformatics problem. Details below.

mpiBLAST on the GreenGene Distributed Supercomputer: Sequencing the NT Database Against the NT Database (An NT-Complete Problem)

Abstract: The Basic Local Alignment Search Tool (BLAST) allows bioinformaticists to characterize an unknown sequence by comparing it against a database of known sequences. The similarity between sequences enables biologists to detect evolutionary relationships and infer biological properties of the unknown sequence.

Our open-source parallel BLAST --- mpiBLAST --- decreases the search time of a 300-kB query from 24 hours to 4 minutes on a 128-processor cluster. It also allows larger query files to be compared, something which is infeasible with the current BLAST. Consequently, we propose to compare the largest query available, the entire NT database, against the largest database available, the entire NT database. The result of this comparison will provide critical information to the biology community, including insightful evolutionary, structural, and functional relationships between every sequence and family in the NT database. We estimate that the experiment will generate 100 TB of output to StorCloud.

Chair/Speaker Details:

Martin Swany (Chair)
University of Delaware

Wu Feng
Los Alamos National Laboratory

Mark Gardner
Los Alamos National Laboratory

Srinidhi Varadarajan
Virginia Tech

Jeff Crowder
Virginia Tech

Julio Facelli
University of Utah

Jeremy Archuleta
Los Alamos National Laboratory / University of Utah

Xiaosong Ma
North Carolina State University

Heshan Lin
Los Alamos National Laboratory / North Carolina State University

Venkatram Vishwanath
Los Alamos National Laboratory / University of Illinois at Chicago

Pavan Balaji
The Ohio State University


/scratch/parallel back: November 3rd, 2005

Posted: November 3, 2005

/scratch/parallel back: November 3rd, 2005

Systems affected: /scratch/parallel on arches clusters

Date: Thursday November 3rd, 2005

Duration: Back by about 1:45 p.m. on 11/03/2005

Scope: The update of /scratch/parallel was successful. Most of the problems that we have encountered since the last upgrade are gone, including the crippling slowness of cp, vi, and other tools.

One last problem that we noticed is that the dates and permissions on files created by MPI-IO are not right, but they do not seem to pose a problem in production runs.

Please start using /scratch/parallel again if you were using it before, and report any problems you may see.


/scratch/parallel downtime: November 2nd, 2005

Posted: November 2, 2005

/scratch/parallel downtime: November 2nd, 2005

Systems affected: /scratch/parallel on arches clusters

Date: Thursday November 3rd, 2005

Duration: Unknown. Estimate a few hours.

Scope: We will take /scratch/parallel down tomorrow to apply some critical patches to fix problems that have made its usage difficult lately. We will NOT erase the data; only the filesystem will be inaccessible during the downtime. We don't have an estimate for the duration of the downtime, but if all goes well it should not take more than several hours.


CHPC Presentation

Posted: October 26, 2005

Mathematical Libraries at CHPC
Thursday, October 27th, 2005 at 1:30 p.m. in the INSCC Auditorium

Presenter: Martin Cuma

In this talk we introduce users to the mathematical libraries that are installed on the CHPC systems, which are designed to ease programming and speed up scientific applications. First, we will talk about BLAS, a standardized library of Basic Linear Algebra Subroutines, and present a few examples. Then we briefly cover other libraries that are in use, including the freeware ACML, LAPACK, ScaLAPACK, PETSc and FFTW, the commercial NAG libraries, and custom libraries from Compaq.

For more information about this and other CHPC presentations, please see:

CHPC Presentations Series


CHPC /scratch/parallel problems after update

Posted: October 25, 2005

The /scratch/parallel file system upgrade was successful in terms of fixing the known problems; however, several new ones have shown up.
First of all, we have found that copying or moving files out of /scratch/parallel is very slow. Trying to edit files in /scratch/parallel with editors such as vi or less is also very slow. We are in touch with the PVFS2 developers trying to fix this problem. In the meantime, a good workaround for the copy and move problem is to use scp or tar; e.g., to move the file test.file from /scratch/parallel to $HOME/data, do:
scp /scratch/parallel/$USER/test.file $HOME/data/
or
tar -cpf - test.file |tar -xvpf - -C $HOME/data/
Note that the former needs a password, so it can't be used in a PBS script. Overall, we recommend commenting out the part of the script that copies files from /scratch/parallel at the end of the run and doing it manually. Copying files TO /scratch/parallel works well, so the initial staging of input files before the run should pose no problems.
As for the editor slowness (vi, less), we recommend editing these files in your home directory space and then copying them to /scratch/parallel prior to the run.
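
As an illustration of doing that final copy by hand after the job finishes (the directory and file names below are made-up examples), from an interactive login you could run:

cd /scratch/parallel/$USER/myjob                    # example job output directory
tar -cpf - results.dat |tar -xvpf - -C $HOME/data/  # copy results home without the slow cp path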

There is one other problem that we have noticed - occasional corruption of file time stamps. This, however, seems to happen only shortly after a file is created, and so far we haven't noticed any detrimental effect on performance or usability.


CHPC /scratch/parallel downtime: Thursday October 20th, 2005

Posted: October 12, 2005

CHPC /scratch/parallel downtime: Thursday October 20th, 2005. We will upgrade the PVFS2 file system that runs /scratch/parallel. The time and duration of the downtime are to be determined.

Systems affected: Arches clusters that mount /scratch/parallel. This file system will not be available during the downtime and all user files that have been on the file system will be erased.

Details: The PVFS2 file system running /scratch/parallel will be upgraded to fix several bugs that were preventing some users from using the file system efficiently.
We appeal to all users who are currently using /scratch/parallel to:
1. Copy all their important files off the file system before the downtime (an example copy command is given below the list). The complete file system will be wiped out during the upgrade.
2. Not submit any jobs on Arches that could use /scratch/parallel during the downtime window. It would be best to stop submitting such jobs right now, and to check all queued-up jobs a few days before the downtime to make sure that they are not set to use /scratch/parallel. All jobs that use this file system during the downtime will crash and prolong the downtime due to the need to clean the affected nodes. We will not give any time refunds for time lost due to this.
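
For example (the destination directory name is a made-up placeholder), you could copy everything you need to keep back to your home directory with:

cp -r /scratch/parallel/$USER $HOME/scratch_parallel_backup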


CHPC Network/System downtime: Thursday September 29th, 2005

Posted: September 14, 2005

updated: September 27th, 2005

updated: September 28th, 2005

CHPC Network/System downtime: Thursday September 29th, 2005, Arches cluster down at 5:00 p.m. (Komas). Some home directories will be unavailable. Expect systems to be available sometime the morning of September 30th.

Systems affected: All systems in the Komas machine room, including all Arches clusters and fileserv2, beginning at 5:00 p.m. Home directories for many users will be unavailable.

Date: Thursday September 29th, 2005

Duration: Undetermined. We will begin servicing systems in the Komas machine room at 5:00 p.m. Home directories should be available in a few hours. The Arches clusters will be available sometime the next morning.

Scope: The Arches clusters and fileserv2 will go down at 5:00 p.m. for maintenance and the cleaning and maintenance of coolers in the Komas Machine room. Power maintenance of the SSB Machine room will not require systems to go down.


PVFS (/scratch/parallel) file system Open for General Use

Posted: August 8, 2005

PVFS (/scratch/parallel) file system Open for General Use

We would like to invite users to start using our new large parallel file system running PVFS2 (http://www.pvfs.org/pvfs2/). It consists of 12 I/O nodes with a total available space of about 13.5 TB.

Apart from its large size, it also has much better performance than the existing /scratch/serial file servers, especially when doing concurrent I/O from multiple compute nodes. In our tests, we have achieved an aggregate bandwidth of 1350 MB/s reading/writing from 16 compute nodes (the theoretical peak is 1500 MB/s); /scratch/serial-type servers peak at 125 MB/s.

This file system is mounted on three Arches clusters - delicatearch, marchingmen and tunnelarch - as /scratch/parallel. Most users will use it similarly to /scratch/serial, for standard UNIX I/O; those interested in parallel I/O (either native PVFS or MPI-IO) should contact me directly.
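
For instance, a batch job could stage its files through a per-user directory on the new file system much as it would on /scratch/serial; the directory layout and paths below are just an illustrative convention, not a requirement:

mkdir -p /scratch/parallel/$USER/$PBS_JOBID
cd /scratch/parallel/$USER/$PBS_JOBID
cp $HOME/input/* .           # stage input files (example path)
# ... run the application here ...
cp results* $HOME/output/    # copy results back (example path)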

Note that we have tested this file system extensively but have not yet had many users running on it at the same time, so consider the file system to be in "release candidate" status and report any problems you encounter with it. We will try to respond promptly and alleviate the problem.

As always, let us know if you experience any problems.


Software upgrades on Arches

Posted: July 29, 2005

Software upgrades on Arches

We have upgraded the Pathscale compilers to ver. 2.2, Totalview to ver. 7.0 and the ACML library to ver. 2.6. All should be smooth upgrades, except for a probable need to relink codes against the new ACML library.

As always, let us know if you experience any problems.


SSB Machine Room Power Outage this Saturday (July 30th 7am-10am)

Posted: July 27, 2005

SSB Machine Room Power Outage this Saturday (July 30th 7am-10am)

There will be a power outage affecting CHPC's SSB machine room this Saturday, July 30th from 7am to 10am.

Clusters housed in the room - that is, Icebox, Sierra and Slickrock - will be shut off, as well as most of the file servers.

We have created reservations on the Arches clusters to drain them as well, since jobs would fail while the home directories housed on the downed fileservers are unavailable.

We will also take this opportunity to do some maintenance on fileserv2 and on the new landscapearch nodes, which we expect to be done by the time the power is back up.

We plan to have Arches resume scheduling soon after the power is back up and the fileservers are booted, while the three clusters housed in SSB may take a few more hours to come up.

As always, we are sorry for the inconvenience; don't hesitate to contact us if you have any questions.


Icebox back online.

Posted: July 18, 2005

Icebox back online.

Icebox has been brought back up and is accepting jobs.


Icebox down, 9 pm Wednesday, July 13th, 2005

Posted: July 14, 2005

Icebox down, 9 pm Wednesday, July 13th, 2005

On Wednesday night, the cooler located in the center of our SSB machine room failed and shut down. This caused a large cooling imbalance in the data center, with the east half of the room reaching temperatures high enough for ICEBOX hardware to fail due to overheating.
CHPC staff attempted to restart the compressors on the failed unit, without success.
Our Liebert service provider, Mountain Valley, was consulted that night and was scheduled to arrive between 8 and 9 a.m. on Thursday to work on the failed Liebert cooler.
Because of the failed Liebert cooler, ICEBOX will remain down; it will be brought online and diagnosed when the cooling is restored. Availability will depend on the timeframe of the cooler repair and the amount of damage caused by the overheating.
We apologize for any inconvenience due to this failure.


Use of Multiple nodes for gaussian jobs

Posted: July 5, 2005

Use of Multiple nodes for gaussian jobs

All Gaussian Users,

Some of you have been getting messages from CHPC userservices that your Gaussian jobs were not making use of all nodes in a multiple node run. To help clarify the parallel nature of Gaussian, the following information has been added to CHPC’s Gaussian 03 web page:

Parallel Gaussian:

The general rule is that SCF/DFT/MP2 energies and optimizations, as well as SCF/DFT frequencies, scale well when run on multiple nodes. In the case of DFT jobs, once there are more than about 100 atoms for pure DFT methods or 150 atoms for hybrid methods, the default algorithm that is used (FMM, the Fast Multipole Method) does not run in parallel. In this case you can either run on one node or use Int=FMMNAtoms=n to turn off the use of the FMM, where n is the number of atoms.

For all other types of jobs you will want to use only one node, to make efficient use of your allocation. To run non-restartable jobs that will take longer than the queue time limits on tunnelarch, you can request access to the long queue on a case-by-case basis in order to be able to complete these jobs.

As always, when running a new type of job, it is best to first test the parallel behavior of the runs by running a test job on both 1 and 2 nodes and looking at the timing differences before submitting a job that uses 4, 8 or even more nodes. The timing differences are easiest to see if you use the p printing option (replace the # at the beginning of the keyword line in the input file with a #p).
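
As a hypothetical illustration of such a scaling test (the node counts, ppn value and walltime below are made-up examples, not CHPC defaults), the only change between the two timing runs is the PBS node request:

# single-node timing run
#PBS -l nodes=1:ppn=2,walltime=4:00:00
# two-node timing run (submit as a separate job and compare the timings reported with #p)
#PBS -l nodes=2:ppn=2,walltime=4:00:00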

If you have any questions about any of the above, or any questions regarding Gaussian 03 calculations, please feel free to contact me.

Anita


We have upgraded all the MPI distros on Arches

Posted: June 27, 2005

We have upgraded all the MPI distros on Arches, that is:
MPICH-GM to ver. 1.2.6..14b
MPICH to ver. 1.2.7
MPICH2 to ver. 1.0.2

Upgrade on Icebox will follow soon.

All are minor upgrades, but at least in the case of MPICH2 I recommend recompiling your source.

For details on how to use all of these, consult our instructions page: http://www.chpc.utah.edu/docs/manuals/software/mpi.html

I would also like to take this opportunity to urge users to use only MPICH-GM on Delicatearch, to justify the purchase of the expensive Myrinet network hardware, and to shift their programs from MPICH to MPICH2. Version 1.2.7 of MPICH is likely to be the last one, as the developers are abandoning it in favor of MPICH2.

As always, please, report any problems or questions to problems@chpc.utah.edu


Major CHPC Network/System Downtime, 5 pm Thursday, July 7th, 2005

Posted: June 23, 2005

Major CHPC Network/System Downtime, 5 pm Thursday, July 7th, 2005

Systems affected: All systems in the Komas machine room, and some fileservers. Home directories for many users will be unavailable.

Date: Thursday July 7th, 2005

Duration: Undetermined. Will begin at 5 p.m.

Scope: Repair of coolers in the Komas machine room. All Arches clusters will be down. System maintenance on fileserv2 (home directories in /uufs/inscc.utah.edu/common/home) and several of the /scratch filesystems on Arches.


CHPC email server upgrade: Things to Know

Posted: May 31, 2005

CHPC email server upgrade: Things to Know

To access previously created e-mail folders, one needs to 'subscribe' to them. This is not required for newly created folders, only for previously created ones.

With most IMAP clients this is done either by right-clicking on any mail folder and then clicking 'subscribe', or by clicking 'file' on the menu bar and then 'subscribe'. Then select the box for the folder/subfolders you wish to view and hit 'subscribe'.

For https://webmail.chpc.utah.edu/ simply click on 'folders', then click on the ones you want to 'subscribe' to (hold down the 'Ctrl' key to select multiple folders), then click on subscribe.

The new mail server also supports secure incoming IMAPS and will soon support outgoing SMTP SSL authentication.

Pine Users

  • Your inbox messages are probably not in order. To fix this, just type "$d" to sort it again.
  • Go into your settings (from main menu type "sc" [for setup, config]) and verify:
    • smtp-server is "smtp.inscc.utah.edu"
    • inbox-path is "{mail.inscc.utah.edu/ssl/novalidate-cert}INBOX"
  • If you use ssh-tunneling for remote access, leave your smtp-server set to "localhost". Change your inbox-path as above (the encrypted access is more secure than an ssh tunnel).

Myrinet installed on Icebox, May 17th, 2005

Posted: May 17, 2005

Myrinet installed on Icebox, May 17th, 2005

We have put Myrinet (a faster network interconnect) on the first 42 nodes of Icebox (ib001-ib042). We have not had much chance to test it, as the nodes are almost constantly loaded, but short tests work and we would like to encourage users to give it a try. It should give a good performance boost to programs with heavy communication that use more than one node.

In order to request the Myrinet nodes, add the keyword myrinet to your PBS node specification line, e.g.:
#PBS -l nodes=4:ppn=2:myrinet,walltime=24:00:00

We have built MPICH-GM just with the PGI compilers (C, C++, F77/90/95). The build is located in /uufs/icebox/sys/pkg/mpich-gm/std.

That is, to compile, use:
/uufs/icebox/sys/pkg/mpich-gm/std/bin/mpiXXX, where XXX stands for cc, cxx, f77, f90.

To run, do:
/uufs/icebox/sys/pkg/mpich-gm/std/bin/mpirun.ch_gm -np $NODES -machinefile $PBS_NODEFILE ./executable
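
Putting the pieces together, a minimal Myrinet job script on Icebox might look like the sketch below; the node count, walltime and executable name are placeholders, and the -np value simply assumes one MPI process per requested processor:

#PBS -l nodes=4:ppn=2:myrinet,walltime=24:00:00
cd $PBS_O_WORKDIR
/uufs/icebox/sys/pkg/mpich-gm/std/bin/mpirun.ch_gm -np 8 -machinefile $PBS_NODEFILE ./executable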


Major CHPC Network/System Downtime, 8 am - 5 pm Saturday, May 21st, 2005

Posted: May 10, 2005

Major CHPC Network/System Downtime: 8 am - 5 pm Saturday, May 21st, 2005

Systems affected: All routing in INSCC. Systems in the INSCC Machine room shutdown. All HPC systems including Arches Clusters, Sierra and Icebox.

NOTE: *** If you have equipment in the INSCC Machine room, you will need to make sure it is down before 8 a.m. the morning of May 21st.***

Date: Saturday May 21st, 2005

Duration: 8 a.m. until 5 p.m.

Scope: The coolers in the machine room in INSCC will be repaired. CHPC will take advantage of this downtime to upgrade a fileserver and some of the Icebox nodes.


CHPC Downtime - Arches Clusters (emergency)

Posted: May 7, 2005

updated: May 7th, 2005 (about 4pm)
updated: May 7th, 2005 (about 1pm)

CHPC Downtime:
All Arches Clusters: Saturday, May 7th, 2005 at 10:00 a.m. until about 4:00 p.m.

Systems affected: All Arches clusters - KOMAS machine room cooling due to power problem.

About 10:00 a.m. there was a power problem in the Komas machine room that caused the coolers to fail. We shut down the Arches clusters as the temperature was getting dangerously high. CHPC staff are on site and working to get everything back online.

The coolers came back online and were cooling the machine room by about 1:00 p.m. As soon as the room had cooled sufficiently, CHPC staff began to bring the Arches clusters back online. All went smoothly, and the clusters were back and scheduling jobs by 4:00 p.m.


Allocation Requests Due June 1st, 2005

Posted: May 5, 2005

Allocation Requests Due June 1st, 2005

Proposals and allocation requests for computer time on the Arches Opteron Cluster are due by June 1st, 2005. We must have this information if you wish to be considered for an allocation of time for the Summer 2005 calendar quarter and/or subsequent three quarters. This is for additional computer time above the default amount given for the quarter. If you already have an award for Summer 2005, you do not need to re-apply unless you wish to request a different amount from what you were awarded.

  1. Information on the allocation process and relevant forms are located on Allocation Policies and Allocation Form

    ****** Please use this form when sending in your request, as only those requests following this format will be considered.

  2. You may request computer time for up to four quarters.
  3. Summer Quarter (Jul-Sep) allocations go into effect on July 1, 2005.
  4. Only faculty members can request additional computer time for themselves and those working with them. Please consolidate all projects onto one proposal to be listed under the requesting faculty member.
  5. Send your proposal and relevant information to the attention of:
    Victoria Volcik, 405 INSCC
    Fax: 585-5366, e-mail: admin@chpc.utah.edu, tel: 585-3791

Sierra cluster not fully functional

Posted: May 2, 2005

updated May 3rd, 2005

Sierra cluster not fully functional (back by about 4pm)

The Sierra cluster had some trouble on the afternoon of May 2nd. CHPC staff looked into the problem, and the cluster was returned to normal operation about 4:00 p.m. We apologize for the inconvenience.


Changes to NBO version in Gaussian installation

Posted: April 27, 2005

Changes to NBO version in Gaussian installation

Note to all users of Gaussian on the Arches clusters: the standard installation of NBO within Gaussian 03 has been changed from version 3.1, which ships with Gaussian, to the newest available version of NBO, 5.0. For more information on the added features of this newest version of NBO, see http://www.chem.wisc.edu/~nbo5/


Power down in Komas building (4/20/05 from 4:30-6:00pm): Arches cluster down (until approx. 8:30pm), all routing for INSCC down.

Posted: April 20, 2005

Power down in Komas building (4/20/05 from 4:30-6:00pm): Arches cluster down (until approx. 8:30pm), all routing for INSCC down.

The power dropped in the Komas building in Research Park about 4:30pm on April 19th, 2005. This power outage took down all of the Arches clusters as well as all of the routing for the INSCC building. A few weeks ago, CHPC Network staff had to move all of the INSCC routing to Komas as a workaround for a code bug; the fix for that bug is still outstanding. All routing between networks for the Komas cluster machine room, SSB machine room, INSCC machine room and INSCC building was non-functional until the power returned about 6:00 pm. CHPC Network and Systems staff were on site, working with electricians and UP&L to restore power as soon as possible. The Arches clusters were back scheduling jobs around 8:30 pm.


We have upgraded Pathscale compilers on Arches and Icebox to version 2.1.

Posted: April 19, 2005

We have upgraded Pathscale compilers on Arches and Icebox to version 2.1.

Among the most noticeable improvements:

  • OpenMP 2.0 support in Fortran and C and limited support in C++
  • Improved performance with changes in scalar and vector math library routines
  • New options that help with debugging (see the example after this list)
    • trapuv sets uninitialized variables to NaN, which helps crash the code when the value is used
    • Wuninitialized gives a warning when an uninitialized variable is used (though it also gives warnings in conditional statements)
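
For illustration, a debug build with the PathScale Fortran compiler could look like the line below; the source and executable names are made up, and we assume the options above take their usual leading-dash form:

pathf90 -g -trapuv -Wuninitialized -o mycode.exe mycode.f90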

Totalview has been upgraded on Arches to version 6.8.

This version mainly increases the number of supported platforms and compilers, and also has some improvements in the MPI debugging interface.

As always, let us know if you experience any problems.


CHPC Presentation Series

Posted: April 11, 2005

Parallel Performance Analysis with TAU
Thursday, April 13th, 2006 at 1:30 p.m. in the INSCC Auditorium

Presenter: Martin Cuma

TAU (Tuning and Analysis Utilities, http://www.cs.uoregon.edu/research/tau/home.php) is a profiling and tracing toolkit for performance analysis of serial and parallel programs. In this talk, we will introduce TAU as a new and flexible tool for tracing parallel programs on the CHPC Arches clusters. We detail the small changes necessary to turn on tracing and then explain how to visualize the trace files in the Vampir trace viewer. We will conclude with some specific examples and a glimpse of other features that TAU provides.

For more information about this and other CHPC presentations, please see:

CHPC Presentations Series


HPC home directory file server crashes

Posted: April 9, 2005

updated: April 11th, 2005

HPC home directory file server crashes three times over the April 9-10th, 2005 weekend

fileserv2, which hosts users' home directories, crashed three times over the April 9-10th, 2005 weekend. As a result of the downtime, jobs belonging to users whose home directories are hosted on this server (i.e. those who don't have their own departmental fileservers) and that write there have either crashed or spent a lot of time waiting for I/O, and may have run out of walltime as a result. If a job was writing data to the scratch servers and was supposed to copy it back to the home directory at the end, the data may not have been copied. Some jobs that were trying to start during the fileserver downtime could not find their data and may have sent a lot of spam to the user's e-mail box. If a write was in progress at the time of a crash, the data may be corrupt. Please check the results you obtained during the weekend with extra care.


Moab Configuration Changes on Arches

Posted: April 6, 2005

Moab Configuration Changes on Arches

CHPC staff have made several changes to the Moab scheduler configuration on delicatearch, marchingmen and tunnelarch. These changes are an experiment to see if they improve throughput. To summarize, the changes include:

  1. Turning off the "fairshare". The effect of this is that it will be much more "fifo-ish" (first-in-first-out) than before.
  2. The priority reward for parallelism has been turned off.
  3. We've changed the backfill method from "FIRSTFIT" to "BESTFIT". This means that very small, very short jobs should get through more quickly and that backfill will be less "fifo-ish".
  4. We will no longer scrub /tmp on nodes during the job epilogue. This means it is very important to clean up after your job ends, removing any scratch files. This is easy to do by adding a line to the end of your PBS script like:

    cp * $HOME/working_directory && cd .. && rm -rf /tmp/$PBS_JOBID

    This should also be done on any scratch filesystem you may use (/scratch/serial, /scratch/da, /scratch/mm). If you are writing large scratch files to /tmp and your job crashes before it gets to this cleanup stage, let us know so we can clean up the nodes. A slightly fuller cleanup sketch follows this list.
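
For illustration, an end-of-job cleanup block along these lines might look as follows; the results directory name and the assumption that the job ran inside /tmp/$PBS_JOBID are hypothetical examples, not CHPC requirements:

# at the end of the PBS script, after the application has finished
cd /tmp/$PBS_JOBID                       # hypothetical per-job scratch directory
cp * $HOME/working_directory             # copy results back to the home directory (example path)
cd /tmp && rm -rf /tmp/$PBS_JOBID        # remove the per-job scratch directory
rm -rf /scratch/serial/$USER/$PBS_JOBID  # likewise clean any scratch filesystem you used (example path)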


CHPC Presentation

Posted: April 5, 2005

Chemistry Packages at CHPC
Thursday, April 7th, 2005 at 1:30 p.m. in the INSCC Auditorium

Presenter: Anita Orendt

This talk will focus on the computational chemistry software packages - Gaussian, Amber, NWChem, Molpro, Amica, Babel, GaussView, ECCE - that are available on CHPC computer systems. The talk will be an overview of the packages and their capabilities, and will focus on details of how users can access the installations at CHPC. This talk is the precursor for a second talk scheduled for April 21st that will focus on the use of Gaussian 03 and GaussView.

For more information about this and other CHPC presentations, please see:

CHPC Presentations Series


All machines at Komas down at 5pm, up at 10.30pm, 31st March

Posted: March 31, 2005

All machines at Komas down at 5pm, up at 10.30pm, 31st March

Shortly before 5pm, March 31st, all connectivity to machines housed in the Komas machine room (the Arches clusters) was dropped. The TS Electric people and Randy Green reported that UP&L had suffered a power bump on the Hogle Zoo substation. This resulted in our flywheel and UPS triggering and the loss of power to the bulk of the Arches compute nodes. TS Electric gave us a green light to bring the system back up at 7:30pm. The system came back up without any real problems and resumed scheduling at about 10:30pm. All jobs that were running were lost. Jobs that were in the idle state in the queues have started to run, and the system is open for regular use.


Arches Cluster unscheduled downtime

Posted: March 31, 2005

Arches Cluster unscheduled downtime Thursday, March 31st, 2005 about 5:00 p.m.

At approximately 10:30pm Thursday we started up the schedulers again after bringing Arches back up. The TS Electric people and Randy Green reported that UP&L had suffered a power bump on the Hogle Zoo substation. This resulted in our flywheel and UPS triggering and the loss of power to the bulk of the Arches compute nodes. TS Electric gave us a green light to bring the system back up at 7:30pm. The system came back up without any real problems. Clearly, all jobs that were running were lost. Jobs that were in the idle state in the queues have started to run, and the system is open for regular use at this time. Thanks for your patience.


PGI compilers upgraded to version 6.0

Posted: March 30, 2005

PGI compilers upgraded to version 6.0

Among new features that should be beneficial to our users are:

  • Fortran 95 support - a new compiler, pgf95, which supersedes pgf90 (pgf90 is retained for backward compatibility).
  • improved performance
  • large array support (> 2 GB)

For more details see release notes: http://www.pgroup.com/doc/pgicdkrn.pdf

We have done some quick tests which reveal that this version's libraries are compatible with the previous version, 5.2, which means we did not recompile any of the packages that are built on top of the PGI compilers (MPICH, etc.). It also means that recompiling your codes is not necessary, but I would still recommend it given the claimed performance improvements.

Please, let us know if you experience a problem with this, or anything else.


Arches Cluster Back Online

Posted: March 23, 2005

Arches Cluster Back Online Wednesday, March 23rd, 2005 about 6:00 p.m.

The arches clusters are now open to users and scheduling jobs after the extended downtime. Two changes of note which were made during this downtime are:

  • All nodes/servers are running the newer kernel, 2.6.11.
  • GM was upgraded to 2.0.19 on delicatearch and landscapearch.

Thanks for your patience on this extended downtime. We had some unexpected delays and apologize for the inconvenience. Please let us know if you run into problems by sending a report to problems@chpc.utah.edu.


CHPC Downtime - KOMAS Machine room (extended)

Posted: March 21, 2005

updated: March 23rd, 2005

CHPC Downtime: Monday, March 21st, 2005 at 10:00 p.m.

Systems affected: All Arches clusters - KOMAS machine room critical repair of cooling system.

Date: Monday March 21st - Wednesday March 23rd (extended from 3/22)

Duration: Availability expected sometime Wednesday March 23rd after replacement and testing of the cooling system. (Our systems staff ran into unexpected problems during the nfsroot image upgrade and as a result the downtime was extended beyond the original estimate.)

Details: A critical repair of the cooling system will take place beginning at 10:00 pm tonight, Monday March 21st. CHPC will take advantage of this opportunity and move up most of the maintenance planned for the March 31st scheduled downtime which has been cancelled. We apologize for any inconvenience.


Seminar: "Some Key Ideas in High Performance Computing"

Posted: March 20, 2005

Seminar: "Some Key Ideas in High Performance Computing"
Wednesday, March 30th, 2005 at 4:15 p.m. LCR MEB 3147
School of Computing

Presenter: Rob Leland
Deputy Director
Computers, Computation, Informatics and Mathematics
Sandia National Laboratories

High performance computing promises to transform the scientific and engineering disciplines. The speaker will reflect on several key ideas that have emerged over the past two decades in the evolution of distributed memory high performance computing. These will be illustrated with examples in geometry and meshing, load balancing, linear solvers and the design and construction of a new supercomputing system, Red Storm. Discussion on the history and future of high performance computing will be invited.


CHPC Presentation

Posted: March 20, 2005

Introduction to Parallel Computing
Thursday, March 31st, 2005 at 1:30 p.m. in the INSCC Auditorium

Presenter: Martin Cuma

In this talk, we will first discuss various parallel architectures and note which ones are represented at CHPC, in particular shared and distributed memory parallel computers. A short introduction to the two programming solutions for these machines, MPI and OpenMP, will then be given, followed by instructions on how to compile, run, debug and profile parallel applications on the CHPC parallel computers. Although this talk is directed more towards those starting to explore parallel programming, more experienced users can gain from the second half of the talk, which will provide details on the software development tools available at CHPC.

For more information about this and other CHPC presentations, please see:

CHPC Presentations Series


CHPC Presentation

Posted: March 20, 2005

Overview of CHPC
Thursday, March 24th, 2005 at 1:30 p.m. in the INSCC Auditorium

This presentation gives users new to CHPC, or interested in High Performance Computing an overview of the resources available at CHPC, and the policies and procedures to access these resources.

Topics covered will include:

  • The platforms available
  • Filesystems
  • Access and security
  • An overview of the batch system and policies

All welcome!

For more information about this and other CHPC presentations, please see:

CHPC Presentations Series


Sierra Cluster rebooted, approx 3:45 pm March 16th, 2005

Posted: March 16, 2005

Sierra Cluster rebooted, approx 3:45 pm March 16th, 2005

On the afternoon of March 16, 2005, the sierra cluster had some system problems causing some jobs to die. We rebooted the system about 3:45 pm.


MPICH2 updated to version 1.0.1

Posted: March 15, 2005

MPICH2 updated to version 1.0.1

MPICH2 was updated to version 1.0.1. We did not have much chance to test it because all the machines are quite loaded, but we don't expect any problems since it is a minor update. Please let us know of any problems.


CHPC Presentation Series starts March 24th, 2005

Posted: March 10, 2005

CHPC Presentation Series starts March 24th, 2005

CHPC is happy to announce that a selection of our regular Fall series will be presented this Spring.

All presentations will be held on Thursdays in the INSCC Auditorium at 1:30 p.m.

  • 3/24 Overview of CHPC
  • 3/31 Introduction to Parallel Computing
  • 4/7 Chemistry packages at CHPC
  • 4/14 *No presentation*
  • 4/21 Using Gaussian 03 and Gaussview
  • 4/28 Introduction to programming with MPI
  • 5/5 Debugging with Totalview

New Scratch space available on marchingmen and delicatearch

Posted: March 9, 2005

New Scratch space available on marchingmen and delicatearch

We have installed two new NFS scratch file servers, /scratch/mm and /scratch/da. The former is mounted on all MM (marchingmen) compute nodes, the latter on all DA (delicatearch) compute nodes. We encourage users to start using the new file servers for their jobs on MM and DA. Both run an updated kernel that fixes many NFS-related bugs that were causing failures in /scratch/serial.

MM also has an updated kernel on the compute nodes, while DA does not. We have not thoroughly tested how the new /scratch/da server interacts with the older-kernel DA compute nodes, so we would appreciate users reporting anything odd they observe with this setup. Still, since /scratch/da itself runs the newer kernel, one could speculate a roughly 50% reduction in the problems that were plaguing /scratch/serial.

/scratch/serial was not updated. It is still the only scratch file server available to TA and LA. Those who never experienced any problem with it may keep using it, but we recommend that those who did use MM or DA over the next couple of weeks.
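
As a rough sketch of pointing a job at the new space (the per-job directory naming is just an illustration, not a required convention):

# inside a batch script on marchingmen; use /scratch/da/... on delicatearch
SCRDIR=/scratch/mm/$USER/$PBS_JOBID
mkdir -p $SCRDIR
cd $SCRDIR
# ... run the application here, then copy results back to $HOME and remove $SCRDIR ...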


Downtime: marchingmen cluster, Tuesday March 8th, 2005 from 8am

Posted: March 2, 2005

updated: March 9, 2005
updated: March 8, 2005

Downtime: marchingmen cluster, Tuesday March 8th, 2005 from 8am
Resumed user access and scheduling jobs about 9am the morning of March 9th, 2005.

Details: The kernel was upgraded on the compute nodes and another scratch filesystem was added, which is visible only to the marchingmen compute and interactive nodes. The new scratch space is available at /scratch/mm. Additional scratch space was added to delicatearch at the same time, which did not require a downtime. This space is available at /scratch/da.


Icebox outage, morning of February 25, 2005

Posted: February 25, 2005

updated: March 1, 2005

Icebox outage, morning of February 25, 2005

On the morning of February 25, 2005, icebox, the IA-32 cluster, had some major system problems which required us to take it offline. Our systems staff worked to solve this recurring problem and the system was opened for users again the morning of March 1st, 2005. We apologize for any inconvenience.


Icebox outage, morning of February 24, 2005

Posted: February 24, 2005

Icebox outage, morning of February 24, 2005

On the morning of February 24, 2005, icebox, the IA-32 cluster, had some major system problems which required us to take it offline. It was back scheduling jobs by about 11:40 am. We apologize for any inconvenience.


Icebox outage, morning of February 23, 2005

Posted: February 23, 2005

Icebox outage, morning of February 23, 2005

On the morning of February 23, 2005, icebox, the IA-32 cluster, had some major system problems and required us to reboot most of the system. Icebox has returned to scheduling by about 11am that morning. We apologize for any inconvenience.


Totalview debugger upgrade on Arches clusters

Posted: February 23, 2005

Totalview debugger upgrade on Arches clusters

We have upgraded the Totalview debugger to version 6.7 on the Arches clusters. The main improvement is extended memory debugging. For details, see http://www.etnus.com/TotalView/Latest_Release.html

Note that we have discontinued license renewal for Totalview on Icebox, so the latest release there remains 6.6. Since the 4-user license there (for ia32) is hardly used now, we may consider sharing it; if someone affiliated with CHPC who has a Linux ia32 desktop is interested, please contact me.


Matlab installed on Arches and Icebox

Posted: February 8, 2005

Matlab installed on Arches and Icebox

We've obtained a full license for Matlab and it is now installed on both Arches and Icebox. For details on how to use it, please see: http://www.chpc.utah.edu/docs/manuals/software/matlab.html

Please note that we do not recommend running Matlab on the interactive nodes; instead, submit an interactive PBS job and run Matlab on the compute nodes. For details, see the aforementioned webpage.
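
As a rough sketch (the resource request below is only an example; see the webpage above for the recommended settings):

# request one node interactively, then start Matlab without the GUI on the compute node
qsub -I -l nodes=1:ppn=2,walltime=2:00:00
matlab -nodisplay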

As always, please, let us know if you experience any problems.


Icebox Available February 2nd, 2005

Posted: February 2, 2005

Icebox Available February 2nd, 2005

The structure and administration of the system are very similar to the Arches configuration. Example scripts from Arches should work with minor modifications. To get the proper paths you will want to get newer versions of the CHPC startup files. These can be found on our web page:

chpc.tcshrc
(for csh and tcsh)
chpc.bashrc
(for bash)
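
One possible way to pick these up (the exact download location and usage may differ; see our web page), assuming you have saved the appropriate file in your home directory, is to source it from your own startup file:

# csh/tcsh users, in ~/.tcshrc:
source ~/chpc.tcshrc

# bash users, in ~/.bashrc:
. ~/chpc.bashrc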

Major CHPC Downtime: From 5 pm on 2/3/05 until about 4 am on 2/4/05.

Posted: January 27, 2005

updated: February 4th, 2005
re-posted: January 12, 2005

Major CHPC Downtime - All HPC systems and INSCC Networking From 5:00 pm 2/3/05 - about 4 am Friday 2/4/05.

Systems affected: All networking in the INSCC building, SSB machine room and Komas machine room. All of the HPC systems including the Arches Clusters: marchingmen, delicatearch, tunnelarch and landscapearch; icebox and sierra. There will be a router upgrade at this time as well.

Date: Beginning 5:00 pm on Thursday February 3rd, 2005

Duration: Until about 4 am on Friday February 4th, 2005

Details: There will be a power outage at Komas. All networking will be down for the INSCC building, SSB machine room and Komas machine room. Home directories on HPC systems with a path of /uufs/inscc.utah.edu/common/home are planned to be moved to a new server with policy changes; see Migration of /uufs/inscc.utah.edu/common/home to new fileserver.


Migration of /uufs/inscc.utah.edu/common/home to new fileserver February 3rd, 2005

Posted: January 26, 2005

Migration of /uufs/inscc.utah.edu/common/home to new fileserver February 3rd, 2005

We are planning to migrate to a new fileserver for home directories on our HPC systems during the downtime February 3rd, 2005. This will affect you only if your home directory is currently being served on the HPC systems (arches clusters, icebox and sierra) out of the /uufs/inscc.utah.edu/common/home filesystem.

We will be making the following changes to our policies for this space:

  1. We will no longer back up this data. Any critical data in this space should be moved to your home department for permanent storage.
  2. We will be putting quotas on the usage of the new filesystem. The default quota will be 1 GB; a quick way to check your current usage is shown after this list. If you need a larger quota, please send a request to problems@chpc.utah.edu.
  3. We will discontinue charging for the disk usage in this space.
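
A simple way to see whether you are within the default limit (this is a generic check, not a CHPC-specific tool):

# report the total size of your home directory
du -sh $HOME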

We plan on leaving the data on the current fileserver for a few weeks in the event the new fileserver has any problems. If you have any questions or concerns about these changes, please let us know.


Skyline Arch Available

Posted: January 21, 2005

Skyline Arch Available

The visualization portion of the Arches cluster, "Skyline Arch", located in 294 INSCC, is now available to our user community. This resource consists of 10 dual-Opteron nodes with Nvidia Quadro FX 3000 graphics cards driving 18 Sanyo LCD projectors to create a stereo 3 x 3 tiled display. The tiling application we are using is Chromium. It intercepts OpenGL calls from an application and distributes them to the appropriate display. This wall can potentially be used with any OpenGL visualization application.

Currently the OpenGL application must understand "quad buffered stereo" in order to utilize the stereo abilities of the wall. We are working on some Chromium options that would allow us to force the OpenGL application to create stereo images. VMD (Visual Molecular Dynamics) is the only OpenGL application we have thoroughly tested with the wall so far. In the works are Paraview (a volume visualization application), Mercury and Arima. Also available is NPB (NCSA Pixel Blaster), which allows the viewing of pre-rendered movies, such as .avi files or a sequence of images.

If you would like to take advantage of this resource, please contact Sam Liston (stliston@chpc.utah.edu).


IA-32 Cluster (icebox) to be available soon

Posted: January 19, 2005


updated: January 28, 2005

IA-32 Cluster (icebox) to be available soon

Icebox, the IA-32 cluster, will be available for use again sometime the week of January 31st, 2005, but there will be some significant changes in the configuration:

  • Only approximately 100 nodes will be brought online
  • All of the 100 or so nodes will be dual procs
  • The notion of "ownership" of nodes will no longer apply
  • There will be no allocations

The motivation for these changes is that all of the nodes are getting older and are out of warranty. We want to bring up the system to get any useful life out of the better nodes, but without a lot of complexity. Please let us know if you have questions about this.


Major CHPC Downtime: February 3rd, 2005 from 5 pm - midnight. All systems and networks.

Posted: January 12, 2005

Major CHPC Downtime - All HPC systems and INSCC Networking From 5:00 pm - midnight Thursday 2/3/05.

Systems affected: All networking in the INSCC building, SSB machine room and Komas machine room. All of the HPC systems including the Arches Clusters: marchingmen, delicatearch, tunnelarch and landscapearch; icebox and sierra. There will be a router upgrade at this time as well.

Date: Beginning 5:00 pm on Thursday February 3rd, 2005

Duration: Approximately 7 hours

Details: There will be a power outage at Komas. All networking will be down for the INSCC building, SSB machine room and Komas machine room.


Major CHPC Downtime: February 3rd, 2005 from 5 pm - about 4am February 4th, 2005. All systems and networks.

Posted: January 12, 2005

Major CHPC Downtime - All HPC systems and INSCC Networking From 5:00 pm 2/3/05 - 4:00 am 2/4/05.

Systems affected: All networking in the INSCC building, SSB machine room and Komas machine room. All of the HPC systems including the Arches Clusters: marchingmen, delicatearch, tunnelarch and landscapearch; icebox and sierra.

Date: Beginning 5:00 pm on Thursday February 3rd, 2005

Duration: Lasted until about 4:00 am on Friday February 4th, 2005

Details: There will be a power outage at Komas. All networking will be down for the INSCC building, SSB machine room and Komas machine room.


Pathscale compilers upgraded to version 2.0 on Arches clusters

Posted: January 11, 2005

Pathscale compilers upgraded to version 2.0 on Arches clusters

We have upgraded the Pathscale compilers to version 2.0 on all Arches clusters. The upgrade includes OpenMP Fortran support and the Pathdb debugger. For details, see http://www.pathscale.com/pr_011105.html
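
As a rough sketch of trying the new OpenMP Fortran support (to our understanding the PathScale OpenMP switch is -mp; check the pathf90 man page or the release notes above to confirm):

# compile an OpenMP Fortran code and run it with 2 threads
pathf90 -mp -O2 -o omp_test omp_test.f90
setenv OMP_NUM_THREADS 2     # csh/tcsh; bash users: export OMP_NUM_THREADS=2
./omp_test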

We also upgraded the Intel Trace Collector library (an MPI profiling tool) on DelicateArch to version 5.0. For details on how to use this tool, see http://www.chpc.utah.edu/docs/manuals/software/vampir.html

As always, please, let us know if you encounter any problems.


Unscheduled Downtime: Arches Clusters

Posted: January 8, 2005

Unscheduled Downtime - Arches Clusters From approximately 10:00 am Saturday 1/8/05 until approximately 6:00 pm Sunday 1/9/05

Due to a Cooler Failure in the Komas Machine Room

Arches Clusters DOWNED about 10:00 am Saturday January 8th, 2005

Arches Clusters UP about 6:00 pm Sunday January 9th, 2005

Systems affected: All of the Arches Clusters including: marchingmen, delicatearch, tunnelarch and landscapearch

Date: Beginning 10:00 am on Saturday 1/8/05

Duration: Until 6:00 pm on Sunday 1/9/05.

Details: The machine room at Komas was dangerously overheating. CHPC staff shut down all systems in the machine room to prevent equipment damage. The cooler was repaired and the room was returned to normal operating temperatures.


MPICH2 1.0 installed on Arches

Posted: January 5, 2005

MPICH2 1.0 installed on Arches

We have installed the new full release of MPICH2 on all Arches clusters. This is a full release with full MPI-2 support. Apart from this, it includes some optimizations for global communication operations and a separate module for shared-memory communication, with vastly improved performance on SMP nodes relative to MPICH 1.x.

Full Report...