CHPC News Spring 1997, Vol 8, #2
Introduction to SP Short Course Series
CHPC will present a series of Short Courses on the new IBM SP system in late April or early May. The series will consist of the following courses:
- Introduction to the SP
- Introduction to MPI (Message Passing Interface)
- Converting Serial Codes to a Distributed Memory Environment
Please watch the system messages, or contact us (phone: 581-4439, email: firstname.lastname@example.org) if you are interested in attending or would like more information.
***Reminder: Summer Quarter proposals are due June 1st.
by Julio Facelli, Director
In the last few months a number of changes have occurred at the CHPC. In November, President Smith approved the creation of a new subunit within the CHPC. This subunit, the DSNSG (Distributed Systems and Network Support Group), will explore advanced networking services, develop network test beds within the University, and provide second-level support for UNIX managers across campus (see the ANL article). The budget approved for this unit calls for a significant staff expansion, and we have been quite busy recruiting new people. We have also replaced some staff members who have left CHPC.
On the systems front we have been able to expand our SP system to 64 nodes (see the following article) and we are actively exploring the acquisition of other large systems in the near future.
This summer we are scheduled to move to the new INSCC, which is just north of the Student Services Building. Check our "motd" (message of the day) for further information on the move.
Finally, let me comment on the batch queues. Yes, I hate them too, but they are the only way to maximize the utilization of our resources. The queues are not meant to inconvenience your work but to coordinate your needs with those of other researchers. The queues are not set in concrete and can be modified to meet our users' requirements. I can authorize minor adjustments myself, but changes that significantly affect the operation of the systems need approval from the Faculty Advisory Board.
We are committed to facilitating your work, so let us know what changes in our queue policies would make your work easier. Please do not ask for a queue that gives you all the processors, memory, and disk with no time limit. It would be nice, but remember that there are other users of our systems. We appreciate your cooperation.
by Janet Curtis and Julia Caldwell
In January of this year CHPC installed a 64-node IBM SP system. This distributed-memory machine is still in friendly-user mode, and we expect to open it for more general use in the near future. The following is a summary of the major components of the system:
- 8 nodes with 67 MHz Power2 chip, 128 MB RAM, 2.2 GB system disk, built-in Ethernet (for the control network), TB3 switch adapter. (These are the original eight nodes, with an updated switch.)
- 56 nodes with 120 MHz Power2 SuperChip, 2.2 GB system disk, built-in Ethernet, TB3 switch adapter, and memory as follows:
  - 4 nodes have 1 GB of RAM
  - 2 nodes have 512 MB of RAM
  - 4 nodes have 256 MB of RAM
  - 46 nodes have 128 MB of RAM
These 64 nodes are installed in 4 tower enclosures, 16 per tower. One node in each tower also has an FDDI adapter for connection to the outside world. Each node is configured with 800 MB of local /scratch disk.
The high-performance switch provides data paths between nodes. It is a packet-switched omega network and uses a buffered wormhole routing strategy. It supports bidirectional, any-to-any data transfers. The peak bandwidth is 150 MB/second, and the chip latency is 1.8 microseconds for our configuration. The switch adapter uses a PowerPC processor as a communication coprocessor.
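For a rough feel for what these two figures mean, the time to move one message can be approximated by the simple linear model time = latency + size/bandwidth. The Python sketch below assumes that model, takes 1 MB as 10^6 bytes, and uses the quoted chip latency as a floor. Actual end-to-end MPI or PVMe times will be larger because they include software overhead, so treat the results as lower bounds only. The function names are illustrative, not part of any SP software.

```python
# Linear cost model for one point-to-point message on the SP switch:
#   time = latency + size / bandwidth
# Constants are the figures quoted above; real MPI/PVMe times are higher.

CHIP_LATENCY_S = 1.8e-6        # switch chip latency: 1.8 microseconds
PEAK_BANDWIDTH_BPS = 150e6     # peak bandwidth: 150 MB/s, taking 1 MB = 10**6 bytes


def transfer_time(nbytes: int,
                  latency: float = CHIP_LATENCY_S,
                  bandwidth: float = PEAK_BANDWIDTH_BPS) -> float:
    """Lower-bound estimate, in seconds, for sending nbytes across the switch."""
    if nbytes < 0:
        raise ValueError("message size must be non-negative")
    return latency + nbytes / bandwidth


def half_performance_size(latency: float = CHIP_LATENCY_S,
                          bandwidth: float = PEAK_BANDWIDTH_BPS) -> float:
    """Message size (bytes) at which latency and transfer time are equal.

    At this size the model delivers half of the peak bandwidth.
    """
    return latency * bandwidth


if __name__ == "__main__":
    for size in (8, 1_000, 1_000_000):
        print(f"{size:>9} bytes: {transfer_time(size) * 1e6:10.2f} microseconds")
    print(f"n_1/2 is about {half_performance_size():.0f} bytes")
```

In this model the crossover size is about 270 bytes, so messages much shorter than a few hundred bytes are dominated by latency and are usually better combined into larger ones.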
The SP is running AIX 4.1.5, which will be upgraded to AIX 4.2 as soon as it is available for this system. The basic SP support software (PSSP, Parallel System Support Programs) is at version 2.2.
Current user software:
- PE (Parallel Environment, including POE), version 2.2
- LoadLeveler, version 1.3 (new location for binaries is /var/loadl/bin)
- Fortran (xlf v 4.1)
- C (xlc v 3.1.4)
- C++ (xlC++ v 3.1.4)
- High Performance Fortran (xlhpf v 1.1)
- Fortran 90 (xlhpf90 v 1.1)
- ESSL (v 2.2, now combined with PESSL)
- Parallel ESSL (v 1.2)
- MPI (v 2.2)
- PVMe (v 2.2)
- Parallel Gaussian 94 with Linda
- pdbx and xpdbx (debuggers)
- vt (visualization trace tool for performance analysis)
CHPC home directories are available via NFS. Because running from NFS-mounted files adds heavily to network traffic, please have your LoadLeveler script copy the needed executables and input files to local scratch space (800 MB/node), do your execution and I/O locally, and then copy the results back out at the end of the run. We will have sample scripts available soon; if you have immediate needs, please contact the consulting center.
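Until the sample scripts are available, the stage-in, run, stage-out pattern can be sketched as follows. This is a minimal Python illustration, not a CHPC-supplied script; the function name, directory layout, and file names are hypothetical, and in practice the same steps would be carried out from your LoadLeveler job script.

```python
import shutil
import subprocess
from pathlib import Path


def run_staged(home_dir, scratch_dir, executable, inputs, outputs):
    """Stage files to local scratch, run there, and copy results home.

    home_dir    -- NFS-mounted directory holding the program and its input
    scratch_dir -- directory on the node's local /scratch disk
    executable, inputs, outputs -- file names relative to those directories
    (All names here are hypothetical placeholders.)
    """
    home = Path(home_dir)
    work = Path(scratch_dir)
    work.mkdir(parents=True, exist_ok=True)

    # Stage in: one copy per file over NFS instead of repeated remote I/O.
    # copy2 keeps the permission bits, so the program stays executable.
    for name in [executable, *inputs]:
        shutil.copy2(home / name, work / name)

    # Run with the working directory on local disk so all I/O stays local.
    subprocess.run([str(work / executable)], cwd=work, check=True)

    # Stage out: copy only the result files back to the home directory.
    for name in outputs:
        shutil.copy2(work / name, home / name)
```

For example, run_staged("/home/user/project", "/scratch/user/job1", "a.out", ["input.dat"], ["output.dat"]) would copy a.out and input.dat to local disk, run a.out there, and bring output.dat home; the paths are placeholders.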
The same /usr/local tree is available on the SP as on the RS6000 workstation cluster.
Interactive logins are restricted to the 8 older nodes. For convenient access, these nodes have been given the aliases spni1-spni8 (for SP node, interactive n). Interactive parallel development is restricted to these 8 nodes, and all parallel work on them should be done in IP space.
The newer 56 nodes are designated as compute-only nodes. Under POE, only batch jobs (submitted through LoadLeveler; see below) will be executed on these nodes.
LoadLeveler, IBM's batch queuing system, has initially been configured with several classes. These classes are still under review and may change as users' needs and system performance become better known.
- dev_8 open to all users; runs on the 8 slower nodes. This class has a CPU limit of 30 minutes per process, uses IP for message passing, and allows up to two processes per node. Any number of nodes from 1-8 can be requested. Development work done here can be used to demonstrate a need for the faster nodes.
- par_16 open to users of the batch node pool. Currently has no time limit, but will be limited by wall-clock time rather than CPU time; because a job that spreads its work over more nodes finishes sooner in wall-clock terms, this rewards highly parallel codes. Can be used in either IP or USER space for message passing on the switch. Only one process allowed per node. Any number of nodes from 1-16 can be requested. This class uses the 128 MB nodes and, so that up to three 16-node jobs can run at once, also uses the 256 MB nodes. When you request this class, you have no way of knowing in advance which nodes LoadLeveler will allocate to you.
- big_10 open to users of the batch node pool. Currently has no time limit but will be wall clock limited. Can be used in either IP or USER space. Only one process per node. Any number of nodes from 1-10 can be requested. Be aware that this class draws on the 1 GB, 512 MB, and 256 MB nodes, so the memory you can count on is the 256 MB/node minimum.
- big_1g open to users of the batch node pool. Currently has no time limit but will be wall clock limited. Can be used in either IP or USER space. One process per node. 1-4 nodes can be requested. These are the 4 nodes with 1 GB of memory.
- big_512 open to users of the batch node pool. Currently has no time limit but will be wall clock limited. Can be used in either IP or USER space. One process per node. 1-2 nodes can be requested. These are the 2 nodes with 512 MB of memory.
- big_256 open to users of the batch node pool. Will be wall clock limited. IP or USER space. One process per node. 1-4 nodes can be requested. These are the 4 nodes with 256 MB of memory.
- par_32 open only by special arrangement to users of the batch node pool. It will be wall clock limited on a case-by-case basis. May use IP or USER space. One process per node. 32 nodes can be requested. Runs on nodes with 128 MB of memory.
- par_56 open only by special arrangement to users of the batch node pool. Will be wall clock limited on a case-by-case basis. IP or USER space. One process per node. All 56 newer, faster nodes.
- par_64 open only by special arrangement to users of the batch node pool. Will be wall clock limited on a case-by-case basis. To be considered for this class, users must address the 2X difference in CPU speed between the older and newer nodes. IP or USER space. One process per node. All 64 nodes.
- big_4 open only to a special group of users. Will be wall clock limited. IP or USER space. One process per node. Runs on the 1 GB and 512 MB nodes. This class has a higher dispatch priority than the more open classes.
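The rules above can be summarized as a small lookup: given a node count and the memory needed on each node, which of the generally available classes could run the job? The Python sketch below is a hypothetical helper, not a LoadLeveler feature. It covers only dev_8 and the open batch classes, leaves out the special-arrangement classes (par_32, par_56, par_64, big_4), and ignores time limits and the IP/USER space choice. The memory figure for each class is the smallest per-node memory you can count on, which is why par_16 is listed at 128 MB and big_10 at 256 MB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobClass:
    name: str
    max_nodes: int            # largest node count the class accepts
    mem_mb_per_node: int      # smallest per-node memory the class guarantees
    needs_batch_access: bool  # True if batch-node permission is required


# Generally available classes as described above (special classes omitted).
CLASSES = [
    JobClass("dev_8", 8, 128, False),      # 8 older nodes, open to all users
    JobClass("par_16", 16, 128, True),     # may land on 128 MB or 256 MB nodes
    JobClass("big_10", 10, 256, True),     # effective limit 256 MB/node
    JobClass("big_1g", 4, 1024, True),
    JobClass("big_512", 2, 512, True),
    JobClass("big_256", 4, 256, True),
]


def eligible_classes(nodes: int, mem_mb_per_node: int,
                     has_batch_access: bool = True) -> list[str]:
    """Return the names of classes that could run a job of this shape."""
    if nodes < 1:
        raise ValueError("at least one node must be requested")
    return [
        c.name
        for c in CLASSES
        if nodes <= c.max_nodes
        and mem_mb_per_node <= c.mem_mb_per_node
        and (has_batch_access or not c.needs_batch_access)
    ]


if __name__ == "__main__":
    print(eligible_classes(12, 128))   # a 12-node job with modest memory
    print(eligible_classes(2, 300))    # a 2-node job needing 300 MB per node
```

For example, a 12-node job needing 128 MB per node fits only par_16, while a 2-node job needing 300 MB per node could go to big_1g or big_512.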
It is anticipated that the eight interactive nodes will be available to all members of the CHPC computing community. Access to the batch nodes will require permission, which will be granted on demonstrated need. Allocations and access to the SP will follow the guidelines provided by the CHPC Faculty Advisory Board. Users of the old SP2 will initially be given access to the batch nodes. The interactive nodes are to be used for code development and testing only, not for production runs (either serial or parallel).
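Because the par_64 class described above mixes the 67 MHz and 120 MHz nodes, a code that gives every node the same amount of work will leave the faster nodes idle while they wait for the slower ones. One common remedy, sketched here in Python, is to split the work in proportion to node speed. The sketch assumes a newer node does about twice the work of an older one (the "2X factor"; the clock ratio alone is closer to 1.8, so measure your own code). The function name and the integer weights are illustrative; leftover items are assigned by the largest-remainder rule so the shares always add up to the total.

```python
def partition_work(total_items: int, weights: list[int]) -> list[int]:
    """Split total_items among nodes in proportion to integer speed weights.

    Each node first gets the floor of its exact share; the few items left
    over go to the nodes with the largest remainders, so the shares always
    add up to total_items.
    """
    if total_items < 0 or not weights or any(w <= 0 for w in weights):
        raise ValueError("need a non-negative total and positive weights")
    total_weight = sum(weights)
    shares = [total_items * w // total_weight for w in weights]
    remainders = [total_items * w % total_weight for w in weights]
    leftover = total_items - sum(shares)
    # Stable sort: ties keep node order, so results are reproducible.
    order = sorted(range(len(weights)), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


# Assumed relative speeds for par_64: 8 older nodes (weight 1), 56 newer (weight 2).
PAR_64_WEIGHTS = [1] * 8 + [2] * 56

if __name__ == "__main__":
    shares = partition_work(1200, PAR_64_WEIGHTS)
    print(shares[0], shares[-1], sum(shares))  # prints: 10 20 1200
```

With these weights, 1200 work units come out as 10 per older node and 20 per newer node, so both kinds of node should finish at about the same time.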
by John Storm
The Distributed Systems and Network Support Group (DSNSG) at the Center for High Performance Computing has begun forming the new Advanced Networking Laboratory (ANL). By providing research and operational expertise along with evaluation facilities, the ANL will supply critical information needed in the design and operation of advanced network topologies on the University of Utah campus. Hardware and software vendors such as Lucent Technologies (Bell Labs), Cisco Systems, Bay Networks, Siemens/Newbridge, FORE Systems, XYLAN, and Cascade Communications, among others, have generously supplied equipment to the laboratory for evaluation. So far, ANL efforts have concentrated on Asynchronous Transfer Mode (ATM) technology and other emerging techniques such as Virtual Local Area Networks (VLANs). However, plans are also under way to investigate other emerging network technologies, such as switched Gigabit Ethernet, Myrinet, EtherChannel, and Fibre Channel.
ATM requires a significant change in how we think about our campus data networks. Traditional router-based networks operate on packetized data, or datagrams, which are forwarded to the destination along a route calculated hop by hop. Without some upper-level protocol, such packet-switched (or routed) networks, often called datagram-oriented networks, can guarantee only a "best effort" level of service for data delivery. ATM, however, is fundamentally different. Because ATM is a connection-oriented technology, akin to telephone service, end stations communicate across a virtual circuit that exists only for the length of the call. While the virtual circuit exists, the network guarantees the end stations a negotiated Quality of Service (QoS). This allows end stations not only to reserve network throughput but also to protect time-sensitive information, such as video and voice, from delays.
ATM also differs from traditional network technologies in one other very important way. It holds that the solution to our overloaded networks is not just more bandwidth but adequate allocation of network resources. More traditional technologies make little provision for this and instead treat bandwidth as a theoretically unlimited resource. The difference is that between a network that is fully allocated and one that is fully saturated. The ANL will be evaluating technologies such as ATM as a means of providing integrated services (voice, video, and data) to the desktop computer by the early part of the next century. Of course, with ATM you also get more bandwidth...
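The difference between a fully allocated and a fully saturated network can be illustrated with a toy admission-control model: each new circuit asks for a rate, and the link accepts it only if that much capacity is still unreserved, so existing circuits keep their guarantees and the new call is refused instead. This Python sketch is a simplification, not ATM's actual connection admission control, which works from traffic descriptors and service categories. The class name and the 155 Mb/s link rate (a common ATM interface speed) are assumptions for illustration.

```python
class Link:
    """Toy model of a link that admits circuits only while capacity remains."""

    def __init__(self, capacity_mbps: float):
        self.capacity = capacity_mbps
        self.reserved: dict[str, float] = {}  # circuit id -> reserved rate (Mb/s)

    def available(self) -> float:
        """Capacity not yet promised to any open circuit."""
        return self.capacity - sum(self.reserved.values())

    def open_circuit(self, circuit_id: str, rate_mbps: float) -> bool:
        """Reserve rate_mbps for a new circuit, or refuse it if it would not fit."""
        if circuit_id in self.reserved:
            raise ValueError(f"circuit {circuit_id!r} is already open")
        if rate_mbps <= self.available():
            self.reserved[circuit_id] = rate_mbps
            return True
        # Refused, like a busy signal; existing circuits keep their guarantees.
        return False

    def close_circuit(self, circuit_id: str) -> None:
        """Release the circuit's reservation when the call ends."""
        self.reserved.pop(circuit_id, None)


if __name__ == "__main__":
    link = Link(155)
    print(link.open_circuit("video", 100))  # accepted
    print(link.open_circuit("voice", 50))   # accepted
    print(link.open_circuit("data", 10))    # refused: only 5 Mb/s left
```

On a 155 Mb/s link, a 100 Mb/s video circuit and a 50 Mb/s voice circuit fit, but a further 10 Mb/s request is refused until one of them closes. A best-effort router would instead accept all three flows and let every one of them slow down.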
by Byron Davis, Staff Scientist
The CHPC has established a timetable to end support of statistical software on the IBM Mainframe. The statistical software in question includes SPSS, SAS, and BMDP on the VM/CMS operating system, and SAS on the MVS operating system. Our present plan is to discontinue support for all statistical software running on the Mainframe around the end of the fiscal year. Specifically, SAS (both operating systems) and BMDP will cease to function on July 1, 1997, and SPSS on VM/CMS will cease to function August 1, 1997.
It has been over a year (December 1995) since the Vice Presidents for Research and Administrative Services announced the end of their joint venture in the IBM Mainframe. Since that time, all management, charging, software installation, and maintenance for academic/research computing on the IBM Mainframe has reverted to ACS (Administrative Computing Services), though CHPC continues to pay for the statistical software. We intend to complete the split (except for several important services) according to the timetable above. The services we are not yet ready to replace include large data-set handling and storage, and foreign tape reading and writing with its associated operations requirements.
Use of the IBM Mainframe by campus researchers has declined significantly over the past few years. The steadily increasing power of desktop machines, together with campus-wide licenses for statistical software on those machines, can meet, and has met, the majority of campus researchers' statistical computing needs. For researchers who still need high-end computing performance, UNIX machines, with their accompanying campus-wide licenses for statistical software, appear to be a much more cost-effective solution.
The CHPC presently maintains one UNIX workstation specifically identified for statistical use. The software packages available on this machine include SPSS, SAS, BMDP, STATIT, SUDAAN, LISREL and PRELIS, S-PLUS, and the BMDP (now SPSS) graphical program DIAMOND. The CHPC is committed to supporting the campus research community in the area of statistics, so should use of this workstation exceed reasonable limits, other machines and accompanying software will be committed for statistical users. If you wish to establish an account on our statistically oriented UNIX machine (named DV8), please visit our web site for an application form: http://www.chpc.utah.edu/general/allocations.html.
At the time we are sending out this notice, about three months remain for users who have statistical package "system files" to convert their data sets into portable files that can be read on other platforms. If you need help making your data sets portable, please contact Dr. Byron Davis, ext. 5-5604, email: email@example.com. Also, if the stated timetable for ending support of statistical software on the IBM Mainframe is unworkable for you, please send a description of your situation to Dr. Byron Davis, 78 SSB, or by email to the address above.
You may wish to consider keeping your Mainframe user ID so that you can continue to use the tape handling and data storage services offered by ACS. If you do decide to have your Mainframe user ID purged, make sure that any data of value is removed and/or backed up before you request the purge. To remove your Mainframe user ID, call 581-5253, fax 585-5366, or send email to our administrative officer, DeeAnn Raynor (firstname.lastname@example.org), and ask to have a "delete request form" sent or faxed to you (include your fax number).