Using big computers to give insight into the structure, dynamics and interactions of biological macromolecules in their native environment

by Thomas Cheatham III, Asst. Research Professor, Dept. of Medicinal Chemistry, and Center for High Performance Computing

Biological macromolecules, such as proteins and nucleic acids (Figure 1), are the fundamental building blocks and machines for the processes of life. More than simply a component of your food, proteins serve as structural scaffolds and transporters and act as enzymes that facilitate chemical transformations. For example, the triple-helix proteins of collagen are the principal components of bone, skin, cartilage and tendons. The protein hemoglobin is the principal oxygen carrier in our blood, and a variety of other specific protein transporters control the influx and efflux of molecules from cells. The cytochrome-P450 class of enzymes serves as a bio-transformation engine, for example metabolizing a toxic pesticide residue ingested with our food into a form that is less toxic and more easily excreted. Nucleic acids, in addition to having structural and enzymatic roles, serve as the blueprint of our heredity and identity. With the massive efforts underway to decipher this genetic identity, such as the Human Genome Project, and the painstaking and detailed biochemical studies performed throughout the world, tremendous strides have been made in understanding the biological processes of life, and in particular the role of biological macromolecules.

Figure 1

Figure 1: Shown are the backbone of the protein myoglobin (left) [10] and the DNA duplex d[CGCGAATTCGCG]2 (right) [11]. Myoglobin, an oxygen carrier in muscle related to hemoglobin, was the first protein structure to be solved by crystallography; this structure was solved by neutron diffraction. The DNA duplex was solved by high-resolution NMR methods, including residual dipolar coupling information.

Since the advent of crystallographic and nuclear magnetic resonance (NMR) techniques in the 1960s and 1970s, and even from earlier biophysical studies, it has been known that proteins and nucleic acids tend to adopt unique structures depending on their sequence of amino acids or nucleotides and also on their chemical environment. That each protein or nucleic acid has a distinct structure implies that the function of a biological macromolecule is directly related to its (unique) structure. However, structure is not the only determinant. In contrast to the static picture inferred from structural studies, biological macromolecules are flexible and undergo considerable motion and thermal fluctuation. Therefore, not only is structure important, but also the dynamics and the interactions with the surrounding chemical environment, including bound ions, water and various other molecules. Probing the dynamics and interactions experimentally is difficult due to the large time and size scales involved and the finite and limited resolution of the experimental probes. Of the methods that can give atomic-level resolution, such as various spectroscopic techniques, either interpreting the results is too difficult for macromolecules such as proteins and nucleic acids, or only an average picture (from a large sample over a long time) can be obtained. For example, although crystallographic methods can give atomic resolution, a picture at the angstrom (10⁻¹⁰ meter) level, the structure that is obtained represents an average over millions of copies of the molecule in a crystal. Similarly, NMR methods probe the dynamics of many molecules in solution over millisecond time scales, time scales that hide the individual motions of specific molecules. For more detailed insight into the structure, dynamics, and interactions of proteins and nucleic acids on faster (sub-millisecond) time scales, we can turn to theoretical techniques and simulation.

Bio-molecular simulation

Classical molecular dynamics (MD) simulations, which follow the motions of molecules according to Newton's equations of motion, are one technique that can give detailed insight into the structure, dynamics and interactions of bio-molecules. The force on a given atom, which determines its motion, is obtained from the (negative) first derivative of a suitable energy function. This energy function effectively describes the energetic penalty for moving atoms away from their idealized geometries. Ultimately, the "true" energy representation for a molecule, such as that obtained from an accurate ab initio or density functional quantum mechanical treatment, may be applied. However, this is too costly for larger bio-molecules for anything other than single-point energy evaluations of a particular structure. Therefore, to investigate proteins or nucleic acids in their surrounding bath of water and ions (Figure 2), empirically derived potential energy functions are applied. These are typically simple functions that represent the covalent connectivity of atoms with harmonic springs and rotation around bonds with a simple Fourier series, and then treat the interaction between all pairs of atoms with simple charge and contact terms that depend on the distance between the atoms. Although these empirically derived potential energy functions, or force fields, cannot represent bond breaking, bond formation or electron transfer, they do a reasonable job of representing nucleic acid and protein structure, assuming a proper treatment of the electrostatic interactions and some inclusion of solvation effects.

Figure 2

Figure 2: Snapshot of the atomic positions from a simulation of d[CGCGAATTCGCG]2 (gray) in water (with the oxygen atoms displayed in blue) with net-neutralizing Na+ counterions (shown as gray dotted spheres). The simulation is run with periodic boundary conditions using a rhombic dodecahedral (12-sided) periodic unit cell (with angles α=60°, β=90°, γ=60°).
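
To make the functional form described above concrete, the toy program below evaluates such an energy for a hypothetical three-atom, water-like fragment: harmonic springs for the covalent bonds plus Coulomb and Lennard-Jones "contact" terms for the nonbonded pair. It is only a minimal sketch of the general idea; the geometry, charges and parameters are invented for illustration, angle and torsion (Fourier series) terms are omitted, and it does not reproduce the actual AMBER or CHARMM energy functions.

/* Minimal sketch of an empirical force-field energy evaluation.
 * Illustrates the functional form described in the text (harmonic
 * bonds plus pairwise Coulomb and Lennard-Jones terms); all parameters
 * and the toy system are hypothetical, not AMBER or CHARMM values. */
#include <math.h>
#include <stdio.h>

#define NATOM 3
#define NBOND 2

static const double kbond = 450.0;   /* bond force constant, kcal/mol/A^2 (assumed) */
static const double r0    = 0.96;    /* equilibrium bond length, A (assumed)        */

/* toy water-like geometry: O, H, H (coordinates in angstrom) */
static double x[NATOM][3] = { { 0.00, 0.00, 0.0},
                              { 0.96, 0.00, 0.0},
                              {-0.24, 0.93, 0.0} };
static double q[NATOM]       = { -0.83, 0.415, 0.415 };   /* partial charges (assumed) */
static int    bond[NBOND][2] = { {0, 1}, {0, 2} };

static double dist(const double a[3], const double b[3])
{
    double dx = a[0]-b[0], dy = a[1]-b[1], dz = a[2]-b[2];
    return sqrt(dx*dx + dy*dy + dz*dz);
}

int main(void)
{
    double ebond = 0.0, eelec = 0.0, evdw = 0.0;

    /* covalent terms: harmonic springs about the ideal bond length */
    for (int b = 0; b < NBOND; b++) {
        double r = dist(x[bond[b][0]], x[bond[b][1]]);
        ebond += kbond * (r - r0) * (r - r0);
    }

    /* nonbonded terms: pairs of atoms that are not covalently bonded
       interact through a Coulomb term and a 12-6 Lennard-Jones term
       (real force fields also exclude or scale 1-3 and 1-4 pairs) */
    const double eps = 0.15, sigma = 3.15;   /* placeholder LJ parameters */
    const double coul = 332.0636;            /* kcal*A/(mol*e^2)          */
    for (int i = 0; i < NATOM; i++)
        for (int j = i + 1; j < NATOM; j++) {
            int bonded = 0;
            for (int b = 0; b < NBOND; b++)
                if ((bond[b][0] == i && bond[b][1] == j) ||
                    (bond[b][0] == j && bond[b][1] == i)) bonded = 1;
            if (bonded) continue;
            double r   = dist(x[i], x[j]);
            double sr6 = pow(sigma / r, 6.0);
            eelec += coul * q[i] * q[j] / r;
            evdw  += 4.0 * eps * (sr6 * sr6 - sr6);
        }

    printf("E_bond = %.3f  E_elec = %.3f  E_vdw = %.3f kcal/mol\n",
           ebond, eelec, evdw);
    return 0;
}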

To perform a molecular dynamics simulation, an initial set of (random) velocities is assigned to each atom, the energy and forces are evaluated, and the equations of motion are integrated to update the positions. The cycle then repeats, giving a history in time of the dynamics and energetics. For accurate integration of the equations of motion, the time step must be small enough to sample the fastest motions in the system. The fastest motions are the bond vibrations, which imply a time step of 1-2 femtoseconds (10⁻¹⁵ s). Since many biological processes, such as the folding of proteins, occur on a millisecond to second time scale, representing these processes in MD simulation requires billions of energy and force evaluations; hence the need for supercomputers. The current state-of-the-art in simulations of proteins and nucleic acids (5,000-50,000 atoms) represents millions of energy and force evaluations in simulations on a 1-100 nanosecond time scale. The longest simulations to date, on a small protein in solution, are the 1-microsecond simulations by Peter Kollman's group at UCSF [1]. These simulations used months of real time on 256 processors of Cray T3D and T3E machines. The sheer complexity suggests that reaching millisecond or second time scales in MD simulation will likely require a 3-6 order of magnitude increase in computer power. Even with new methodological improvements that help traverse the time scales faster, such as multiple time step algorithms, not only do we desire longer simulations, but also simulations of larger biological systems and complexes, such as the chromatin remodeling of DNA, the ribosomal machinery and ultimately entire cells. However, even without orders of magnitude increases, many interesting questions are accessible with current methods and technology (see the later section on successes and future promise).
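
In outline, the molecular dynamics cycle just described reduces to a few lines of code. The skeleton below is a sketch rather than anything resembling a production code: it uses the common velocity Verlet integrator with a 2 fs time step, an assumed atom count and masses, and a placeholder force routine. In a real simulation the force evaluation is the expensive step, repeated millions to billions of times.

/* Skeleton of the MD cycle: assign velocities, evaluate forces, and
 * integrate Newton's equations with a 1-2 fs time step (velocity
 * Verlet shown here).  The force routine is a placeholder; a real code
 * evaluates the full force field, including an Ewald treatment of the
 * electrostatics, at every step. */
#define NATOM 1000
#define DT    0.002      /* time step in ps (2 fs) */
#define NSTEP 1000000    /* 1,000,000 steps = 2 ns of dynamics */

static double x[NATOM][3], v[NATOM][3], f[NATOM][3], mass[NATOM];

/* placeholder: evaluate all force-field interactions (the expensive step) */
static void compute_forces(void) { /* ... */ }

static void assign_initial_velocities(double temperature)
{
    /* e.g. draw from a Maxwell-Boltzmann distribution at the target
       temperature (omitted for brevity) */
    (void)temperature;
}

int main(void)
{
    for (int i = 0; i < NATOM; i++) mass[i] = 12.0;  /* assumed masses */
    assign_initial_velocities(300.0);
    compute_forces();
    for (long step = 0; step < NSTEP; step++) {
        for (int i = 0; i < NATOM; i++)
            for (int k = 0; k < 3; k++) {
                v[i][k] += 0.5 * DT * f[i][k] / mass[i];  /* half kick   */
                x[i][k] += DT * v[i][k];                  /* drift       */
            }
        compute_forces();                                 /* new forces  */
        for (int i = 0; i < NATOM; i++)
            for (int k = 0; k < 3; k++)
                v[i][k] += 0.5 * DT * f[i][k] / mass[i];  /* second kick */
        /* periodically write coordinates and energies to disk to build
           the trajectory, i.e. the "history in time" of the dynamics */
    }
    return 0;
}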

Computational details:

A variety of codes for the simulation of biological macromolecules are available. Two of the most widely used are AMBER "Assisted Model Building with Energy Refinement" [2] and CHARMM "Chemistry at Harvard Molecular Mechanics" [3], both of which are applied in our research efforts. Other available codes include GROMOS/GROMACS, NAMD, NWChem and the commercial codes from Accelrys (Discover/Insight/Quanta), among others. Parallelized versions of AMBER and the molecular dynamics portion of NWChem are currently available to researchers at CHPC.

SGI O2000, R10K@194MHz (SGI mpi) raptor.chpc.utah.edu

nodes Time Speedup Efficiency
20 55.0 7.16 35.8%
16 48.2 8.17 51.1%
8 77.9 5.06 63.3%
6 81.2 4.85 80.8%
4 109.0 3.61 90.3%
2 213.7 1.84 92.0%
1 393.8 1.00

Compaq Sierra cluster, rocky.chpc.utah.edu

nodes Time Speedup Efficiency
32 16.66 7.08 22.1%
16 18.48 6.39 39.9%
8 24.18 4.88 61.0%
4 36.02 3.28 82.0%
2 64.92 1.82 91.0%
1 118.0 1.00

Single node performance on various CHPC nodes

- 1.0 GHz AMD 227.1
- 1.7 GHz PentiumIV 181.2
- 1.2 GHz AMD 178.0
- 1.333 GHz AMD 165.2
- Compaq ES40 118.0

Table 1: Timings for the solvated protein dihydrofolate reductase in water for 100 steps of molecular dynamics applying the particle mesh Ewald method. The system contains 22,930 atoms.

To give some perspective on the parallel code development in AMBER [2] and the other codes, including my initial involvement in the ~1990 time frame, it is necessary to look back to the emergence and general availability of large-scale parallel machines. At that time, each machine seemed to have its own proprietary communication libraries (or not completely portable message-passing standards, such as machine-specific PVM versions) or distinct programming styles (such as those for the data-parallel machines from MasPar and Thinking Machines). This led to the development of various distinct versions of AMBER. For example, separate codes were available for machines ranging from clusters of workstations (with PVM, TCGMSG, MPI, or other message-passing interfaces) to Cray T3X machines (using shmem() and one-sided communication), the Thinking Machines CM-5, nCUBE, MasPar, Paragon, Convex, SGI, and a wide range of other machines, with both shared-memory (via Fortran directives) and message-passing parallel codes. These separate porting efforts and distinct codes led to the bifurcation of the code into a myriad of versions that became difficult to maintain, modify and otherwise keep track of. At the extreme in terms of obfuscation were the conversion of the core N-body interaction code to specially tailored hardware [4] and even the complete rewrite of the code in assembly language for specific machines in the CHARMM development [3]. The multiple versions and machine-specific code complexities led to difficulties in development. Each of these codes is a large-team academic effort with development by numerous research groups across multiple universities. In addition to learning the science, new students and postdocs had to be aware of the issues with parallelization and running on various different platforms. This inhibited development.

To overcome some of these difficulties, a trend to simplify the code took hold. The idea was to make the code more manageable and simpler to modify by limiting the number of possible parallelization algorithms, removing obfuscating machine-specific code, and making a single general-purpose version of each code. This included the standard use of MPI libraries for message-passing communication and straightforward but communications-intensive parallelization algorithms (data replication). In AMBER this led to the abandonment of the shared-memory parallel code and a complete conversion to MPI. In CHARMM, it led to the removal of the more efficient but uglier spatial-decomposition and force-decomposition algorithms. Although this came at the price of optimal parallel and single-processor efficiency, it facilitated the evolution of both codes to support newer methods (such as the particle mesh Ewald method [5]). The general philosophy was to avoid, where possible, machine-specific optimizations and complex parallelization strategies that complicate the code. This path was possible for a number of pragmatic reasons. First, it was very difficult to get access to large numbers of processors at the supercomputer centers due to high utilization of the machines. Second, the initial parallel machines, such as the Cray T3D and T3E, tended to have fast communications and slow processors. This allowed the use of communications-intensive algorithms without a serious degradation in run-time performance and parallel efficiency.
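
The replicated-data strategy mentioned above is simple to express: every process holds a full copy of the coordinates, computes the forces for its own block of atoms, and a global sum assembles the complete force array everywhere. The sketch below illustrates the idea with MPI; the atom count is borrowed from the benchmark in Table 1, the force routine is a placeholder, and load balancing and the actual AMBER implementation details are omitted.

/* Sketch of replicated-data parallelization: each process keeps a full
 * copy of the coordinates, computes forces for a contiguous block of
 * atoms, and a global reduction assembles the complete force array on
 * every process. */
#include <mpi.h>
#include <string.h>

#define NATOM 22930   /* e.g. the solvated DHFR benchmark in Table 1 */

static double x[NATOM][3];          /* replicated coordinates        */
static double f_local[NATOM][3];    /* this process's partial forces */
static double f[NATOM][3];          /* summed forces, after reduce   */

static void partial_forces(int first, int last)
{
    /* placeholder: evaluate force-field interactions for atoms
       first..last-1 against all others and accumulate in f_local */
    (void)first; (void)last;
}

int main(int argc, char **argv)
{
    int rank, nproc;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nproc);

    int chunk = (NATOM + nproc - 1) / nproc;
    int first = rank * chunk;
    int last  = first + chunk > NATOM ? NATOM : first + chunk;

    /* every MD step: zero the partials, compute a slice, then sum the
       whole 3*NATOM force array across all processes */
    memset(f_local, 0, sizeof f_local);
    partial_forces(first, last);
    MPI_Allreduce(f_local, f, 3 * NATOM, MPI_DOUBLE, MPI_SUM,
                  MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}

The MPI_Allreduce over the full force array at every step is exactly the communications-intensive part that fast networks like those on the T3D/T3E hid, and that slow commodity interconnects expose.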

350 MHz Pentiums, 100 baseT (pgf77/mpich), icebox.chpc.utah.edu

nodes Time Speedup Efficiency
32 80.0 7.74 24.2%
24 88.7 6.98 29.1%
16 97.2 6.37 39.8%
8 138.7 4.46 55.5%
4 221.6 2.79 69.8%
2 366.7 1.69 84.5%
1 618.9 1.00

400 MHz Pentiums, 100 baseT (pgf77/mpich), icebox.chpc.utah.edu

nodes Time Speedup Efficiency
16 89.3 6.13 38.3%
8 124.7 4.39 54.9%
4 198.0 2.76 69.0%
2 325.2 1.68 84.0%
1 547.4 1.00

550 MHz Pentiums, 100 baseT (pgf77/mpich), icebox.chpc.utah.edu

nodes Time Speedup Efficiency
8 104.2 3.96 49.5%
4 163.3 2.53 63.3%
2 258.9 1.59 79.5%
1 547.4 1.00

Table 2: Comparison of timings on various Pentium processors, demonstrating the loss in parallel efficiency as processor speed increases without a corresponding increase in the speed of inter-processor communication.

These trends have changed. More and more, supercomputer centers want to demonstrate efficient utilization of larger numbers of processors as rather large clusters become available. Complicating the desire to run on larger numbers of processors is the fact that machines no longer have high communication-to-processor speed ratios. This has made the community rethink the current parallelization and "general-purpose" code strategy. However, unlike the situation in the 80s and early 90s, fewer different machines are available now, and two general classes of machines have emerged. The first is clusters of commodity processors with slow communication between processors. These Beowulf-class machines, running Linux, have become ubiquitous in computational chemistry laboratories and include the Icebox cluster here at CHPC and the large LoBoS "lots of boxes on shelves" cluster I worked with previously at the NIH (http://www.lobos.nih.gov or http://biowulf.nih.gov). The second class, exemplified by the Compaq Sierra cluster at CHPC, is tightly coupled clusters of SMP (symmetric multiprocessing) nodes. The problem is that our codes are no longer as efficient as we need them to be. This is particularly true on the Beowulf-class machines, where our parallel efficiency is seriously limited. Tables 1-4 show various timings on CHPC machines. Table 1 compares the performance of the Origin 2000 and the Compaq Sierra cluster with various single-processor AMD Athlon and Pentium IV systems. The SGI system shows better scaling than the Compaq due to its higher ratio of communication speed to processor speed. Considering the single-processor performance numbers, although the Alpha chip in the Compaq is still the leader, the considerably cheaper AMD Athlon processors are catching up. Table 2 shows timings on various Pentium computers in icebox.chpc.utah.edu interconnected with 100 base-T Ethernet. As the processors get faster, the parallel efficiency drops. This is particularly notable with the faster AMD processors (Table 3). To get around this, it is necessary to buy more costly interprocessor communication hardware, such as that available from Giganet, Inc. (Emulex), Myrinet, Inc. or Dolphin/Scali. An alternative that adds performance at little cost is the use of software VIA (mvich + mvia) (Table 4).

AMD Athlon, 950 MHz, 100 baseT (pgf77/mpich), icebox.chpc.utah.edu

nodes Time Speedup Efficiency
16 88.4 2.55 15.9%
8 95.5 2.36 29.5%
4 125.3 1.80 45.0%
2 166.8 1.35 67.5%
1 225.3 1.00

AMD Athlon, 950 MHz, VIA hardware (pgf77/mpich), icebox.chpc.utah.edu

nodes Time Speedup Efficiency
16 35.9 6.26 39.1%
8 46.9 4.79 59.9%
4 73.8 3.05 76.2%
2 125.8 1.79 89.5%
1 225.3 1.00

AMD Athlon, two dual 1.2 GHz SMP nodes with Giganet VIA hardware (pgf77/mpich), icebox.chpc.utah.edu

nodes Time Speedup Efficiency
4 63.62 2.78 69.5%
2 (giganet) 98.52 1.79 89.5%
2 (SMP) 105.78 1.67 83.5%
1 176.63 1.00

Table 3: Performance on AMD Athlon processors with standard 100 baseT and with faster interprocessor communication using the Giganet, Inc. VIA hardware.

The poor parallel performance on clusters of commodity processors has led the community to revisit these efficiency issues. Two new directions have emerged: the use of more complex but more parallel-efficient algorithms, and the development of hybrid shared-memory (OpenMP) and message-passing parallel versions of the code. As an example of the former, Duan and Kollman applied a novel spatial decomposition algorithm in their 1-microsecond protein folding simulations [1]. The idea of spatial decomposition is that for each atom only the interaction forces from spatially close neighbors are needed, so the amount of communication can be limited. Moreover, rather than moving the coordinates of all the atoms between processors, the Duan and Kollman modification simply rearranges the solvent indices so that spatially close solvent molecules are also close in the data layout; this further reduces the amount of communication. Additionally, efforts are underway to apply algorithms such as multigrid [6] that are more parallel efficient (with linear scaling) at the cost of poorer single-processor performance.
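
To illustrate the idea behind spatial decomposition, the sketch below bins atoms into cells at least one cutoff wide, so that the forces on an atom involve only its own and the 26 neighboring cells and, in a parallel code, only communication with processes owning nearby cells. The box size, cutoff and linked-cell layout are illustrative assumptions; this is the generic cell-list construction, not the particular algorithm used by Duan and Kollman or in AMBER.

/* Illustrative cell-list construction underlying spatial decomposition.
 * Atoms are binned into cells no smaller than the interaction cutoff,
 * so each atom only interacts with its own and the 26 neighboring
 * cells; the box size, cutoff and data layout are assumptions. */
#include <math.h>

#define NATOM   10000
#define BOX     60.0     /* cubic box edge, angstrom (assumed)        */
#define RCUT    9.0      /* nonbonded cutoff, angstrom (assumed)      */
#define MAXCELL 16       /* static upper bound on cells per dimension */

static double x[NATOM][3];   /* coordinates, assumed wrapped into [0, BOX) */
static int head[MAXCELL][MAXCELL][MAXCELL];  /* first atom in each cell (-1 = empty) */
static int next_in_cell[NATOM];              /* linked list of atoms in a cell       */

static void build_cell_list(void)
{
    int ncell = (int)floor(BOX / RCUT);   /* cells at least one cutoff wide */
    if (ncell > MAXCELL) ncell = MAXCELL;
    double w = BOX / ncell;

    for (int a = 0; a < ncell; a++)
        for (int b = 0; b < ncell; b++)
            for (int c = 0; c < ncell; c++)
                head[a][b][c] = -1;

    for (int i = 0; i < NATOM; i++) {
        int a = (int)(x[i][0] / w) % ncell;
        int b = (int)(x[i][1] / w) % ncell;
        int c = (int)(x[i][2] / w) % ncell;
        next_in_cell[i] = head[a][b][c];   /* push atom i onto its cell's list */
        head[a][b][c] = i;
    }
}

int main(void)
{
    build_cell_list();
    /* the force evaluation would now loop over each cell and its 26
       neighbors only, and a parallel code would assign blocks of cells
       (and hence only local communication) to each processor */
    return 0;
}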

Successes and future promise:

The field of bio-molecular simulation has seen tremendous advances in recent years due to the ready availability of faster computers and parallelized codes, efficient algorithms for treating the electrostatic interactions, and the development of better empirical force fields. Progress has been most notable in the simulation of poly-ionic systems such as nucleic acids [7], in the application of simulation as an aid in the prediction of protein structure [8], and in the simulation of the properties of lipid bilayers.

AMD Athlon, 700 MHz

nodes Time Speedup Efficiency
8 108.86 2.65 33.1%
4 152.82 1.89 47.3%
2 203.20 1.42 71.0%
1 288.15 1.00

AMD Athlon, 700 MHz with software VIA

nodes Time Speedup Efficiency
8 91.0 3.19 39.9%
4 128.25 2.67 66.8%
2 183.31 1.59 79.5%
1 288.15 1.00

Table 4: Comparison of timings on 700 MHz AMD Athlon nodes with and without software VIA enabled.

Focusing on nucleic acids, the advances include accurate representation of the sequence-specific fine structure and dynamics of nucleic acids, insight into the role of specific water and ion association in the structure and dynamics, and even limited folding of nucleic acid structures. Simulations of this type give detailed atomic-level insight into the sequence-specific and environmental influences on DNA bending and compaction (an important process for DNA regulation, for packing ~1 meter lengths of DNA into our cells, and for gene therapy). Moreover, thanks to advances in methods to judge the reliability of our simulation results [9], these technologies are proving useful in the design of drugs, such as molecules that bind sequence-specifically to DNA and can block its transcription. The MD simulations we apply give detailed insight into the fast time-scale dynamics and interactions, including the influence of specific water and ion association on DNA structure. As an example of the type of information we can obtain, Figure 3 shows an average DNA structure as a space-filling model (gray) from a 10 nanosecond MD simulation in explicit water, with the average water density (blue) and sodium ion density (yellow) superimposed. As computers get faster and the methods continue to advance, the future holds considerable promise for these methods in protein and nucleic acid structure prediction, drug design, and detailed insight into the fast time-scale dynamics and interactions of these molecules. For more information about the research in the Cheatham lab, see http://www.chpc.utah.edu/~cheatham

Figure 3

Figure 3: An average structure of a DNA duplex from a 10 ns molecular dynamics simulation. Superimposed are the average water (blue) and ion (yellow) densities.

References:

  1. Duan, Y.; Kollman, P.A. "Pathways to a protein folding intermediate observed in a 1-microsecond simulation in aqueous solution" Science, 1998, 282, 740-744.
  2. Pearlman, D.A., et al. "AMBER, a package of computer programs for applying molecular mechanics, normal mode analysis, molecular dynamics and free energy calculations to simulate the structure and energetic properties of molecules" Comp. Phys. Comm., 1995, 91(1-3), 1-41.
  3. Brooks, B.R.; Bruccoleri, R.E.; Olafson, B.D.; States, D.J.; Swaminathan, S.; Karplus, M. "CHARMM: A program for macromolecular energy, minimization, and dynamics calculations" J. Comp. Chem., 1983, 4, 187-217.
  4. Komeiji, Y.; Uebayasi, M.; Takata, R.; Shimizu, A.; Itsukashi, K.; Taiji, M. "Fast and accurate molecular dynamics simulation of a protein using a special-purpose computer" J. Comp. Chem., 1997, 18, 1546-1563.
  5. Essmann, U.; Perera, L.; Berkowitz, M.L.; Darden, T.; Lee, H.; Pedersen, L.G. "A Smooth Particle Mesh Ewald Method" J. Chem. Phys., 1995, 103(19), 8577-8593.
  6. Sagui, C.; Darden, T.A. "Multigrid methods for classical molecular dynamics simulations of biomolecules" J. Chem. Phys., 2001, 114, 6578-6591.
  7. Cheatham, T.E., III; Kollman, P.A. "Molecular dynamics simulation of nucleic acids" Ann. Rev. Phys. Chem., 2000, 51, 435-471.
  8. Lee, M.R.; Baker, D.; Kollman, P.A. "2.1 and 1.8 angstrom average C-alpha RMSd structure predictions on two small proteins, HP-36 and S15" J. Amer. Chem. Soc., 2001, 123, 1040-1046.
  9. Kollman, P.A., et al. (including Cheatham, T.E., III) "Calculating structures and free energies of complex molecules: Combining molecular mechanics and continuum models" Acc. Chem. Res., 2000, 33, 889-897.
  10. Phillips, S.E.V. "Structure and refinement of oxymyoglobin at 1.6 angstroms resolution" J. Mol. Biol., 1980, 142, 531.
  11. Tjandra, N.; Tate, S.; Ono, A.; Kainosho, M.; Bax, A. "The NMR structure of a DNA dodecamer in an aqueous dilute liquid crystalline phase" J. Amer. Chem. Soc., 2000, 122, 6190-6200.

Mathematical libraries at CHPC

by Martin Cuma, Scientific Applications Programmer, Center for High Performance Computing

Introduction:

The heart of scientific computing applications often consists of advanced mathematical operations such as various linear algebra algorithms, Fourier or Laplace transforms, differential equation solvers, etc. To achieve optimal performance, a highly optimized set of mathematical routines is needed for each computer platform; these are generally developed by the computer manufacturer to take advantage of the specific computer and processor architecture. Along with the vendor-supplied mathematical libraries, there is also a wide range of freeware and commercial mathematical library packages. The purpose of this article is to introduce users to those installed on the CHPC machines and to give pointers for their further use.

CHPC currently maintains four different computer platforms from different vendors. These include Silicon Graphics IRIX-based systems (SGI Origin 2000 - raptor; SGI Power Challenge - inca/maya), an IBM SP2 running AIX (sp), a Compaq AlphaServer cluster running Tru64 UNIX (rocky/sierra) and a Linux PC cluster (icebox). For the first three platforms, the vendors provide an optimized set of linear algebra routines, equivalent to the open source BLAS (Basic Linear Algebra Subprograms), as well as various other statistical and signal processing algorithms. Some of these libraries also provide parallel processing support. Due to the open source setup of the Icebox cluster, the situation there is slightly more complicated and is discussed in more detail below.

Based on the vendor-supplied routines, we have built several open source packages with extended mathematical capabilities, including the linear algebra library LAPACK and its parallel version, ScaLAPACK. These packages include solvers for systems of linear equations and eigenvalue problems. CHPC also maintains another parallel linear algebra and differential equation solver package, PETSc, which is somewhat more user-friendly than ScaLAPACK. The Fast Fourier Transform library FFTW is also installed on Icebox. Finally, we have a full license for the Numerical Algorithms Group (NAG) Release 6 library, which we are in the process of installing on the CHPC platforms.

Basic Mathematical Libraries

In this part of the article, we give some more details on the vendor-based mathematical libraries, including basic instructions for their use. All of these libraries are based on the BLAS standard, which consists of three levels: BLAS 1 routines involve vector-vector operations, BLAS 2 matrix-vector operations and BLAS 3 matrix-matrix operations. Since these libraries differ from platform to platform, we discuss them separately. More detailed descriptions and instructions can be found on our website at: http://www.chpc.utah.edu/software/docs/mat_l.html
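
As a concrete example of a Level 3 BLAS call, the sketch below multiplies two small matrices with DGEMM through the Fortran calling convention from C. The trailing underscore in the symbol name is compiler dependent and the link line differs by platform (the appropriate flags, e.g. -lscs, -lessl or -lcxml, or the ATLAS libraries on Icebox, are described in the subsections that follow), so treat this as an illustration rather than a recipe for any one machine.

/* Calling the Level 3 BLAS routine DGEMM (C = alpha*A*B + beta*C)
 * through the Fortran interface from C.  Arguments are passed by
 * reference and matrices are stored column-major; the underscore
 * suffix depends on the Fortran compiler's name mangling. */
#include <stdio.h>

extern void dgemm_(const char *transa, const char *transb,
                   const int *m, const int *n, const int *k,
                   const double *alpha, const double *a, const int *lda,
                   const double *b, const int *ldb,
                   const double *beta, double *c, const int *ldc);

int main(void)
{
    int n = 2;
    double alpha = 1.0, beta = 0.0;
    double A[4] = {1.0, 3.0, 2.0, 4.0};   /* [[1,2],[3,4]] column-major */
    double B[4] = {5.0, 7.0, 6.0, 8.0};   /* [[5,6],[7,8]] column-major */
    double C[4];

    dgemm_("N", "N", &n, &n, &n, &alpha, A, &n, B, &n, &beta, C, &n);

    /* expected result: [[19 22], [43 50]] */
    printf("C = [[%g %g], [%g %g]]\n", C[0], C[2], C[1], C[3]);
    return 0;
}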

SGI Origin 2000 and Powerchallenge

The SGI/Cray Scientific Library (SCSL) includes Level 1, 2 and 3 BLAS and Fast Fourier transform subroutines optimized for the SGI systems. It is installed as part of the operating system and resides in the default system area, with libraries in /usr/lib and header files in /usr/include. Compilation is very simple: just include the -lscs flag on the compile line of your Fortran or C code. The shared-memory parallel version of the library is linked with the -lscs_mp flag. The aforementioned CHPC website contains several examples and references to further documentation.

IBM SP

IBM supplies ESSL, the Engineering and Scientific Subroutine Library, and its parallel version, PESSL. Both libraries include all three levels of BLAS and LAPACK-equivalent routines, as well as signal analysis, sorting, searching, interpolation and random number generation routines. Paths to these libraries are also included in the system configuration, so only link directives need to be supplied when linking Fortran or C codes. In the case of ESSL this is -lessl; for PESSL, use -lpessl -lblacs. BLACS stands for Basic Linear Algebra Communication Subprograms, which is also used in the open-source ScaLAPACK package and enables distributed-memory parallel processing. Example source code and links to the documentation can again be found on CHPC's Mathematical Software webpage.

Compaq Alphaserver

Compaq supplies CXML, the Compaq Extended Math Library, which includes BLAS 1, 2 and 3 and LAPACK routines, sparse linear system solvers and signal processing routines. There are two versions: the serial release links with -lcxml, and the parallel version for shared-memory multiprocessing links with -lcxmlp. As with the other platforms, the libraries are located in /usr/lib, so no additional path flags are required. Further references can be found on the webpage.

Icebox

There are various versions of the BLAS library available for Intel-based Linux systems; however, many of them do not take full advantage of the latest processor optimizations. Examples include the BLAS supplied with RedHat distributions and with the Portland Group compilers, both of which are optimized for the general i386 instruction set. CHPC has installed two more highly optimized solutions. One is the ASCI Red BLAS library, developed at Intel and optimized for the Pentium II processor. The other is ATLAS, Automatically Tuned Linear Algebra Software, aimed at providing portable performance and developed at the University of Tennessee, Knoxville and Oak Ridge National Laboratory. ASCI Red BLAS is available only in object form and is located at /uufs/icebox/sys/pkg/ASCIRedBLAS. The ATLAS library is located at /uufs/icebox/sys/pkg/atlas and has been compiled in two versions: the first, with optimizations for Pentium II chips, runs on both Pentium II and Athlon processors, while the second has specific optimizations for Athlon chips. Our tests show that some BLAS routines run faster with the ASCI Red BLAS and others with ATLAS; the performance of each library is also affected by the problem size. We suggest that users benchmark their codes against both libraries, but encourage the use of ATLAS, since it is an ongoing open source project that allows for more flexibility with future hardware changes. Details on how to compile Fortran and C source files are given on our Mathematical Libraries documentation webpage.

Extended mathematical libraries

Extended mathematical libraries provide computational capabilities beyond the basic vendor-supplied or BLAS libraries. They are installed on the majority of the CHPC platforms and can be added to the others upon user request. Documentation with compilation instructions is given on our webpage. Below is a short summary of the packages that CHPC maintains.

LAPACK

Linear Algebra Package (LAPACK) contains routines for solving systems of simultaneous linear equations, eigenvalue problems, and singular value problems for dense and banded matrices. LAPACK is installed on the SP, Raptor and Icebox.
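
As a small illustration, the sketch below uses the LAPACK driver DGESV to solve a 2x2 linear system, again calling the Fortran routine from C with column-major storage. The underscore naming convention and the required link flags are platform dependent, so treat it as a sketch rather than a platform-specific recipe.

/* Solving a dense linear system A x = b with the LAPACK driver DGESV
 * via the Fortran calling convention (column-major, arguments by
 * reference; the symbol's underscore suffix is compiler dependent). */
#include <stdio.h>

extern void dgesv_(const int *n, const int *nrhs, double *a, const int *lda,
                   int *ipiv, double *b, const int *ldb, int *info);

int main(void)
{
    int n = 2, nrhs = 1, ipiv[2], info;
    double A[4] = {3.0, 1.0, 1.0, 2.0};  /* [[3,1],[1,2]] column-major */
    double b[2] = {9.0, 8.0};            /* right-hand side            */

    dgesv_(&n, &nrhs, A, &n, ipiv, b, &n, &info);

    if (info == 0)
        printf("x = (%g, %g)\n", b[0], b[1]);   /* expect (2, 3) */
    else
        printf("dgesv failed, info = %d\n", info);
    return 0;
}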

ScaLAPACK

ScaLAPACK is a parallel version of LAPACK for distributed-memory computers. Communication between the processes is handled by the BLACS package, and a parallel BLAS, PBLAS, is used for the basic linear algebra routines. ScaLAPACK is installed on the SP, Raptor and Icebox.

PETSc

PETSc (Portable, Extensible Toolkit for Scientific Computation) is a package incorporating parallel data structures and routines to numerically solve partial differential equations and related linear and non-linear problems. PETSc is installed on the SP, Raptor and Icebox.

FFTW

Fastest Fourier Transform in the West (FFTW) is a freeware subroutine library for computing the Discrete Fourier Transform (DFT) in one or more dimensions. It provides portable performance for different computing platforms. FFTW is installed on Icebox.
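
A minimal FFTW example, computing the one-dimensional DFT of a single cosine, is sketched below. It uses the FFTW 3 interface; the copy installed on Icebox may be an older FFTW 2.x release with a somewhat different API, so check the locally installed headers and documentation before relying on this exact form.

/* One-dimensional complex DFT with the FFTW 3 interface (the locally
 * installed FFTW may be an older 2.x release with a different API).
 * With FFTW 3, link with -lfftw3 -lm. */
#include <fftw3.h>
#include <math.h>
#include <stdio.h>

int main(void)
{
    const int N = 8;
    const double pi = 3.141592653589793;
    fftw_complex *in  = fftw_malloc(sizeof(fftw_complex) * N);
    fftw_complex *out = fftw_malloc(sizeof(fftw_complex) * N);

    /* input is a single cosine, so the spectrum should peak at k = 1 */
    for (int i = 0; i < N; i++) {
        in[i][0] = cos(2.0 * pi * i / N);  /* real part      */
        in[i][1] = 0.0;                    /* imaginary part */
    }

    fftw_plan plan = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD, FFTW_ESTIMATE);
    fftw_execute(plan);

    for (int k = 0; k < N; k++)
        printf("k=%d  %8.4f %+8.4f i\n", k, out[k][0], out[k][1]);

    fftw_destroy_plan(plan);
    fftw_free(in);
    fftw_free(out);
    return 0;
}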

NAG

The NAG library is an extensive project encompassing most of computational mathematics, and is the only commercial library available at CHPC. Apart from advanced routines, such as ordinary and partial differential equation solvers, it also contains full BLAS and LAPACK support. The library has separate Fortran and C components and is available for most high performance computing platforms. The main disadvantage is that the routine names are NAG-specific, so there can be problems transferring code that uses NAG to a different machine that does not have a NAG license.

ECCE: A New Computational Chemistry Tool Available at CHPC

by Anita Orendt, Molecular Sciences, Center for High Performance Computing

Ecce - Extensible Computational Chemistry Environment - is a problem-solving environment for computational chemistry developed at PNNL that provides a graphical user interface, scientific visualization tools, and the underlying data management framework to allow users to create an input file, submit the job, and follow it to completion, all from a desktop workstation or PC. In addition, there are tools for the organized storage and analysis of the data produced by the computational jobs. Currently the interface supports Gaussian-98 and NWChem jobs. Ecce is composed of an integrated suite of distributed client/server UNIX-based X Window System applications, and the software architecture is based on an object-oriented chemistry data model to support the management of computational and experimental molecular data. For more detailed information on ecce see the website at http://www.emsl.pnl.gov:2080/docs/ecce.

At CHPC, ecce is currently installed on corona.chpc.utah.edu, a Sun workstation, and has been configured to submit batch jobs to sierra, icebox, raptor, and inca. To run ecce you must have an account on the HPC machines on which you plan to run and an account on corona (send a request to issues@chpc.utah.edu). In addition, you must be a registered ecce user (contact Anita Orendt in person at 412 INSCC or by phone at 7-9434). You will also need access via an X-server program that supports OpenGL (Sun workstations with Creator3D graphics hardware, SGI workstations with OpenInventor software, or PCs with Exceed 3D). Exceed can be obtained through the Office of Software Licenses; see http://www.osl.utah.edu. Note that you will need the current version of Exceed as well as the Exceed 3D module. Address questions about the use of ecce at CHPC to Anita Orendt, orendt@chpc.utah.edu; do not contact the ecce development team at PNNL directly.

A Typical Ecce Session

Log into corona, make sure your display is properly set and that you are in a csh, then type:

source /uufs/corona/sys/pkg/ecce/2.0/scripts/runtime_setup

ecce

You will be prompted for a passphrase. This is a four- to eight-character password that can be anything you want it to be. Set this phrase carefully, as it protects your machine configurations/preferences/passwords and your database files. If you forget it, you can just hit New, but this will cause the loss of the stored data. The first time you access ecce (or if you change your passphrase) you will be prompted for the ecce username and password you were given when you registered as an ecce user.

This should open the Ecce Gateway. The Gateway has five modules: Calculation Manager, Builder, Basis Set Tool, Machine Browser, and Periodic Table. The Basis Set Tool contains information about pre-defined basis sets and tools to create new ones, while the Periodic Table contains elemental reference information. There is also a place to set preferences and a help utility that starts Netscape. When help is activated from within an individual module, the on-line documentation for that module is displayed. The help documentation includes step-by-step examples of using the various features of ecce. The Ecce/EMSL window lists the open windows and also serves as an indicator that Ecce is working on something, as the blue wave logo will be in motion.

The Calculation Manager interface has been styled to function like the Microsoft Windows Explorer interface. Notice that a user can define a calculation as part of a project, which provides a way to organize jobs. The easiest way to start a new ecce session is to open the Calculation Manager and set up a new project and/or calculation.

With a calculation chosen, from the Tools menu you can select the Calculation Editor; there are two different editors, one for Gaussian-98 and a second for NWChem. From within the Calculation Editor, the first step is to define the Chemical System. This step opens the Ecce Builder. The features of the Builder include predefined valence types for each atom and a structure library of pre-built molecules. The Builder can also define the symmetry of your system (Symmetry is found under Toolkits) and set the coordinates to the accuracy needed to reflect that symmetry in your job.

With a chemical system defined, the remaining information about the input file needs to be specified in the Calculation Editor. The most common theories, run types, basis sets, etc. can be chosen by a simple selection; for more complex options the user can use the Final Edit option, which opens the input file in a vi editor window.

The Launch button opens a window that shows the machines available to run the job you have created. For Gaussian this list will be icebox, inca, raptor, rocky, and sp (though the sp is not yet working), whereas for NWChem the choices are icebox, raptor, and rocky. The first time Ecce is used, the user will need to configure access to the individual machines by providing a username, password, and paths to the working and scratch directories. This interface is reached from the Job pulldown menu. Select the sshutah option for the remote shell. Individual users can also add other machines to which they have access. While the program knows about the queue restrictions on the different systems, it does not know details such as setting a different QOS or choosing specific nodes on icebox by memory or speed. To do this, the job can be Staged (under the Job pulldown menu), the script edited on icebox, and the job then launched using Finish Stage Launch. The Machine Browser on the Gateway can be used (use Process Status, not Machine Status) to aid in your decision about which compute platform to use.

Once a job is launched and running, it can be monitored through the Calculation Viewer, available under the Tools pulldown menu in the Calculation Manager. The status of an SCF cycle or a geometry optimization can be monitored during the run. In addition, properties can be visualized using this module after the job is complete.
