MKL contains highly optimized math routines. It includes fully optimized BLAS, LAPACK, sparse solvers, a vector math library, random number generators and fast Fourier transform routines (including FFTW wrappers). For more information, consult the Intel Math Kernel Library documentation. MKL is bundled with the Intel compiler suite, so to use it, one needs to load the intel module:
module load intel
Intel Fortran (using dynamic linking)
ifort lapack1.f90 -o lapack1_ifort -L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -Wl,-rpath=$MKLROOT/lib/intel64
Intel C/C++ (using dynamic linking)
icc lapack1.c -o lapack1_icc -L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -Wl,-rpath=$MKLROOT/lib/intel64
If you use the C++ compiler, replace icc with icpc and change the suffix .c to .cc in the previous statement.
It is also possible to incorporate OpenMP-threaded MKL into an OpenMP or mixed MPI/OpenMP code. To do so, parallelize your own code with OpenMP but leave the MKL calls unthreaded (that is, call MKL outside your OpenMP parallel regions). Then link the threaded MKL library instead, e.g.:
icc lapack1.c -o lapack1_icc_mt -L$MKLROOT/lib/intel64 -Wl,-rpath=$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread
Then run as you usually would with OMP_NUM_THREADS set, and the MKL calls will run over that many threads as well.
For distributed-memory (MPI) parallel linear algebra routines, ScaLAPACK is also fully implemented in MKL, and we recommend using it instead of the reference ScaLAPACK distribution. Since release 11.2 (2015), MKL also includes cluster sparse matrix solvers based on PARDISO.
These and other advanced MKL routines require relatively complex linking schemes. The best way to get these right is the MKL Link Line Advisor page. The Link Line Advisor also generates link flags for the GNU and PGI compilers. We recommend using MKL with those compilers as well, since MKL generally provides superior performance. To use the GNU or PGI compilers with MKL, first load the intel module, then the GNU or PGI module, and then any other libraries you intend to use with the GNU or PGI compiler.
MKL also includes an interface for FFTW, a commonly used Fast Fourier Transform library. Using this interface is especially advantageous when building binaries for multiple CPU architectures with the -ax Intel compiler flag. The header files for the FFTW interface
GSL (GNU Scientific Library) is a numerical library for C/C++ that provides a wide range of mathematical routines such as random number generators, special functions and least-squares fitting. There are over 1000 functions in total, with an extensive test suite. While GSL is not parallel, it is reasonably thread safe, and its routines should be callable from parallel code sections. One can also link a parallel BLAS library such as MKL or ACML and use the shared-memory parallelism they provide.
GSL has been built with the GNU compilers and is accessible via a module:
module load gsl
module load gcc gsl
gcc source.c -o executable -I$GSL_INCDIR -L$GSL_LIBDIR -lgsl -lcblas -Wl,-rpath=
This links with the generic unoptimized version of BLAS. $GSL_INCDIR and $GSL_LIBDIR are environment variables defined in the gsl module.
module load intel gsl(or
source.c -O3 -axCORE-AVX2,AVX,SSE4.2 -o executable -I$GSL_INCDIR -L$GSL_LIBDIR -lgsl -L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread
This links with MKL threaded BLAS library for optimal performance and OpenMP parallelism.
Automatically Tuned Linear Algebra Software (ATLAS) is an open source library aimed at providing a portable performance solution. It provides full BLAS and certain LAPACK routines, which are tuned to the computer platform at compile time. Since we provide vendor-optimized BLAS such as Intel MKL, we are deprecating ATLAS support. Relatively old versions of the library are
OpenBLAS is an optimized BLAS library based on GotoBLAS2. Its advantage is relative simplicity; its disadvantage is lower maturity. Some of the applications we build link to OpenBLAS for simplicity, but we recommend that everyone use MKL instead. It is located in /uufs/chpc.utah.edu/sys/installdir/openblas. Linking is relatively simple and only requires adding the following to the link line:
LAPACK (Linear Algebra PACKage) provides routines for solving systems of simultaneous linear equations, least-squares solutions of linear systems of equations, eigenvalue problems, and singular value problems. It runs on a single processor only. The CentOS 7 operating system comes with reference LAPACK (and BLAS), but for optimal performance we highly recommend using Intel MKL, which includes the full LAPACK. Linking LAPACK with MKL is the same as linking BLAS, described above.
The ScaLAPACK (or Scalable LAPACK) library includes a subset of LAPACK routines redesigned for distributed-memory MIMD parallel computers. It is written in a Single-Program-Multiple-Data style, using explicit message passing for interprocessor communication. It assumes matrices are laid out in a two-dimensional block-cyclic decomposition. The fundamental building blocks of the ScaLAPACK library are distributed-memory versions (PBLAS) of the Level 1, 2 and 3 BLAS, and a set of Basic Linear Algebra Communication Subprograms (BLACS) for communication tasks that arise frequently in parallel linear algebra computations. In the ScaLAPACK routines, all interprocessor communication occurs within the PBLAS and the BLACS. One of the design goals of ScaLAPACK was to make the ScaLAPACK routines resemble their LAPACK equivalents as much as possible.
module load intel impi
mpiifort -qopenmp -i8 -o executable program.f90 -I$MKLROOT/include -Wl,-rpath=$MKLROOT/lib/intel64 -L$MKLROOT/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 -lmkl_core -lmkl_intel_thread -lmkl_blacs_intelmpi_ilp64 -liomp5 -lpthread -lm
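A little index arithmetic makes the two-dimensional block-cyclic layout concrete. The sketch below uses hypothetical helper names (bc_loc, block_cyclic, block_cyclic_2d, local_count) and 0-based indices. It assumes the distribution starts on process row and column 0 (ScaLAPACK's RSRC = CSRC = 0). Row blocks of size mb are dealt round-robin over the nprow process rows, and column blocks of size nb over the npcol process columns. Under that assumption, local_count computes the same quantity as ScaLAPACK's NUMROC tool routine.

```c
/* Hypothetical helpers illustrating the block-cyclic distribution used by
 * ScaLAPACK. All indices are 0-based; the first block is assumed to live
 * on process 0 (RSRC = CSRC = 0). */

typedef struct {
    int proc;   /* owning process along this dimension */
    int local;  /* index within that process's local array */
} bc_loc;

/* 1D mapping: global index g, block size nb, nprocs processes. */
bc_loc block_cyclic(int g, int nb, int nprocs)
{
    int block = g / nb;                          /* global block containing g */
    bc_loc r;
    r.proc  = block % nprocs;                    /* blocks are dealt round-robin */
    r.local = (block / nprocs) * nb + g % nb;    /* earlier local blocks + offset */
    return r;
}

/* 2D mapping: global entry (i, j) of a matrix distributed in mb x nb blocks
 * over an nprow x npcol process grid. Rows and columns map independently. */
void block_cyclic_2d(int i, int j, int mb, int nb, int nprow, int npcol,
                     int *prow, int *pcol, int *li, int *lj)
{
    bc_loc r = block_cyclic(i, mb, nprow);
    bc_loc c = block_cyclic(j, nb, npcol);
    *prow = r.proc;
    *li   = r.local;
    *pcol = c.proc;
    *lj   = c.local;
}

/* Number of the n global rows (or columns) stored on process p, i.e. the
 * size of its local array along this dimension (what NUMROC returns). */
int local_count(int n, int nb, int p, int nprocs)
{
    int nblocks = n / nb;                        /* complete blocks */
    int count = (nblocks / nprocs) * nb;         /* full rounds of the deal */
    int extra = nblocks % nprocs;                /* leftover complete blocks */
    if (p < extra)
        count += nb;                             /* one more complete block */
    else if (p == extra)
        count += n % nb;                         /* the trailing partial block */
    return count;
}
```

For example, with nb = 2 over 2 processes, global rows 0, 1, 4, 5, 8, 9 land on process 0 and rows 2, 3, 6, 7 land on process 1. A 10-row matrix therefore gives local sizes of 6 and 4.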
Fastest Fourier Transform in the West (FFTW) is a high-performance Fast Fourier Transform (FFT) library. Apart from being optimized for most PC architectures, it also supports OpenMP and MPI parallelism. The latest serial and OpenMP-threaded builds for the three compilers that we support (GNU, Intel and PGI) can be accessed through their respective modules. To link serial FFTW with, e.g., the Intel compiler, simply add
to the link line. To link OpenMP FFTW, add
to the serial link line.
For example, for the PGI compiler with OpenMP:
module load pgi fftw
pgcc -mp myprog.c -o myprog.exe -I$FFTW_INCDIR -L$FFTW_LIBDIR -Wl,-rpath=$FFTW_LIBDIR -lfftw3_omp -lfftw3 -lm
Please note that there is also FFTW version 2, which is still used in some codes and is incompatible with FFTW 3. We have it installed at
Also note that Intel MKL includes FFTW wrappers whose FFT performance is on par with FFTW; for information on how to link them, see our MKL documentation.