Logging in to the clusters
CHPC has two main clusters as of March 2015: ember and kingspeak. For the purposes of this tutorial, we will assume you are using ember (but either cluster will do). Log in using an SSH client of your choice:
[user@wherever:~]$ ssh email@example.com
Make sure to replace the username with your own UNID, and if you want a different cluster, replace it with the appropriate cluster name. When you set up your account with CHPC, you selected a default shell, either bash or tcsh. If you forgot which shell you selected, you can find out using the SHELL variable:
[u0123456@ember1:~]$ echo $SHELL
This will print something like /bin/bash or /bin/tcsh. The scripting syntax of the two shells is different, so make sure you know which one you are using; there are many good resources on the internet for learning shell scripting. Each shell has its own configuration file: .bashrc for bash and .tcshrc for tcsh. When your user account is created, CHPC installs specific versions of these files that are essential for setting up the cluster-specific environment. In March 2015, CHPC started using modules for environment setup, and user accounts created before then do not use modules. To proceed with this tutorial, set up your environment for modules as described in our modules help page.
Note: the rest of this tutorial assumes you are using bash for your shell.
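If you're not sure your environment is ready, you can paste a few lines like the following into your login shell (or save them to a file and source it). This is a sketch, not a CHPC-provided tool: it only reports the shell name taken from $SHELL and whether a module command is defined. The helper names are hypothetical. Because the module command is normally a shell function created at login, it may not be visible to a script started with bash script.sh, which is why sourcing is suggested.

```bash
#!/bin/bash
# Environment sanity check (illustrative sketch, not a CHPC-provided tool).

# Print the name of the login shell, e.g. "bash" for /bin/bash.
login_shell_name() {
  basename "${SHELL:-unknown}"
}

# Return success if a "module" command (function, alias or program) is defined.
has_module_command() {
  type module >/dev/null 2>&1
}

echo "Login shell: $(login_shell_name)"
if has_module_command; then
  echo "module command found"
else
  echo "module command not found: set up modules as described on the modules help page"
fi
```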
To get started, execute the following:
[u0123456@ember1:~]$ module load intel mpich2
This command loads the environment for the Intel compilers and the MPICH MPI distribution (previously called MPICH2). Consult the CHPC MPI libraries help page for performance recommendations, or experiment with your own codes to see which MPI setups perform best. Once you've loaded the MPI module, you should be able to run mpicc:
[u0123456@ember1:~]$ mpicc -v
mpicc for MPICH version 3.1.2
icc version 15.0.1 (gcc version 4.4.7 compatibility)
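To confirm from a script that the MPI wrappers are on your PATH, you could pull the MPICH version out of the first line of mpicc -v. The sketch below assumes that line has the form shown above (mpicc for MPICH version 3.1.2); the function name mpich_version is just for illustration.

```bash
#!/bin/bash
# Print the word following "version" on the first input line that has one,
# e.g. "mpicc for MPICH version 3.1.2" -> "3.1.2".
mpich_version() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "version") { print $(i + 1); exit } }'
}

if command -v mpicc >/dev/null 2>&1; then
  # In the sample output, only the first line of "mpicc -v" names MPICH;
  # later lines describe the underlying compiler.
  echo "MPICH version: $(mpicc -v 2>&1 | head -n 1 | mpich_version)"
else
  echo "mpicc not found: run 'module load intel mpich2' first"
fi
```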
If you have your own source code to test, you may want to use that; if not, here is a simple hello world program:
You can download it using wget, or you can copy and paste it into your favorite editor. Once you have the file, you can compile it using:
[u0123456@ember1:~]$ mpicc hello_1.c -o hello_ember
If you receive any warnings, you can ignore them. If you get an error, you probably copied the program incorrectly. Important note: it's good practice to compile programs on the interactive nodes of the cluster you'll be running on, and to give each build a distinct name (e.g., hello_ember, hello_kingspeak). Generic builds may perform worse than builds targeted at a particular cluster, primarily because of differences in hardware. See the cluster user guides for more information on best practices.
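One way to follow that naming convention automatically is to derive the executable name from the login node's hostname. This sketch assumes login nodes are named <cluster><number> (as in the ember1 prompt above) and that hello_1.c is in the current directory; exe_name_for_host is a hypothetical helper.

```bash
#!/bin/bash
# Derive a cluster-specific executable name from a short hostname, assuming
# login nodes are named <cluster><number> (e.g. ember1, kingspeak2).
exe_name_for_host() {
  local host=$1
  # ${host%%[0-9]*} drops everything from the first digit onward.
  echo "hello_${host%%[0-9]*}"
}

# $HOSTNAME is set by bash; strip any domain part to get the short name.
exe=$(exe_name_for_host "${HOSTNAME%%.*}")

if command -v mpicc >/dev/null 2>&1 && [ -f hello_1.c ]; then
  mpicc hello_1.c -o "$exe" && echo "Built $exe"
else
  echo "mpicc or hello_1.c not available; would build $exe"
fi
```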
Interactive job submission
Now that you have your executable, you're ready to run the job on the cluster. As of April 2015, CHPC uses the Slurm scheduler; for details on its use, see the Slurm help page. There are two ways to submit a job: through an interactive session or through a batch script. For initial program testing, an interactive session is more efficient. Interactive sessions are also appropriate for analysis with GUI programs or for long compilations, where running on the standard interactive nodes would be inappropriate.
IMPORTANT WARNING: Never execute a large MPI job on the main interactive nodes (the ones you log in to initially). These nodes are shared by all CHPC users for basic work, and heavy load tasks will degrade performance for everyone. Tasks that exceed 15 minutes under heavy load will be arbitrarily terminated by CHPC systems.
Begin by submitting a request to launch a job on the cluster's compute nodes. Depending on cluster load, you may or may not have to wait for the job to start. It is easier to start an interactive session on the least utilized cluster; check system status with the sinfo command and look for idle nodes. The command:
[u0123456@ember1:~]$ srun -t 0:05:00 -n 24 -N 2 -A chpc -p ember --pty /bin/bash -l
This requests an interactive session in the ember partition (-p ember), charged to the chpc account (-A chpc; use your own account name), with 2 nodes (-N 2) and 24 tasks (processors) in total (-n 24), for 5 minutes (-t 0:05:00). The --pty /bin/bash -l option starts an interactive login shell once the job begins. Unless your MPI code is threaded, you should request as many tasks as there are physical cores, so that the allocated resources are used efficiently. See the cluster user guides for the number of CPU cores per node on each cluster.
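The choice of partition and task count can be scripted. The sketch below makes two assumptions: that the partition with the most idle nodes (as reported by sinfo -h -t idle -o '%P %D' on the cluster you are logged in to) is a reasonable place to start, which is not guaranteed since it ignores queued jobs, and that ember nodes have 12 cores each, which is what 24 tasks on 2 nodes implies. The helper names are hypothetical, and the script only prints the srun command rather than running it.

```bash
#!/bin/bash
# Read "partition idle_node_count" lines on stdin (as printed by
# sinfo -h -t idle -o '%P %D') and print the partition with the most idle nodes.
most_idle_partition() {
  awk '{ gsub(/\*/, "", $1)          # sinfo marks the default partition with "*"
         idle[$1] += $2 }            # a partition may appear on several lines
       END { best = ""; max = -1
             for (p in idle) if (idle[p] > max) { max = idle[p]; best = p }
             print best }'
}

# Total MPI tasks when running one rank per physical core.
ntasks_for() {
  local nodes=$1 cores_per_node=$2
  echo $(( nodes * cores_per_node ))
}

nodes=2
cores_per_node=12   # assumption for ember: 24 tasks / 2 nodes
partition=ember
if command -v sinfo >/dev/null 2>&1; then
  echo "Partition with most idle nodes: $(sinfo -h -t idle -o '%P %D' | most_idle_partition)"
fi
echo "srun -t 0:05:00 -n $(ntasks_for "$nodes" "$cores_per_node") -N $nodes -A chpc -p $partition --pty /bin/bash -l"
```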
Once the interactive session starts, running the job is quite simple. Navigate to the directory where your program is stored and execute the following commands:
[u0123456@em123:~]$ module load intel mpich2
You may need to put in your password once or twice to allow connection to the nodes and confirm some RSA keys. Once you get the command prompt back, execute this command:
[u0123456@em123:~]$ mpirun -np 24 $HOME/hello_ember
Make sure to change the path to your hello world program if you put it somewhere other than your home directory (e.g., $HOME/test/hello_ember). Also change the -np value to the number of processors you will be running on. If everything goes well, you should see something like this:
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
[u0123456@em123:~]$
You should have the same number of "Hello world" lines as you have processors. Finally, exit the interactive session by using the command exit.
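Counting the lines by eye gets tedious at higher task counts. If you redirect the run to a file, a small check like this sketch compares the number of lines containing Hello world with the number of ranks; check_hello_count and hello.out are illustrative names, not part of the tutorial's files.

```bash
#!/bin/bash
# Compare the number of "Hello world" lines in a file with the expected rank count.
# Usage: check_hello_count <output-file> <expected-ranks>
check_hello_count() {
  local file=$1 expected=$2 count
  if [ ! -r "$file" ]; then
    echo "cannot read $file" >&2
    return 2
  fi
  # grep -c counts matching lines; it prints 0 (and exits 1) when nothing matches.
  count=$(grep -c "Hello world" "$file" || true)
  if [ "$count" -eq "$expected" ]; then
    echo "OK: $count Hello world lines for $expected ranks"
    return 0
  fi
  echo "Mismatch: $count Hello world lines, expected $expected" >&2
  return 1
}

# Example, inside the interactive session (hello.out is an illustrative name):
#   mpirun -np 24 $HOME/hello_ember > hello.out
#   check_hello_count hello.out 24
```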
Batch job submission
With your favorite editor, make a new file called testjob. Copy and paste the following simple script into the file:
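#!/bin/bash
# Job limits: 2 minutes of wall time, 24 tasks on 2 nodes
#SBATCH -t 0:02:00
#SBATCH -n 24
#SBATCH -N 2
# Allocation account and partition (cluster)
#SBATCH -A account-name
#SBATCH -p ember
# Name the stdout/stderr files test.o<jobid> and test.e<jobid>
#SBATCH -o test.o%j
#SBATCH -e test.e%j
# Email notification when the job ends
#SBATCH --mail-type=END
#SBATCH --mail-user=firstname.lastname@example.org
cd $HOME
module load intel mpich2
# One MPI rank per requested task; program output goes to test.out
mpirun -np $SLURM_NTASKS $HOME/hello_ember > test.out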
All of the #SBATCH comment lines are job-control directives, equivalent to the options of the interactive srun command. Replace account-name with your own account. If you're using a different cluster (kingspeak), change the partition (-p) and set the task count (-n) to the total number of cores on the nodes you request; mpirun takes its task count from $SLURM_NTASKS, so the -np value follows automatically. Also change the email address to your own. To execute the script on the cluster:
[u0123456@ember1:~]$ sbatch testjob
On successful submission, sbatch prints the job ID (for example, Submitted batch job 112233). To view the job in the queue, you can use the following commands:
squeue - shows all jobs in the queue and current metrics
squeue -u u0123456 - shows all jobs for UNID u0123456 (use your own!)
squeue --start --jobid=112233 - gives an estimate for when a job will start
scontrol show job 112233 - gives useful information about a job
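If you submit from a script, you can capture the job ID and feed it to the commands above. This sketch parses the ID from the Submitted batch job message and assumes the ID is the fourth word of that line; parse_jobid is a hypothetical helper. sbatch also has a --parsable option that prints only the ID (followed by ;cluster when a cluster name is present), which avoids parsing altogether.

```bash
#!/bin/bash
# Print the job ID from sbatch's "Submitted batch job <id>" message.
# Taking the 4th word (not the last) tolerates extra text after the ID.
parse_jobid() {
  awk '/^Submitted batch job / { print $4 }'
}

if command -v sbatch >/dev/null 2>&1 && [ -f testjob ]; then
  jobid=$(sbatch testjob | parse_jobid)
  echo "Job ID: $jobid"
  squeue --start --jobid="$jobid"   # estimated start time while pending
  scontrol show job "$jobid"        # detailed job record
fi
```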
Note that many of these commands may not be useful for this job if it begins running right away. For longer jobs you may run in the future, these will become very useful. If your job ran without error, you should have three output files:
[u0123456@em123:~]$ ls
test.o123456  hello_ember  test.e123456  hello_1.c  test.out
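These files have the format name.(o/e)number and hold the standard output and standard error of the batch script, as with any Linux program. Under Slurm, these names come from sbatch's -o and -e options (for example, #SBATCH -o test.o%j, where %j expands to the job number); if those options are omitted, Slurm writes both streams to a single file named slurm-<jobid>.out in the directory you submitted from. If you cat test.out, you'll see output like the interactive session produced. If the program ran with an error, or the batch script itself wrote something to output, it will appear in the numbered files, so look there first if your program has problems in a batch job.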
[u0123456@em123:~]$ cat test.out
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
Hello world
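When a batch job doesn't produce the expected output, the error file is the first thing to check. A small helper like this sketch prints it only when it is non-empty; show_job_errors is a hypothetical name, and the file you pass depends on how your job's output files are named (test.e123456 in the listing above).

```bash
#!/bin/bash
# Print a job's error file only if it has content.
# Returns 0 if the file is empty, 1 if it contains errors, 2 if it is missing.
# Usage: show_job_errors <error-file>, e.g. show_job_errors test.e123456
show_job_errors() {
  local file=$1
  if [ ! -e "$file" ]; then
    echo "$file not found" >&2
    return 2
  elif [ -s "$file" ]; then
    echo "--- $file ---"
    cat "$file"
    return 1
  else
    echo "$file is empty: no errors reported"
    return 0
  fi
}
```

To check every error file in the current directory at once, you could loop over them, e.g. for f in test.e*; do show_job_errors "$f"; done.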
This concludes the tutorial. If you have trouble with this tutorial or anything else on CHPC systems, contact email@example.com. You may also want to consider attending the presentations which are held each semester by CHPC staff members, spanning a variety of topics such as Linux Basics, Parallel Programming, and Systems Overviews.