
I can't ssh to a machine anymore; I'm getting a serious-looking host key error.


The full error looks like this:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
Please contact your system administrator.
Add correct host key in /your_home/.ssh/known_hosts to get rid of
this message.
Offending RSA key in /your_home/.ssh/known_hosts:21
RSA host key for has changed and you have requested
strict checking.
Host key verification failed.

Although it looks scary, this error is usually benign. It occurs when the SSH host keys on the machine you are trying to connect to change, most commonly after an operating system upgrade. There are two ways to get rid of this message and log in:

  1. Open the file ~/.ssh/known_hosts in a text editor and delete the lines that contain the name of the host you are connecting to.
  2. Use the ssh-keygen command with the -R flag to remove the stored keys for the given host, e.g. ssh-keygen -R hostname.
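The second option can be sketched as follows. The host name badhost.example.com and the throwaway known_hosts file are hypothetical, used here so the commands can run without touching your real ~/.ssh/known_hosts:

```shell
# Build a throwaway known_hosts file with a dummy entry for a hypothetical
# host, then remove that entry with ssh-keygen -R.
KH=$(mktemp)
ssh-keygen -t ed25519 -N '' -f "$KH.key" -q             # generate a dummy host key
awk '{print "badhost.example.com", $1, $2}' "$KH.key.pub" > "$KH"

# -R removes all keys belonging to the named host; -f points at a specific
# file (omit -f to operate on the default ~/.ssh/known_hosts)
ssh-keygen -R badhost.example.com -f "$KH"
```

In practice you would simply run ssh-keygen -R followed by the name of the host you are connecting to; a backup of the original file is kept with a .old suffix.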

The subsequent ssh connection to the machine will then say something like:

The authenticity of host ' (' can't be established. 
RSA key fingerprint is cd:9b:4c:f3:88:4d:6a:e2:34:4b:96:83:37:04:ed:9e.
Are you sure you want to continue connecting (yes/no)?

Answer yes to that and you will be prompted for a password and can log in.

My calculations or other file operations complain that a file can't be accessed or does not exist, even though I have just created or modified it.

This error can take many forms, but it may look something like this:

ERROR on proc 0: Cannot open input script in.npt-218K-continue (../lammps.cpp:327)

It may also occur intermittently: sometimes the program works, sometimes it does not.

This error is most likely due to the way the file system writes files. For performance reasons, it writes parts of a file into a memory buffer, which is periodically flushed to disk. If another machine tries to access the file before the machine writing it has flushed it to disk, this error occurs. This behavior is well documented for NFS, which we use for all our home directories and group spaces. There are several ways to deal with it:

  1. Use the Linux sync command to force the buffers to be flushed to disk. Do this on both the machine where the file is written and the machine where it is read, BEFORE the file is accessed. To make all compute nodes in a job sync, run "srun -n $SLURM_NNODES --ntasks-per-node=1 sync".
  2. Sometimes adding the Linux sleep command helps, as it provides an extra time window for the syncing to occur.
  3. Inside your code, use fflush for C/C++ or flush for Fortran. Other languages, such as Python and Matlab, have similar flushing facilities; search their documentation for "flush" to see what options are available.
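A minimal sketch of the first remedy (the file name out.dat is arbitrary):

```shell
# Write a result file, then flush file system buffers before any other
# machine tries to read the file.
echo "final results" > out.dat
sync

# Inside a Slurm job, make every node of the job flush its buffers:
# srun -n $SLURM_NNODES --ntasks-per-node=1 sync
```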

If none of these help, please try another file system to see if the error persists (e.g. /scratch/global/lustre or /scratch/local), and let us know.

Starting the Emacs editor is very slow.

Emacs's initialization accesses many files, which can be slow in a network file system environment. The workaround is to run Emacs in server mode (as a daemon) and start each terminal session with the emacsclient command. The Emacs daemon stays in the background even if one disconnects from that particular system, so it needs to be started only once per system start.

The easiest way is to create an alias for the emacs command. In bash:

alias emacs='emacsclient -a ""'

or in tcsh:

alias emacs emacsclient -a \"\"

Note the escaped double quote characters (\") in the tcsh version. The empty string passed to the -a flag makes emacsclient start Emacs as a daemon if one is not running already, and then proceed in client mode.

Note that by default emacsclient starts in the terminal; to force it to start the Emacs GUI, add the "-c" flag, e.g. (assuming the aforementioned alias is in place) "emacs -c myfile.txt".


Opening a file in Emacs is very slow

We have yet to find the root cause of this problem, but it is most likely related to the number of files in the directory and the file-type filtering Emacs performs. The workaround is to read the file without any content conversion: M-x find-file-literally <Enter> filename <Enter>. After opening the file, one can tell Emacs to apply the appropriate mode, e.g. to syntax highlight shell scripts, M-x sh-mode <Enter>.

To make this shortcut permanent, edit the ~/.emacs file to add:

(global-set-key "\C-c\C-f" 'find-file-literally)

Troubleshooting Slurm jobs that won't start (errors and other reasons)

  • Batch job submission failed: Invalid account or account/partition combination specified

    This error usually indicates that one is trying to run a job in the general partition, but the research group either does not have an allocation or has used all of its allocation for the current quarter. To view current allocation status, see this page. If your group is either not listed in the first table on that page or there is a 0 in the first column (allocation amount), your group does not have a current allocation. In this case, your group may want to consider completing an allocation request.

    Jobs without an allocation must run in the freecycle partition. They will have lower priority and will be preemptable. There are examples of account–partition pairs on the Slurm documentation page. Alternatives include using unallocated clusters (ember, lonepeak, and tangent) or running on owner nodes as a guest (with the possibility of preemption).

    This error can also be caused by an invalid combination of values for account and partition: not all accounts work on all partitions. Check the spelling in your batch script or interactive command and be sure you have access to the account and partition. To view the combinations available to you, use the sacctmgr command; more information (including example commands) can be found on the Slurm documentation page.

  • Batch job submission failed: Node count specification invalid

    The number of nodes that can be used for a single job is limited; attempting to submit a job that uses more will result in the above error. This limit is approximately one-half the total number of general nodes on each cluster (currently 8 on notchpeak, 24 on kingspeak, 36 on ember, and 28 on lonepeak).

    The limit on the number of nodes can be exceeded with a reservation or QOS specification. Requests are evaluated on a case-by-case basis; please contact us to learn more.

  • Required node not available (down, drained, or reserved)
    or job has "reason code" ReqNodeNotAvail

    This occurs when a reservation is in place on one or more of the nodes requested by the job. The "Required node not available (down, drained, or reserved)" message can occur when submitting a job interactively (with srun, for instance); when submitting a script (often with sbatch), however, the job will enter the queue without complaint and Slurm will assign it the "reason code" (which provides some insight into why the job has not yet started) "ReqNodeNotAvail."

    The presence of a reservation on a node likely means it is in maintenance. It is possible there is a downtime on the cluster in question; please check the news page and subscribe to the mailing list so you will be notified of impactful maintenance periods.

I would like to change my shell (to bash or tcsh)

You can change your shell on the Edit Profile page by selecting the shell you'd like and clicking "Change." This change should take effect within fifteen minutes; you will need to log in again on any resources you were using at the time. If you only need to use a different shell occasionally (but don't want to change your login shell), you can open the shell directly or pass commands to it as arguments (e.g. tcsh or tcsh -c "echo hello").
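For example, to run a single command under another shell without changing your login shell (sh is used here as a stand-in because it exists on every system; substitute tcsh or bash as needed):

```shell
# Execute one command in a different shell, then return to the current one
sh -c 'echo hello'     # prints: hello
```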

I would like to change my email address

You can change the email address CHPC uses to contact you in the Edit Profile page.

My program crashed because /tmp filled up

Linux defines temporary file systems at /tmp and /var/tmp, where temporary user and system files are stored. CHPC cluster nodes set up these temporary file systems as a RAM disk with limited capacity. All interactive and compute nodes also have local spinning-disk storage at /scratch/local. If a user program is known to need temporary storage, it is advantageous to set the environment variable TMPDIR, which defines the location of temporary storage, and point it to /scratch/local. Or, even better, create a user-specific directory, /scratch/local/$USER, and point TMPDIR to it, as shown in our sample/uufs/[csh,sh] script.
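A sketch of the per-user-directory approach for a job script. The base directory defaults to /tmp here so the snippet runs anywhere; on CHPC compute nodes you would use /scratch/local instead:

```shell
# Create a per-user directory on local scratch and point TMPDIR at it.
BASE=/tmp                      # on CHPC compute nodes: BASE=/scratch/local
ME=${USER:-$(id -un)}          # fall back to id -un if $USER is unset
mkdir -p "$BASE/$ME"
export TMPDIR="$BASE/$ME"
```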

I am getting a message "Disk quota exceeded" when logging in

Default CHPC home directories have a 50 GB storage limit; once it is exceeded, no more files can be written to the home directory. Because some access tools, such as FastX, rely on storing small files in the user's home directory upon login, they will fail.

To display quota information, either run the quota -u $USER command or log in to the CHPC user personal details page and scroll down to Filesystem Quotas.

The remedy is to clean up files in your home directory. To do that, log in using a terminal tool, such as PuTTY or Git Bash on Windows, or Terminal on a Mac. Delete large files using the rm command. You may also be able to copy large files back to your desktop with WinSCP (Windows) or Cyberduck (Mac). To keep large files, explore other storage solutions at CHPC.

To find large files, in the text terminal, run the du -h -d 1 | sort -h command in your home directory to show the disk space used per directory (the largest directory will be at the bottom). Then cd into the directory with the largest usage and repeat until you find the largest files. If you clean up a few files and are able to open a FastX session again, you can run the graphical tool baobab, which sorts directories by size and makes it easier to find all the potentially useless large files.
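The disk-usage survey described above boils down to a single pipeline, run from your home directory:

```shell
# Per-directory disk usage, one level deep, sorted so the largest
# directory appears last; repeat inside the biggest directory as needed
du -h -d 1 | sort -h
```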

Last Updated: 3/12/19