Research Computing Frequently Asked Questions

  1. I can't ssh to machine anymore, getting a serious looking error that starts with
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
    @ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
    @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
  2. My calculations, or other file operations complain that the file can't be accessed, or it does not exist, even though I have just created or modified it.
  3. Starting Emacs editor is very slow.
  4. Opening Emacs file is very slow.
  5. When submitting a SLURM job, I am getting an error like:
    sbatch: error: Batch job submission failed: Invalid account or
    account/partition combination specified
  6. I would like to change my shell (to bash or tcsh)

 

I can't ssh to machine anymore, getting a serious looking error that starts with

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@

The full error looks like this:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle
attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
23:92:bc:12:4c:fe:08:2e:a8:48:af:08:a9:3c:93:6f.
Please contact your system administrator.
Add correct host key in /your_home/.ssh/known_hosts to get rid of
this message.
Offending RSA key in /your_home/.ssh/known_hosts:21
RSA host key for ember.chpc.utah.edu has changed and you have requested
strict checking.
Host key verification failed.

Although it looks scary, this error is usually benign. It occurs when the SSH host keys on the machine you are trying to connect to have changed, most commonly after an operating system upgrade. There are two ways to get rid of this message and log in:

  1. Open the file ~/.ssh/known_hosts in a text editor and delete the lines that contain the name of the host you are connecting to.
  2. Use the ssh-keygen command with the -R flag to remove the SSH keys for the given host,
    e.g. ssh-keygen -R ember.chpc.utah.edu
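
The error message itself names the offending entry: in the example above, "known_hosts:21" means line 21 of that file. As a variant of the first option, that single line can be deleted without opening an editor. The following is a minimal sketch, not an official CHPC tool: the function name remove_known_host_line is hypothetical, and it assumes a sed that accepts the -i.bak in-place option (GNU and BSD sed both do).

```shell
# Hypothetical helper: delete one numbered line from a known_hosts file.
# A backup copy of the original is kept next to it with a .bak suffix.
remove_known_host_line() {
    file="$1"
    line="$2"
    # Refuse anything that is not a plain non-negative integer.
    case "$line" in
        ''|*[!0-9]*)
            echo "usage: remove_known_host_line FILE LINE" >&2
            return 1
            ;;
    esac
    # "21d" tells sed to delete line 21 and keep every other line.
    sed -i.bak "${line}d" "$file"
}

# Example, using the line number reported in the error message:
# remove_known_host_line ~/.ssh/known_hosts 21
```

If the result is not what you expected, the untouched copy is still available as known_hosts.bak.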

After the old key is removed, the next ssh connection to the machine will say something like:

The authenticity of host 'ember.chpc.utah.edu (155.101.26.21)' can't be established. 
RSA key fingerprint is cd:9b:4c:f3:88:4d:6a:e2:34:4b:96:83:37:04:ed:9e.
Are you sure you want to continue connecting (yes/no)?

Answer yes to that and you will be prompted for a password and can log in.


 My calculations, or other file operations complain that the file can't be accessed, or it does not exist, even though I have just created or modified it.

This error can take many forms, but it may look something like this:

ERROR on proc 0: Cannot open input script in.npt-218K-continue (../lammps.cpp:327)

It also occurs randomly: sometimes the program works, sometimes it does not.

This error is most likely due to the way the file system writes files. For performance reasons, it writes parts of the file into a memory buffer, which is periodically written to the disk. If another machine tries to access the file before the writing machine has flushed it to the disk, this error occurs. For NFS, which we use for all our home directories and group spaces, it is well described here. There are several ways to deal with this:

  1. Use the Linux sync command to force the buffers to be flushed to the disk. Do this on both the machine that writes the file and the machine that reads it, BEFORE the file is accessed. To ensure that all compute nodes in the job sync, do "srun -n $SLURM_NNODES --ntasks-per-node=1 sync".
  2. Sometimes adding the Linux sleep command can help, by providing an extra time window for the syncing to occur.
  3. Inside the code, use fflush for C/C++ or flush for Fortran. For other languages, such as Python and Matlab, search their documentation for "flush" to see what options are available.

If none of these help, please try another file system to see if the error persists (e.g. /scratch/global/lustre, or /scratch/local), and let us know.
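
The sync and sleep suggestions can be combined in a job script by retrying until the file becomes visible on the node that reads it. The following is a rough sketch under stated assumptions: the helper name wait_for_file and its default retry count and delay are made up for illustration, and treating "readable and non-empty" as "ready" is a simplification that may not suit files that are written incrementally.

```shell
# Hypothetical helper: wait until FILE is readable and non-empty,
# calling sync and sleeping between attempts.
wait_for_file() {
    file="$1"
    tries="${2:-10}"   # maximum number of attempts (assumed default)
    delay="${3:-1}"    # seconds to sleep between attempts (assumed default)
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -r: file is readable, -s: file exists and is larger than zero bytes
        if [ -r "$file" ] && [ -s "$file" ]; then
            return 0
        fi
        sync             # flush this machine's buffers
        sleep "$delay"   # give the file system time to catch up
        i=$((i + 1))
    done
    echo "wait_for_file: $file still not available after $tries attempts" >&2
    return 1
}
```

For example, "wait_for_file in.npt-218K-continue 30 2" would check up to 30 times, two seconds apart, before giving up; the program that reads the file can be started once the helper returns successfully.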


Starting Emacs editor is very slow.

Emacs's initialization involves accessing many files, which can be slow in a network file system environment. The workaround is to run Emacs in server mode (as a daemon) and to start each terminal session with the emacsclient command. The Emacs daemon stays in the background even if one disconnects from that particular system, so it needs to be started only once per system start.

The easiest way is to create an alias for the emacs command (shown here in tcsh syntax) as

alias emacs emacsclient -a \"\"

Note the escaped double quote characters (\"). This starts Emacs as a daemon if it is not already running, and then proceeds to run in client mode.

Note that by default emacsclient starts in the terminal. To force the Emacs GUI to start, add the "-c" flag, e.g. (assuming the aforementioned alias is in place) "emacs -c myfile.txt".
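
The alias shown earlier in this answer uses tcsh syntax. For bash users, an equivalent might look like the following sketch. The single quotes keep the empty "" argument intact, so no backslash escaping is needed. Placing the line in ~/.bashrc is an assumption about your setup.

```shell
# bash equivalent of the tcsh alias: start the Emacs daemon if it is not
# running yet, then connect to it with emacsclient.
alias emacs='emacsclient -a ""'
```

With this alias in place, "emacs -c myfile.txt" behaves as described above.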


 

Opening Emacs file is very slow

We have yet to find the root cause of this problem, but it is most likely related to the number of files in the directory and the type of file that Emacs is filtering through. The workaround is to read the file without any content conversion: M-x find-file-literally <Enter> filename <Enter>. After opening the file, one can tell Emacs which mode to use for it, e.g. to syntax-highlight a shell script, M-x sh-mode <Enter>.

To make this change permanent, open the ~/.emacs file and add:

(global-set-key "\C-c\C-f" 'find-file-literally)

When submitting a SLURM job, I am getting an error like:
sbatch: error: Batch job submission failed: Invalid account or
account/partition combination specified

This error usually indicates that one is trying to run a job in the general partition, but the research group does not have an allocation. To view current allocation status, see this page. Jobs without an allocation must run in the freecycle partition, where they have lower priority and are preemptable. That is, in the SLURM job script change

#SBATCH -A mygroup
#SBATCH -p cluster

to

#SBATCH -A mygroup
#SBATCH -p cluster-freecycle

where mygroup is your group (usually the PI's last name, all in lowercase), and cluster is the cluster name (kingspeak or ember).

Running in freecycle mode is generally not recommended because the chances of preemption are quite high. There are two alternatives when one runs out of regular allocation:

    1. Use the unallocated clusters. These include "lonepeak" and "tangent", as:
      #SBATCH -A mygroup
      #SBATCH -p lonepeak # or tangent
    2. Use the owner nodes on kingspeak, ember or ash. These jobs are preemptable, but only by the group that owns the nodes, so the chances of preemption are lower, especially on group nodes that are not used regularly. On kingspeak or ember, use:
      #SBATCH -A owner-guest
      #SBATCH -p ember-guest # or kingspeak-guest
      and on ash do:
      #SBATCH -A smithp-guest
      #SBATCH -p ash-guest

I would like to change my shell (to bash or tcsh)

You can change your shell in the Edit Profile page by selecting the shell you'd like and clicking "Change." This change should take effect within fifteen minutes and you will need to log in again on any resources you were using at the time. If you only need to use a different shell (but don't want to change your own), you can open the shell or pass commands as arguments (e.g. tcsh or tcsh -c "echo hello").


 

Last Updated: 7/14/17