User installed Python
As Python libraries evolve rapidly and may have specific dependencies, it is becoming increasingly difficult to support the Python distribution centrally. Therefore we encourage users to maintain their own Python stack as described below.
- Why are we moving away from a central Python installation?
- Miniconda installation and usage
- Installation examples
Why are we moving away from a central Python installation?
The Python ecosystem is growing rapidly and it has become difficult to keep centrally maintained Python distributions up to date. Furthermore, some Python modules depend on specific versions of other modules, which may be incompatible with the versions that the rest of the stack requires. Finally, user space Python distributions, specifically Anaconda/Miniconda, are actively incorporating performance improvements and are comparable to or better than hand-tuned Python builds.
For these reasons we are deprecating centrally maintained Python distributions and encourage users to maintain their own Python stack as described below.
Miniconda is a minimal Anaconda distribution, which ships with base Python and the conda package manager. This makes the base installation rather small, at 0.3 GB. Additional packages need to be installed manually (described below). Its small base size and selectivity make it our choice for user space installation.
Anaconda is the most popular Python distribution. It is well optimized and ships with the Intel MKL for fast and threaded numerical calculations. It also comes with a package manager system, conda, and includes many commonly used Python modules. As a result it occupies 3.2 GB as installed, which is a sizeable amount given our default 50 GB home directory quota, so it is not our top choice.
Intel Distribution for Python is provided by Intel with performance similar to Anaconda. Like Anaconda, it includes select numerical modules. It can be installed either standalone or using the conda package manager.
Miniconda installation and usage
Download the Miniconda installer using the wget command and run the installer, pointing
it to the directory where you want to install it. We recommend
$HOME/software/pkg/miniconda3 for easy integration into user defined environment modules.
bash ./Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/software/pkg/miniconda3
The flag '-b' forces unattended installation, and '-p' marks the installation directory.
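The download and installation steps above can be sketched as a short script. This is only a sketch under stated assumptions: the download URL is an assumption that is not given in this document, so check the Miniconda download page for the current location. `DRY_RUN` and `run` are hypothetical helpers that print each command by default instead of executing it.

```shell
# Sketch: download and install Miniconda into the recommended directory.
INSTALLER="Miniconda3-latest-Linux-x86_64.sh"
# Assumption: download location; verify it on the Miniconda download page.
URL="https://repo.anaconda.com/miniconda/$INSTALLER"
PREFIX="$HOME/software/pkg/miniconda3"

# Hypothetical helper: print each command unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

run wget "$URL"                          # fetch the installer
run bash "./$INSTALLER" -b -p "$PREFIX"  # -b: unattended, -p: install directory
```

Running the script with `DRY_RUN=0` set in the environment performs the actual download and installation.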
To easily set up the Miniconda environment, create a user environment module. First create a directory where the user environment module hierarchy will reside, and then copy our miniconda module file to this directory.
mkdir -p $HOME/MyModules/miniconda3
cp /uufs/chpc.utah.edu/sys/installdir/python/modules/miniconda3/latest.lua $HOME/MyModules/miniconda3
The user module environment must be added to the default module environment with the
module use command. After that, we can load the user space miniconda module.
module use $HOME/MyModules
module load miniconda3/latest
To make the user module environment available in all your future sessions, edit
~/.custom.csh (for tcsh shell) or ~/.custom.sh (for bash shell) and insert the
module use command just below the
The conda package manager is recommended for maintaining your user Miniconda Python distribution. Take a look at the Conda cheat sheet, which lists the most commonly used commands. More detailed documentation is in the Conda User Guide.
Miniconda only comes with a basic Python distribution, so any additional Python modules need to be installed. You can list the currently installed Python modules as follows:
conda list
To install a new package, run
conda install [packagename]
For example, to install the latest version of the SciPy module, run
conda install scipy
The installer will think for a little while and then install SciPy and a number of other packages on which SciPy depends, such as NumPy and the accelerated Intel Math Kernel Library (MKL). This will cause the Miniconda distribution to grow to about 1.5 GB, but it will include all the packages needed for high performance numerical analysis with SciPy.
NOTE: Since the MKL library by default uses all the processor cores on the system,
if you are planning to run many independent parallel Python calculations, set the
environment variable OMP_NUM_THREADS=1.
For tcsh:
setenv OMP_NUM_THREADS 1
For bash:
export OMP_NUM_THREADS=1
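As an illustration of the note above, the following bash sketch starts four independent Python tasks that each inherit `OMP_NUM_THREADS=1`. The one-line Python payload is a placeholder for a real calculation, and the `PYTHON` variable is an illustrative name that picks whichever interpreter is on the PATH.

```shell
# Sketch: run four independent Python tasks, each limited to one thread.
export OMP_NUM_THREADS=1

# Assumption: use whichever Python interpreter is found on the PATH.
PYTHON="$(command -v python || command -v python3)"

for i in 1 2 3 4; do
    # Each background task inherits OMP_NUM_THREADS from this shell.
    "$PYTHON" -c "import os; print('task $i: OMP_NUM_THREADS=' + os.environ['OMP_NUM_THREADS'])" &
done
wait   # block until all background tasks have finished
```

Without the export, each of the four tasks could start as many MKL threads as there are cores and oversubscribe the node.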
To uninstall a conda package run
conda uninstall [packagename]
If the conda install command cannot find the requested package, chances are high that it is in a non-default
conda repository (channel). Independent distributors can create their own package
channels that house their products. The best approach to finding a package which is not
in an official conda channel is to do a web search for it.
For example, to look for a package named Fasta, which is used for biological sequence alignment, we web search for "anaconda fasta". The top search hit suggests the following installation line:
conda install -c biobuilds fasta
The "-c" option specifies the channel name, in our case biobuilds.
To add a channel to the default channel list, we can run:
conda config --add channels biobuilds
However, this puts the channel at the top of the list, with the highest priority. We can add a new channel to the bottom of the list instead in the following way:
conda config --append channels biobuilds
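To see the effect of either command, the configured channel list can be printed in priority order. A minimal sketch, guarded so that it only reports a message when conda is not on the PATH; the `CHANNEL` variable is just an illustrative name:

```shell
# Sketch: show the configured channels, highest priority first.
CHANNEL="biobuilds"
if command -v conda >/dev/null 2>&1; then
    conda config --show channels
    echo "Look for '$CHANNEL' in the list above to check its position."
else
    echo "conda not found; load the miniconda3 module first"
fi
```

A channel that is no longer needed can be taken off the list again with `conda config --remove channels biobuilds`.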
When a Python package does not exist as a conda package, one can use the Python
pip installer. We recommend using pip only as a last resort, since this way one loses
the flexibility of the conda packaging environment (automatic conflict resolution and
version upgrade/downgrade). To install a module using pip, run either of the following:
python -m pip install bibtexparser
pip install bibtexparser
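Before installing with pip, it can help to confirm that pip belongs to the Miniconda Python rather than to a system Python. A minimal sketch, assuming the Miniconda module is loaded so that its interpreter is first on the PATH; `PYTHON` is an illustrative variable name:

```shell
# Sketch: check which Python installation pip will install into.
PYTHON="$(command -v python || command -v python3)"
if [ -n "$PYTHON" ]; then
    "$PYTHON" -c "import sys; print(sys.prefix)"   # root directory of this Python
    "$PYTHON" -m pip --version                     # pip version and its location
else
    echo "no Python interpreter found on the PATH"
fi
```

Both printed paths should point into the Miniconda installation directory, for example $HOME/software/pkg/miniconda3 if the recommended location was used.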
For other ways to install Python packages, see our older document.
Miniconda supports Python virtual environments. Therefore one can leverage multiple Python instances from a single Miniconda installation. However, virtual environments are not supported in the tcsh shell, which implies that tcsh users will have to install different Miniconda distributions if they need multiple Python stacks due to module conflicts.
We can list existing environments with
conda env list
Assuming one uses the bash shell, we can, for example, install the Intel distribution for Python into a separate environment:
conda update -y conda
conda create -c intel -n idp3 intelpython3_core python
This updates conda ("-y" answers yes to the confirmation prompt), pulls packages from
the intel conda channel for this command ("-c intel"), and creates an environment named
"idp3" that contains the intelpython3_core and python packages.
We then activate the environment as:
source activate idp3
All conda package commands can be used within the activated environment. For example,
packages installed with the conda install command while the environment is active
are placed in this environment.
To exit from the environment, run:
source deactivate
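To check which environment is active at any point, for example before installing packages, one can inspect the `CONDA_DEFAULT_ENV` variable that conda sets on activation. A minimal sketch; `ACTIVE_ENV` is an illustrative variable name:

```shell
# Sketch: report the currently active conda environment, if any.
ACTIVE_ENV="${CONDA_DEFAULT_ENV:-none}"
echo "active conda environment: $ACTIVE_ENV"
```

After activating the environment from the example above, this should report idp3.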
Installation examples
Tensorflow is a widely used deep learning framework. Keras is an add-on to the framework that attempts to make it more user friendly. Horovod allows for easy and flexible distributed parallelization of Keras/Tensorflow. Installing these packages and their dependencies forms a nice but rather complicated example of using a Miniconda environment.
There are a few constraints:
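- In our experience, Tensorflow installed with conda does not always recognize the GPUs; in particular, the "intel" channel build does not support GPUs. Therefore we explicitly install tensorflow-gpu from the "defaults" channel and verify that it sees the GPUs.
- Horovod must be built with MPI and does not appear to be available in the Anaconda channels, so we install it with pip.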
The steps to set up and test this environment are as follows (valid as of November 2018):
- Install base Miniconda as shown above
- Prepare the right environment: load the appropriate CUDA and cuDNN modules for
Tensorflow and an MPI module for Horovod
ml cuda/9.0.176 cudnn
ml gcc mpich
- Downgrade Python and install the conda based NumPy to get the accelerated MKL support,
along with a few other packages Tensorflow needs. Make sure to use the "defaults" channel
since, as of Jan 2019, conda picks up the "intel" channel by default
conda install -c defaults python=3.6 numpy six wheel
- Install tensorflow-gpu from the conda "defaults" channel and test whether it can see the
GPUs. If the "defaults" channel is not specified, the "intel" channel Tensorflow build
is installed, which does not support GPUs.
conda install -c defaults tensorflow-gpu
python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"
- Install Horovod with pip. If the installation fails, make sure an MPI module is loaded
pip install horovod
- To run an MPI parallel calculation on multiple GPUs, use a SLURM batch script
rather than an interactive job, since interactive jobs on GPU nodes that use gres
do not work with mpirun
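#!/bin/bash
#SBATCH -N 1
#SBATCH -n 4
#SBATCH -A owner-gpu-guest
#SBATCH -p notchpeak-gpu-guest
#SBATCH -t 2:00:00
# Make the user modules visible and load the user space Miniconda
# (assumes the miniconda3/latest module set up earlier in this document)
ml use $HOME/MyModules
ml miniconda3/latest
# CUDA and cuDNN for Tensorflow, compiler and MPI for Horovod
ml cuda/9.0.176 cudnn
ml gcc mpich
# Start 4 MPI ranks of the training script
mpirun -np 4 python keras_mnist.py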
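Before submitting the job, a quick check that the required commands and Python modules are available can save a failed run. The following is a minimal sketch to run in a shell with the same modules loaded as in the job script; the `missing` variable is an illustrative name.

```shell
# Sketch: check that the tools used by the job script are available.
missing=0
for cmd in python mpirun; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
        echo "missing command: $cmd"
        missing=1
    fi
done
for mod in tensorflow horovod; do
    # Try to import each Python module, discarding its output.
    if ! python -c "import $mod" >/dev/null 2>&1; then
        echo "missing Python module: $mod"
        missing=1
    fi
done
if [ "$missing" -eq 0 ]; then
    echo "all checks passed"
fi
```

If anything is reported as missing, revisit the corresponding installation step above before submitting the script.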