User installed Python

As Python libraries evolve rapidly and may have specific dependencies, it is becoming increasingly difficult to support the Python distribution centrally. Therefore we encourage users to maintain their own Python stack as described below.


Why are we moving away from a central Python installation?

The Python ecosystem is growing rapidly and it has become difficult to keep centrally maintained Python distributions up to date. Furthermore, some Python modules depend on specific versions of other modules, which may be incompatible with each other. Finally, user space Python distributions, and specifically Anaconda/Miniconda, are actively incorporating performance improvements and are comparable to, or better than, hand-tuned Python builds.

For these reasons we are deprecating centrally maintained Python distributions and encourage users to maintain their own Python stack as described below.

User space Python choices

Miniconda is a minimal Anaconda distribution, which ships with base Python and the conda package manager. This makes the base installation rather small, at about 0.3 GB. Additional packages need to be installed manually (described below). Its small base size and selectivity make it our choice for user space installation.

Anaconda is the most popular Python distribution. It is well optimized and ships with the Intel MKL for fast and threaded numerical calculations. It also comes with a package manager, conda, and includes many commonly used Python modules. As a result it occupies about 3.2 GB as installed, which is a sizeable amount given our default 50 GB home directory quota. For this reason it is not our top choice.

Intel Distribution for Python is provided by Intel, with performance similar to Anaconda. Like Anaconda, it includes select numerical modules. It can be installed either as a standalone distribution or through the conda package manager.

Miniconda installation and usage

Miniconda installation

Download the Miniconda installer using the wget command and run the installer, pointing it to the directory where you want to install it. We recommend $HOME/software/pkg/miniconda3 for easy integration into user-defined environment modules.

wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash ./Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/software/pkg/miniconda3

The '-b' flag forces unattended installation, and '-p' specifies the installation directory.
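Optionally, you can verify the installation by calling the newly installed Python directly, before setting up an environment module:

$HOME/software/pkg/miniconda3/bin/python --version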

Miniconda environment module

To easily set up the Miniconda environment, create a user environment module. First create a directory where the user environment module hierarchy will reside, and then copy our miniconda module file to this directory.

mkdir -p $HOME/MyModules/miniconda3
cp /uufs/chpc.utah.edu/sys/installdir/python/modules/miniconda3/latest.lua $HOME/MyModules/miniconda3

The user module directory must be added to the module search path with the module use command. After that, we can load the user space miniconda module.

module use $HOME/MyModules
module load miniconda3/latest

To make the user module environment available in all your future sessions, edit ~/.custom.csh (for the tcsh shell) or ~/.custom.sh (for the bash shell) and insert the module use command just below the #!/bin/tcsh or #!/bin/bash line.
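For example, the bash version of the file would contain:

#!/bin/bash
module use $HOME/MyModules

and the tcsh version:

#!/bin/tcsh
module use $HOME/MyModules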

Conda package manager basics

The conda package manager is recommended for maintaining your user Miniconda Python distribution. Take a look at and use the Conda cheat sheet, which lists the most commonly used commands. More detailed documentation is in the Conda User Guide.
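For illustration, a few commonly used conda commands beyond those shown below (the scipy package name is just an example):

conda search scipy    # search the configured channels for a package
conda update conda    # update the conda package manager itself
conda info            # show information about the conda installation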

Installing additional Python packages

Miniconda comes with only a basic Python distribution, so any additional Python modules have to be installed separately. You can list the currently installed Python modules as follows:

conda list

To install a new package, run

conda install [packagename]

For example, to install the latest version of the SciPy module, run

conda install scipy

The installer will think for a little while and then install SciPy and a number of other packages on which SciPy depends, such as NumPy and the accelerated Intel Math Kernel Library (MKL). This will cause the Miniconda distribution to grow to about 1.5 GB, but it will include all the packages needed for high performance numerical analysis with SciPy.

NOTE: Since the MKL library by default utilizes all the processor cores on the system, set the environment variable OMP_NUM_THREADS=1 if you are planning to run many independent parallel Python calculations (setenv OMP_NUM_THREADS 1 for tcsh or export OMP_NUM_THREADS=1 for bash).
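For example, in bash, several independent calculations could be launched single-threaded like this (the script names are just placeholders):

export OMP_NUM_THREADS=1   # limit MKL to one thread per process
python analysis1.py &      # first independent calculation
python analysis2.py &      # second independent calculation
wait                       # wait for both background jobs to finish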

To uninstall a conda package, run

conda uninstall [packagename]

Conda packages in other channels

If the conda install command cannot find the requested package, chances are it is in a non-default conda repository (channel). Independent distributors can create their own package channels that house their products. The best approach to find a package which is not in an official conda channel is to do a web search for it.

For example, to look for a package named Fasta, which is used for biological sequence alignment, we web search for "anaconda fasta". The top search hit suggests the following installation line:

conda install -c biobuilds fasta

The "-c" option specifies the channel name, in our case biobuilds.

To add a channel to the default channel list, we can:

conda config --add channels biobuilds

However, this puts the channel at the top of the channel list, giving it the highest priority. To add a new channel at the bottom of the list instead, use:

conda config --append channels biobuilds
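To verify the resulting channel order, list the configured channels:

conda config --show channels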

Installing Python modules with pip

When a Python package does not exist as a conda package, one can use the Python pip installer. We recommend using pip only as a last resort, since this way one loses the flexibility of the conda packaging environment (automatic conflict resolution and version upgrades/downgrades). To install a module with pip, either run:

python -m pip install bibtexparser

or, preferably:

pip install bibtexparser
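To check that the package was installed and see its version, pip can be used as well:

pip show bibtexparser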

For other ways to install Python packages, see our older document.

Miniconda Python environments

Miniconda supports Python virtual environments, so one can maintain multiple Python instances from a single Miniconda installation. However, virtual environments are not supported in the tcsh shell, which means that tcsh users who need multiple Python stacks due to module conflicts will have to install separate Miniconda distributions.

We can list existing environments with

conda env list
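A new environment can be created with conda create and removed with conda env remove; for example (the environment name myenv and the package list are arbitrary):

conda create -n myenv python=3.6 scipy
conda env remove -n myenv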

Assuming one uses the bash shell, we can, for example, install the Intel Distribution for Python into a separate environment:

conda update -y conda
conda create -c intel -n idp3 intelpython3_core python

This updates conda ("-y" answers yes to the update confirmation prompt) and then creates an environment named "idp3" based on the intelpython3_core package, pulled from the intel conda channel ("-c intel").

We then activate the environment as:

source activate idp3

All conda package commands can be used within the activated environment; for example, packages installed with the conda install command will now go into this environment.

To exit from the environment, run:

source deactivate

Examples

Parallel machine learning environment with Tensorflow, Keras and Horovod

Tensorflow is a widely used deep learning framework. Keras is an add-on to the framework that attempts to make it more user friendly. Horovod allows for easy and flexible distributed parallelization of Keras/Tensorflow. Installing these packages and their dependencies forms a nice but rather complicated example of using a Miniconda environment.

There are a few constraints:

  • based on our experience, the Tensorflow conda package from the default ("intel") channel does not recognize the GPUs. Therefore we explicitly install the tensorflow-gpu build from the "defaults" channel in the steps below.
  • Horovod needs to be built with MPI and does not appear to be available on the Anaconda channels, so we install it with pip.

The steps to set up and test this environment are as follows (valid as of November 2018):

  • Install base Miniconda as shown above
  • Prepare the right environment, load the appropriate CUDA and CUDNN for Tensorflow and MPI for Horovod
    ml cuda/9.0.176 cudnn
    ml gcc mpich 
  • Downgrade Python and install conda-based Numpy to get the accelerated MKL support, plus a few other packages Tensorflow needs. Make sure to use the "defaults" channel, since (as of Jan 2019) conda picks up the "intel" channel by default
    conda install -c defaults python=3.6 numpy six wheel
  • Install tensorflow-gpu from the conda "defaults" channel and test if it can see the GPUs. If the "defaults" channel is not specified, the "intel" channel tensorflow build is installed, which does not support GPUs.
    conda install -c defaults tensorflow-gpu
    python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"
  • Install Horovod with pip. If the installation fails, make sure an MPI module is loaded
    pip install horovod
  • To run an MPI parallel calculation on multiple GPUs, make sure to use a SLURM batch script rather than an interactive job, since interactive jobs on GPU nodes using gres do not work with mpirun (an example submission command is shown after this list)
    #!/bin/bash
    #SBATCH -N 1
    #SBATCH -n 4
    #SBATCH -A owner-gpu-guest
    #SBATCH -p notchpeak-gpu-guest
    #SBATCH -t 2:00:00
    #SBATCH --gres=gpu:titanv:4
    #SBATCH --mem=0
    ml use $HOME/MyModules
    ml cuda/9.0.176 cudnn
    ml miniconda3/latest
    ml gcc mpich
    mpirun -np 4 python keras_mnist.py
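Assuming the script above is saved as, say, run_horovod.slr (the file name is arbitrary), it can be submitted with:

sbatch run_horovod.slr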

 

Last Updated: 2/28/19