
User installed Python

As Python libraries evolve rapidly and may have specific dependencies, it is becoming increasingly difficult to support the Python distribution centrally. Therefore we encourage users to maintain their own Python stack as described below.


 Why are we moving away from a central Python installation?

The Python ecosystem is growing rapidly and it has become difficult to keep centrally maintained Python distributions up to date. Furthermore, some Python modules depend on specific versions of other modules, which may be incompatible with each other. Finally, user space Python distributions, and specifically Anaconda/Miniconda, are actively incorporating performance improvements and are comparable to or better than hand-tuned Python builds.

For these reasons we are deprecating centrally maintained Python distributions and encourage users to maintain their own Python stack as described below.

However, please be aware that there are some corner cases in which an Anaconda/Miniconda Python library stack is difficult to install, mostly when there are conflicts between dependent libraries. In those cases, we recommend researching whether the particular stack is offered in the form of a Docker container, which can be imported and loaded in our HPC environment using Singularity.

User space Python choices

Miniconda is a minimal Anaconda distribution, which ships with base Python and the conda package manager. This makes the base installation rather small, at 0.3 GB. Additional packages need to be installed manually (described below). Its small base size and selectivity make it our choice for user space installation.

Micromamba is a tiny version of the mamba package manager. It is a statically linked C++ executable with a separate command line interface. It does not need a base environment and does not come with a default version of Python. It is a good candidate for packaging the whole Python environment in a container, as described below.

Anaconda is the most popular Python distribution. It is well optimized and ships with the Intel MKL for fast and threaded numerical calculations. It also comes with a package manager, conda, and includes many commonly used Python modules. As a result it occupies 3.2 GB as installed, which is a sizeable amount given our default 50 GB home directory quota. For this reason it is not our top choice.

Intel Distribution for Python is provided by Intel with performance similar to Anaconda. Like Anaconda, it includes select numerical modules. It can be installed either standalone or using the conda package manager.

Miniconda installation and usage

Miniconda installation

Download the Miniconda installer using the wget command and run the installer, pointing it to the directory where you want to install it. We recommend $HOME/software/pkg/miniconda3 for easy integration into user defined environment modules.

wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash ./Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/software/pkg/miniconda3 -s

The '-b' flag forces unattended installation, which among other things does not add Miniconda to your default environment - we will do that in the next step via environment modules. The '-p' flag specifies the installation directory. The '-s' flag prevents the installer from automatically setting up your environment to use this Miniconda - we will do this in the next section using the environment module.

Miniconda environment module

To easily set up the Miniconda environment, create a user environment module. First create a directory where the user environment module hierarchy will reside, and then copy our miniconda module file to this directory.

mkdir -p $HOME/MyModules/miniconda3
cp /uufs/chpc.utah.edu/sys/installdir/python/modules/miniconda3/latest.lua $HOME/MyModules/miniconda3

The user module environment must be loaded into the default module environment with the module use command. After that, we can load the user space miniconda module.

Note that if miniconda is installed in a location other than $HOME/software/pkg/miniconda3, the module file (latest.lua) should be renamed to indicate what kind of miniconda installation it is, and edited to set the local myanapath variable to the full path of that miniconda installation.

module use $HOME/MyModules
module load miniconda3/latest

To make the user module environment available in all your future sessions, edit ~/.custom.csh (for the tcsh shell) or ~/.custom.sh (for the bash shell) and insert the module use command just below the #!/bin/tcsh or #!/bin/bash line. Do not put the module load miniconda3 command in these files, since that is known to break remote connections using FastX.
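For example, with the bash shell, the top of ~/.custom.sh would then look like:

#!/bin/bash
module use $HOME/MyModules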

Conda package manager basics

The conda package manager is recommended for maintaining your user Miniconda Python distribution. Take a look at and use the Conda cheat sheet, which lists the most commonly used commands. More detailed documentation is in the Conda User Guide.
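For example, two standard conda commands that come in handy when maintaining an installation:

conda info            # show details about the conda installation and its configuration
conda search scipy    # search the configured channels for a package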

Installing additional Python packages

Miniconda only comes with a basic Python distribution, so one needs to install the needed Python modules. You can list the currently installed Python modules as follows:
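conda list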

To install a new package, run

conda install [packagename]

For example, to install the latest version of the SciPy module, run

conda install scipy

The installer will think for a little while and then install SciPy and a number of other packages on which SciPy depends, such as NumPy and the accelerated Intel Math Kernel Library (MKL). This will cause the Miniconda distribution to grow to about 1.5 GB, but it will include all the packages needed for high performance numerical analysis with SciPy.

NOTE: Since the MKL library by default utilizes all the processor cores on the system, if you are planning to run many independent parallel Python calculations, set the environment variable OMP_NUM_THREADS to 1 (setenv OMP_NUM_THREADS 1 for tcsh or export OMP_NUM_THREADS=1 for bash).

Another common Python package is Jupyter, which allows one to run Jupyter notebooks. This can be installed as:

conda install jupyter

To uninstall a conda package run

conda uninstall [packagename]

Conda packages in other channels

If the conda install command cannot find the requested package, chances are high that it is in a non-default conda repository (channel). Independent distributors can create their own package channels that house their products. The best approach to find a package which is not in an official conda channel is to do a web search for it.

For example, to look for a package named Fasta, which is used for biological sequence alignment, we web search for "anaconda fasta". The top search hit suggests the following installation line:

conda install -c biobuilds fasta

The "-c" option specifies the channel name, in our case biobuilds.

To add a channel to the default channel list, we can:

conda config --add channels biobuilds

However, this puts the channel at the top of the list, with the highest priority. To add a new channel to the bottom of the list instead, run:

conda config --append channels biobuilds
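To verify the resulting channel list and its priority order, one can run:

conda config --show channels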

Installing Python modules with pip

When a Python package does not exist as a conda package, one can use the Python pip installer. We recommend using pip only as a last resort since this way one loses the flexibility of the conda packaging environment (automatic conflict resolution and version upgrade/downgrade). To install a module using pip, either run:

python -m pip install bibtexparser

or, preferably:

pip install bibtexparser

For other ways to install Python packages, see our older document.

Miniconda Python environments

Miniconda supports Python virtual environments (VEs), so one can leverage multiple Python instances from a single Miniconda installation. As of conda 4.6 (released in 2019), both the bash and tcsh shells are supported, provided the miniconda module presented above is used.

We can list existing environments with

conda env list

Assuming one uses the bash shell, we can, for example, install the Intel Distribution for Python into a separate environment:

conda update -y conda
conda create -c intel -n idp3 intelpython3_core python

This updates conda ("-y" answers yes to the prompt asking whether to update the listed packages), and then creates an environment named "idp3" from the intel channel, based on the intelpython3_core package.

We then activate the environment as:

conda activate idp3

All conda package commands can be used within the activated environment; for example, packages installed with the conda install command will then be placed in this environment.
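For example, with idp3 active, the following installs NumPy into the idp3 environment rather than the base one (numpy is used here purely as an illustration):

conda install numpy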

To exit from the environment, run:

conda deactivate

While virtual environments provide a convenient way to install different Python modules, their versions and dependencies, we occasionally see conflicts that are hard to diagnose and resolve. Furthermore, it is more complicated to wrap VEs into Lmod modules and Open OnDemand Jupyter notebooks. For that reason we recommend using separate miniconda installations instead of virtual environments. For each independent miniconda installation, modify the miniconda module name (e.g. cp latest.lua mynewconda.lua), and in the new module file, set the myanapath variable to the path where this particular miniconda is installed.
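For example, a second, independent installation and its module could be set up as follows (the name mynewconda is only an illustration; download the Miniconda installer first if it is no longer present):

bash ./Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/software/pkg/mynewconda -s
cp $HOME/MyModules/miniconda3/latest.lua $HOME/MyModules/miniconda3/mynewconda.lua
# edit mynewconda.lua and set myanapath to "software/pkg/mynewconda"
module load miniconda3/mynewconda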

Micromamba in a container installation and use

Installing the whole conda environment in a container has several benefits. It is packaged in a single file, so it is easy to move somewhere else. The whole environment is static (fixed during the build of the container), so it won't get accidentally changed when trying to install or update a package - updates require building a new container. While it is equally possible to create a container based on Miniconda, the Micromamba installation is smaller and uses the more performant mamba package manager instead of conda. Below we outline the steps for building a Micromamba container with some bioinformatics tools and a Jupyter Notebook, and for using this environment in CHPC's Open OnDemand Jupyter app.

Creating a micromamba container

We will use Apptainer to create the container, which recently became possible to do as a regular user. First, we create a recipe file for building the container and name it Singularity:

Bootstrap: docker
From: mambaorg/micromamba

%post
    micromamba install --yes --name base -c bioconda -c conda-forge \
    python=3.9.1 notebook samtools bwa
    micromamba clean -aqy

%runscript
  micromamba run -p /opt/conda "$@"

In this recipe, we pull the Micromamba container from DockerHub, install the needed tools in the %post section, and execute the micromamba run ... command whenever the container is run. Now we build the container:

module load apptainer
unset APPTAINER_BINDPATH
apptainer build mymamba.sif Singularity

Unsetting APPTAINER_BINDPATH is necessary to avoid a build error that complains about missing mount points in the container. This environment variable normally ensures that /scratch and /uufs get mounted automatically when the container is executed. Note that in this article we are using the bash shell syntax for setting and unsetting environment variables.
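For the tcsh shell, the equivalent commands for unsetting and setting these variables would be:

unsetenv APPTAINER_BINDPATH
setenv APPTAINER_NV true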

The container sif file has executable permissions, so we can run it directly, followed by the command we want to run from the container:

$ ./mymamba.sif bwa
Program: bwa (alignment via Burrows-Wheeler transformation)
Version: 0.7.17-r1188
...

Python environment packages can also be specified in a file, environment.yml, e.g.:

channels:
  - defaults
  - conda-forge
dependencies:
  - matplotlib
  - python=3.9
  - pip

In that case, we can modify the micromamba install command as:

micromamba create --yes --name base --file environment.yml

Micromamba GPU container

For the GPU container, which as an example installs a PyTorch environment, one has to use the --nv flag during the build. This flag imports the GPU stack from the host into the container and ensures that the mamba package manager picks up the GPU/CUDA dependencies and installs the GPU version of PyTorch.
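The Singularity.gpu recipe follows the same pattern as the recipe above; a minimal sketch, assuming PyTorch is installed from the pytorch and nvidia channels (the package names and versions below are illustrative, not prescriptive), could look like:

Bootstrap: docker
From: mambaorg/micromamba

%post
    # illustrative package selection for a GPU PyTorch environment
    micromamba install --yes --name base -c pytorch -c nvidia -c conda-forge \
    python=3.10 pytorch pytorch-cuda=11.8
    micromamba clean -aqy

%runscript
  micromamba run -p /opt/conda "$@"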

 module load apptainer
 unset APPTAINER_BINDPATH
 apptainer build --nv mymamba_gpu.sif Singularity.gpu

To test that the GPU version of PyTorch is installed, we use the environment variable APPTAINER_NV as an alternative to the --nv flag, so that we can run the container file directly and still include the GPU environment:

module load apptainer
export APPTAINER_NV=true
./mymamba_gpu.sif /opt/conda/bin/python -c "import torch; print(torch.cuda.is_available())"
True

Using the micromamba container in Open OnDemand

Just as we ran the bwa command above, we can also run the jupyter notebook command to start Jupyter. However, this starts Jupyter on the machine where we run the terminal, and we would need to create an SSH tunnel to that machine to open this Jupyter in our client's web browser. The Open OnDemand Jupyter app instead launches Jupyter directly in the client's browser. To run our container, we choose the "Custom (Environment setup below)" option for the "Jupyter Python version", and in the "Environment Setup for Custom Python" text box, put:

shopt -s expand_aliases
module load apptainer
alias jupyter="$HOME/containers/mymamba.sif jupyter"

The first command is a bash option that enables aliases in shell scripts. Then we load the Apptainer module, followed by creating an alias for the jupyter command so that it is called from the container instead. This jupyter alias is then used by the Open OnDemand Jupyter app to launch the Jupyter server. Notice that we use the full path to the container so that it is found, as the Open OnDemand app starts at the root of the user's home directory.

As noted above, if we need to use GPUs, we need to add the environment variable APPTAINER_NV=true to initialize the GPUs in the container:

export APPTAINER_NV=true
alias jupyter="$HOME/containers/mymamba_gpu.sif jupyter"

Miniconda installation examples

Interactive machine learning environment with Tensorflow, Keras and Jupyter Lab

Tensorflow is a widely used deep learning framework. Keras is an add-on to the framework that attempts to make it more user friendly. Jupyter Lab allows one to run Jupyter notebooks, e.g. in our Open OnDemand web portal.

Since Tensorflow performs best on GPUs, we will be installing the GPU version. Once Miniconda3 and its module are installed, look at the Tensorflow installation requirements to note the CUDA and CUDNN versions that the latest Tensorflow requires. As of this writing (December 2021), Tensorflow 2.5 requires CUDA 11.2 and CUDNN 8.1.
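The steps below assume a dedicated Miniconda installation with its own module file named tf25.lua; following the pattern described earlier, it could be set up like this (the installation path miniconda3-tf25 is just an illustration):

wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash ./Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/software/pkg/miniconda3-tf25 -s
mkdir -p $HOME/MyModules/miniconda3
cp /uufs/chpc.utah.edu/sys/installdir/python/modules/miniconda3/latest.lua $HOME/MyModules/miniconda3/tf25.lua
# edit tf25.lua and set myanapath to "software/pkg/miniconda3-tf25"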

  • Load the CUDA and CUDNN modules, and the newly installed Miniconda module (named tf25.lua)
    ml cuda/11.2 cudnn/8.1.1
    ml use $HOME/MyModules
    ml miniconda3/tf25
  • Install Jupyter Lab.
    conda install jupyterlab
  • Install Tensorflow. Note that we are using pip, not conda, since Google provides its Tensorflow builds in pip repositories, and the conda repositories don't have all the versions. This pip-installed Tensorflow also includes Keras. Test that Tensorflow can access the GPU(s).
    pip install tensorflow==2.5
    python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"
  • In the Open OnDemand Jupyter Lab app launch window, put the following in the Environment Setup:
    ml use $HOME/MyModules
    ml cuda/11.2 cudnn/8.1.1
    ml miniconda3/tf25
    Note that in order to use GPU with Tensorflow, you need to request a GPU.

Parallel machine learning environment with Tensorflow, Keras and Horovod

Tensorflow is a widely used deep learning framework. Keras is an add-on to the framework that attempts to make it more user friendly. Horovod allows for easy and flexible distributed parallelization of Keras/Tensorflow. Installing these packages and their dependencies forms a nice but rather complicated example of using a Miniconda environment.

NOTE: These instructions are old, so they may not work exactly as described.

There are a few constraints:

  • We use the defaults conda channel explicitly to install Tensorflow. Other channels, e.g. intel, do not supply the GPU version.
  • Horovod needs to be built with MPI support and does not appear to be available on Anaconda channels.

The steps to set up and test this environment are as follows (valid as of November 2018):

  • Install base Miniconda as shown above
  • Prepare the right environment, load the appropriate CUDA and CUDNN for Tensorflow
    ml cuda cudnn
  • Install the appropriate Python version (3.7 as of May 2020), conda-based NumPy to get the accelerated MKL support, and a few other packages Tensorflow needs. Make sure to use the "defaults" channel since that is the one that has the GPU binaries. Other channels, e.g. "intel", may take precedence if you have added them earlier. Check your ~/.condarc for the list of conda channels and their search order (from top down).
    conda install -c defaults python=3.7 numpy six wheel
  • Install tensorflow-gpu from the conda "defaults" channel and test whether it can see the GPUs. If the "defaults" channel is not specified, the "intel" channel Tensorflow build is installed, which does not support GPUs.
    conda install -c defaults tensorflow-gpu
    python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"
  • (NOTE - currently we have issues with the MPI based install, so use the default Gloo framework) Install Horovod with pip. Load gcc/6.4.0, since Tensorflow requires gcc >= 5.4.0 while the OS default is 4.8.5. (TODO - install NCCL for GPU communication; for now it uses only Gloo)
    ml gcc/6.4.0
    pip install horovod
  • Check what options Horovod was built with
horovodrun --check-build
...
Available Controllers:
    [ ] MPI
    [X] Gloo

Available Tensor Operations:
    [ ] NCCL
    [ ] DDL
    [ ] CCL
    [ ] MPI
    [X] Gloo    
  • To run a parallel calculation on multiple GPUs, make sure to use a SLURM shell script rather than an interactive job, since interactive jobs on GPU nodes using gres do not work with mpirun
    #!/bin/bash
    #SBATCH -N 1
    #SBATCH -n 4
    #SBATCH -A owner-gpu-guest
    #SBATCH -p notchpeak-gpu-guest
    #SBATCH -t 2:00:00
    #SBATCH --gres=gpu:titanv:4
    #SBATCH --mem=0
    ml use $HOME/MyModules
    ml cuda cudnn
    ml miniconda3/latest
    ml gcc/6.4.0
    horovodrun -np 4 python tensorflow2_mnist.py

Setting up miniconda for others to use

Sometimes a researcher needs to install a Miniconda package that they want their co-workers to use as well. This helps with reproducibility, as the same software stack is used by all the researchers. To achieve this, we follow the Miniconda installation and usage instructions above, but with a few modifications outlined below.

First, we install Miniconda to a different location, whose name corresponds to the package that we want to install. For example, if we are to install a Miniconda environment with the BLAST+ bioinformatics package, we name the directory miniconda3-blast.

wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash ./Miniconda3-latest-Linux-x86_64.sh -b -p $HOME/software/pkg/miniconda3-blast -s

Next we create the module file. We should name it along the same lines as the package name, for example miniconda3/blast. The blast part of the module name is used in place of the usual program version, which is fine - Lmod handles that. We get the module template from CHPC as follows:

mkdir -p $HOME/MyModules/miniconda3
cp /uufs/chpc.utah.edu/sys/installdir/python/modules/miniconda3/latest.lua $HOME/MyModules/miniconda3/blast.lua

At this point, the module file is the same as CHPC's template, so it points to the miniconda in the owner's HOME. Other users' HOME is different, therefore we need to modify the new module file to include the absolute path to the location of this new miniconda3-blast installation. Open the module file in a text editor, and look at lines 6-11:

-- change myanapath if the installation is in a different place in your home
-- note that this is a relative path from the base of your home directory
local myanapath ="software/pkg/miniconda3"
-- if you want to share this miniconda installation with others, use the full path
--local myanapath = "/uufs/chpc.utah.edu/common/home/u0123456/software/pkg/miniconda3"

Notice that myanapath is a relative path, not an absolute one. The HOME variable will be prepended to this relative path later in the module file. Since each user has their own unique HOME, this prevents other users from using this module file to set up our miniconda environment - the miniconda is installed in our HOME, not theirs. To fix this, we have to put the absolute path to our new miniconda in the myanapath variable, as follows:

-- change myanapath if the installation is in a different place in your home
-- note that this is a relative path from the base of your home directory
-- local myanapath ="software/pkg/miniconda3"
-- if you want to share this miniconda installation with others, use the full path
local myanapath = "/uufs/chpc.utah.edu/common/home/u0123456/software/pkg/miniconda3-blast"

Later in the module file the logic recognizes this and keeps this absolute path.

Once we save the module file, we are ready to use it to initialize the miniconda environment:

module use $HOME/MyModules
module load miniconda3/blast

Other users can do the same, except that they have to use the full path to our HOME:

module use /uufs/chpc.utah.edu/common/home/u0123456/MyModules
module load miniconda3/blast

Once the miniconda3/blast module is loaded, we can run conda commands to install the needed packages, e.g. for BLAST:

conda install -c bioconda blast

and then make it available to co-workers.

Please note that we recommend not installing Conda or Python virtual environments (VEs) into the module based Miniconda installations. A more robust approach is to create multiple Miniconda installations with corresponding modules, and to install into these Minicondas the packages that would otherwise go into the virtual environments. There are several advantages to this approach. First, the packages installed into separate Minicondas are completely independent from each other, which alleviates possible dependency and versioning problems between the base Conda environment and the virtual environment. Second, the tcsh shell has historically been less well supported by VEs, creating problems for tcsh users. Finally, the modules can be freely loaded and unloaded, allowing the environment to be completely cleared, which is more problematic with VEs, as there is always the base environment that remains.

Last Updated: 1/26/24