
TensorFlow

TensorFlow is an open-source software library for machine learning developed by Google. It is used to build and train neural networks, and it runs on both CPU and GPU architectures. In what follows we describe how you can install TensorFlow in your own environment on the CHPC clusters.

NOTE: The instructions below are outdated and are left for legacy purposes only. Please use our "Parallel machine learning environment with TensorFlow, Keras and Horovod" example, installed inside Miniconda.

The TensorFlow package (GPU-enabled or not) can be installed on top of either the Python 2.7.11 or the Python 3.5.2 distribution. Both Python distributions are already available on the CHPC clusters.

 

Installation of TensorFlow

In what follows we presume that you are planning to use the GPU-based version of TensorFlow. In order to use the CHPC GPU nodes you need to have a CHPC account AND permission to use CHPC's GPU nodes.

If you already have a CHPC account but do NOT have permission to use CHPC's GPU nodes, please send an email to helpdesk@chpc.utah.edu. Please describe in the email why you would like to use CHPC's GPU nodes (demand unfortunately exceeds supply).

In what follows we presume that you have access to CHPC's GPU nodes. If you are planning to use a non-GPU-enabled version of TensorFlow, the installation can proceed on a regular CPU node (running CentOS 7). The (small) differences in the installation are mentioned below.

Step 0: Find prerequisites

For GPU-enabled TensorFlow, we need to make sure the correct versions of CUDA and CUDNN are used. Since the Linux builds of TensorFlow target Ubuntu (as of March 2018), check the Installing TensorFlow on Ubuntu page for the versions of CUDA and CUDNN it requires (as of March 2018, CUDA 9.0 and CUDNN 7.0).

Based on this information, check whether the appropriate modules are available:

module spider cuda
...
        cuda/9.0.176
...
module spider cudnn
...
        cudnn/7.0.5
...

If the required modules are not available, please email helpdesk@chpc.utah.edu and ask to have them installed.
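
As an optional sanity check (a sketch, not required by the instructions), you can load the CUDA module found above and confirm that the compiler reports the expected release:

module load cuda/9.0.176
nvcc --version      # should report "release 9.0"
module unload cuda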

Step 1: Logging into a GPU-enabled node

In the example below, the Kingspeak nodes containing NVIDIA's P100 cards (Pascal generation) are targeted. At the CHPC, we have other GPU-enabled nodes. Please have a look at CHPC's GPU/Accelerator page if you would like to know more about what the CHPC offers.

srun --nodes=1 --ntasks=14 --partition=kingspeak-gpu-guest --account=owner-gpu-guest --gres=gpu:p100:1 \
--time=24:00:00 --mem=28GB --pty /bin/bash -l

Note that the above command requests a Bash shell. If you would like to use a Tcsh/Csh shell instead, replace '--pty /bin/bash -l' with '--pty /bin/tcsh -l', as shown below.
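
For example, the Tcsh variant of the request above looks as follows (only the shell at the end changes):

srun --nodes=1 --ntasks=14 --partition=kingspeak-gpu-guest --account=owner-gpu-guest --gres=gpu:p100:1 \
--time=24:00:00 --mem=28GB --pty /bin/tcsh -l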

If an idle NVIDIA device of the requested GPU type (gres) is available on the cluster, your job will start on one of the GPU-enabled compute nodes after a few seconds. If you want to inspect the status of the physical cards on the node, type 'nvidia-smi -l':

[u0253283@kp359 ~]$ nvidia-smi -l
Tue Apr 25 16:27:24 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 367.48 Driver Version: 367.48 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla P100-PCIE... On | 0000:04:00.0 Off | Off |
| N/A 29C P0 25W / 250W | 0MiB / 16276MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla P100-PCIE... On | 0000:82:00.0 Off | Off |
| N/A 70C P0 167W / 250W | 14791MiB / 16276MiB | 100% Default |
+-------------------------------+----------------------+----------------------+

By typing 'echo $CUDA_VISIBLE_DEVICES' you can check the id(s) of the card(s) to which you have been granted access:

[u0253283@kp359 ~]$ echo $CUDA_VISIBLE_DEVICES
0

Make sure that the command 'echo $CUDA_VISIBLE_DEVICES' returns a non-empty string. If it returns an empty string, you do NOT have access to a GPU device.
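
If you prefer an explicit test, e.g. inside a job script, a minimal sketch (not part of CHPC's instructions) looks like this:

if [ -z "$CUDA_VISIBLE_DEVICES" ]; then
    echo "WARNING: no GPU device has been assigned to this job" >&2
fi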

Step 2: Setting up your Python Virtual Environment (based on python 3.5.2)

In order to properly set up your Python virtual environment, the following modules must be loaded:

[u0253283@kp359 ~]$ module load python/3.5.2
[u0253283@kp359 ~]$ module load cuda/9.0.176 cudnn/7.0.5

The first command loads the parent version of Python (i.e. 3.5.2) into your environment. The new virtual environment will be a child/offshoot of this parent version (the currently loaded version of Python).

The second command loads the required CUDA and CUDNN modules, determined in Step 0.

The CUDNN module (CUDA Deep Neural Network library) only needs to be loaded if you want to install and/or use a GPU-enabled version of TensorFlow.

You can now set up your Python virtual environment as follows:

python3 -m venv --system-site-packages $HOME/MYTF
module unload python/3.5.2

In the first command, 'python3 -m venv' invokes Python's built-in venv module (a replacement for the older, deprecated 'pyvenv' command). The flag '--system-site-packages' allows the newly created virtual environment to use the Python packages that were installed in the parent distribution (e.g. numpy, scipy, matplotlib, etc.). The last argument, $HOME/MYTF, is the directory where the new virtual environment will be installed. Note that in this approach pip3, ipython, and jupyter cannot be invoked.

If you are planning to use TensorFlow within a Jupyter notebook, you need to omit the '--system-site-packages' flag. In that case, if you want to use numpy, scipy, jupyter, etc., you need to install them explicitly in your virtual environment (after activating it as described in Step 3). You can do this in the following way:

pip3 install jupyter
pip3 install numpy
pip3 install scipy
etc.
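
Once the environment has been activated (Step 3 below), you can verify what actually ended up inside it; as a sketch:

pip3 list       # lists the packages installed in the active virtual environment
which pip3      # typically resolves to $HOME/MYTF/bin/pip3 once the environment is active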

 

Step 3:  Activate the newly created virtual environment

As soon as the newly created virtual environment becomes active, a GPU-enabled version of TensorFlow can be installed within it.

cd $HOME/MYTF
source bin/activate

The above commands activate your virtual environment. Your prompt will change as well, indicating that you are in the new environment:

(MYTF) [u0253283@kp359 MYTF]$

The location of python3 now clearly reflects this:

(MYTF) [u0253283@kp359 MYTF]$ which python3
~/MYTF/bin/python3
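
Optionally, the following sketch (not part of the original instructions) confirms where packages resolve from: with '--system-site-packages', numpy should come from the parent 3.5.2 distribution, whereas without that flag it is only available after you pip3-install it into the environment:

python3 -c 'import sys; print(sys.prefix)'           # should print the path of $HOME/MYTF
python3 -c 'import numpy; print(numpy.__file__)'     # raises ImportError if numpy is not available in this environment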

Step 4: Installation of TensorFlow in the Python Virtual Environment

The GPU-based version of TensorFlow can now be installed as follows:

(MYTF) [u0253283@kp359 MYTF]$ python3 -m pip install tensorflow-gpu
Requirement already satisfied: tensorflow-gpu in ./lib/python3.5/site-packages
Requirement already satisfied: six>=1.10.0 in /uufs/chpc.utah.edu/sys/installdir/python/3.5.2-c7/lib/python3.5/site-packages/six-1.10.0-py3.5.egg (from tensorflow-gpu)
Requirement already satisfied: werkzeug>=0.11.10 in /uufs/chpc.utah.edu/sys/installdir/python/3.5.2-c7/lib/python3.5/site-packages/Werkzeug-0.11.15-py3.5.egg (from tensorflow-gpu)
Requirement already satisfied: wheel>=0.26 in /uufs/chpc.utah.edu/sys/installdir/python/3.5.2-c7/lib/python3.5/site-packages/wheel-0.30.0a0-py3.5.egg (from tensorflow-gpu)
Requirement already satisfied: numpy>=1.11.0 in /uufs/chpc.utah.edu/sys/installdir/python/3.5.2-c7/lib/python3.5/site-packages/numpy-1.12.1-py3.5-linux-x86_64.egg (from tensorflow-gpu)
Requirement already satisfied: protobuf>=3.2.0 in ./lib/python3.5/site-packages (from tensorflow-gpu)
Requirement already satisfied: setuptools in /uufs/chpc.utah.edu/sys/installdir/python/3.5.2-c7/lib/python3.5/site-packages/setuptools-34.3.0-py3.5.egg (from protobuf>=3.2.0->tensorflow-gpu)
Requirement already satisfied: packaging>=16.8 in /uufs/chpc.utah.edu/sys/installdir/python/3.5.2-c7/lib/python3.5/site-packages (from setuptools->protobuf>=3.2.0->tensorflow-gpu)
Requirement already satisfied: appdirs>=1.4.0 in /uufs/chpc.utah.edu/sys/installdir/python/3.5.2-c7/lib/python3.5/site-packages (from setuptools->protobuf>=3.2.0->tensorflow-gpu)
Requirement already satisfied: pyparsing in /uufs/chpc.utah.edu/sys/installdir/python/3.5.2-c7/lib/python3.5/site-packages (from packaging>=16.8->setuptools->protobuf>=3.2.0->tensorflow-gpu)

If you want to use the CPU-based version of TensorFlow, invoke the following command instead:

(MYTF) [u0253283@kp359 MYTF]$ python3 -m pip install tensorflow

Step 5: Testing the TensorFlow installation

The newly installed version of TensorFlow can now be tested:

(MYTF) [u0253283@kp359 MYTF]$ python3
Python 3.5.2 (default, Sep 22 2016, 16:29:42)
[GCC 4.8.5 20150623 (Red Hat 4.8.5-4)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess=tf.Session()
2017-04-25 17:01:49.040676: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-04-25 17:01:49.041249: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-04-25 17:01:49.041278: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-04-25 17:01:49.041301: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-04-25 17:01:49.041338: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-04-25 17:01:49.503890: I tensorflow/core/common_runtime/gpu/gpu_device.cc:887] Found device 0 with properties:
name: Tesla P100-PCIE-16GB
major: 6 minor: 0 memoryClockRate (GHz) 0.405
pciBusID 0000:04:00.0
Total memory: 15.89GiB
Free memory: 15.61GiB
2017-04-25 17:01:49.504018: I tensorflow/core/common_runtime/gpu/gpu_device.cc:908] DMA: 0
2017-04-25 17:01:49.504046: I tensorflow/core/common_runtime/gpu/gpu_device.cc:918] 0: Y
2017-04-25 17:01:49.504087: I tensorflow/core/common_runtime/gpu/gpu_device.cc:977] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:04:00.0)
>>> print(sess.run(hello))
b'Hello, TensorFlow!'
>>> quit()

If TensorFlow does not load and complains about a missing CUDA or CUDNN library, e.g.

ImportError: libcublas.so.9.0: cannot open shared object file: No such file or directory

or

ImportError: libcudnn.so.7.0: cannot open shared object file: No such file or directory

then refer to Step 0 (Find prerequisites) and load the appropriate CUDA and/or CUDNN module.
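
For a quick non-interactive check you can run the same kind of test from the shell (a sketch based on the TensorFlow 1.x API used throughout this guide; newer releases may require different calls):

python3 -c 'import tensorflow as tf; print(tf.__version__)'
python3 -c 'from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())'   # the GPU build should list a GPU device here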
 

Step 6: Deactivation of your Virtual Environment

To deactivate the Python virtual environment, just type 'deactivate':

(MYTF) [u0253283@kp359 MYTF]$ deactivate
[u0253283@kp359 MYTF]$

The prompt returns immediately to the 'old prompt', i.e. the prompt that was present before you activated the virtual environment.

Appendix: Stepping in and out of your virtual environment

Once you have created your virtual environment, you can easily step into and out of it:

Step into your virtual environment

module purge # Make sure that no other python distro is active
module load cuda/9.0.176 cudnn/7.0.5 # Load the same CUDA/CUDNN modules used during installation (see Step 0)
cd $HOME/MYTF
source bin/activate

Leave your virtual environment

deactivate 
module unload cuda cudnn
Last Updated: 7/5/23