
TensorFlow on Odyssey

Introduction

This page explains how to access or set up TensorFlow on the Odyssey cluster.

GPU support 

At the time of writing, the latest release of TensorFlow is 1.9.0. TensorFlow 1.9 requires the CUDA 9.0 runtime and cuDNN 7.0.

Also, at the time of writing, the latest pip wheel is built for Python 3.6; support for Python 3.7 has not yet been added.

The current CUDA runtime for GPU-enabled nodes on the cluster is 9.2, so TF 1.9 should work on all GPU nodes. Please refer to our documentation on how to submit and run GPU jobs on the cluster.

The two recommended ways to set up TensorFlow are to install the latest version in a Python conda environment inside your user folder, or to run TensorFlow as a Singularity container.

Install in a conda environment 

At the time of writing, the latest release of TensorFlow is 1.9.0, and it requires the CUDA 9.0 runtime and cuDNN 7.0. You can install your own copy by following these steps.

#1. load Anaconda, cuda and cudnn 
>$ module load Anaconda3/5.0.1-fasrc01
>$ module load cuda/9.0-fasrc02 cudnn/7.0_cuda9.0-fasrc01

#2. create a new environment with Python 3.6 and some dependencies needed by TensorFlow
>$ conda create -n tf1.9_cuda9 python=3.6 numpy six wheel

#3. activate the conda environment
>$ source activate tf1.9_cuda9

#4. use pip to install tensorflow
(tf1.9_cuda9)>$ pip install --upgrade tensorflow-gpu 

Please note that while you can run the installation on the login nodes, you will not be able to use the software there, as the login servers have no GPU.
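Since the software only works on GPU nodes, a batch job is a natural way to use the environment. The following is a minimal sketch, not a site-approved template: it writes a hypothetical job script, tf_gpu_job.sh, that loads the same modules, activates the environment, lists the devices TensorFlow can see, and runs a training script. The partition name "gpu", the resource requests, and the script name myCNN.py are assumptions for illustration; check the cluster's GPU documentation for the actual values.

```shell
# Write a hypothetical SLURM job script for the tf1.9_cuda9 environment.
# Assumptions: partition name "gpu", one GPU, 1 hour, 8000 MB of memory,
# and a training script called myCNN.py in the submission directory.
cat > tf_gpu_job.sh <<'EOF'
#!/bin/bash
#SBATCH -p gpu                # partition with GPU nodes (assumed name)
#SBATCH --gres=gpu:1          # request one GPU
#SBATCH -n 1                  # one task
#SBATCH -t 0-01:00            # wall time (D-HH:MM)
#SBATCH --mem=8000            # memory in MB
#SBATCH -o tf_gpu_%j.out      # standard output (%j expands to the job ID)
#SBATCH -e tf_gpu_%j.err      # standard error

# Same modules and environment as in the installation steps
module load Anaconda3/5.0.1-fasrc01
module load cuda/9.0-fasrc02 cudnn/7.0_cuda9.0-fasrc01
source activate tf1.9_cuda9

# List the devices TensorFlow can see, then run the training script
python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"
python myCNN.py
EOF

echo "Wrote tf_gpu_job.sh; submit it with: sbatch tf_gpu_job.sh"
```

If the device list in the job output contains only a CPU entry, the job most likely did not land on a GPU node or the CUDA modules were not loaded.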

Running TensorFlow as a Singularity container

Alternatively, you can run TensorFlow as a Singularity container.

>$ singularity exec  --nv docker://tensorflow/tensorflow:latest-gpu python myCNN.py
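For batch use, the same command can be wrapped in a job script. This is a minimal sketch under stated assumptions: the partition name "gpu", the resource requests, and the file name tf_singularity_job.sh are placeholders rather than site defaults, and myCNN.py is the example script from the command above.

```shell
# Write a hypothetical SLURM job script that runs the container on a GPU node.
# Assumptions: partition name "gpu", one GPU, 1 hour, 8000 MB of memory.
cat > tf_singularity_job.sh <<'EOF'
#!/bin/bash
#SBATCH -p gpu                # partition with GPU nodes (assumed name)
#SBATCH --gres=gpu:1          # request one GPU
#SBATCH -n 1                  # one task
#SBATCH -t 0-01:00            # wall time (D-HH:MM)
#SBATCH --mem=8000            # memory in MB
#SBATCH -o tf_sing_%j.out     # standard output (%j expands to the job ID)
#SBATCH -e tf_sing_%j.err     # standard error

# --nv makes the host NVIDIA driver libraries and GPU devices
# available inside the container
singularity exec --nv docker://tensorflow/tensorflow:latest-gpu python myCNN.py
EOF

echo "Wrote tf_singularity_job.sh; submit it with: sbatch tf_singularity_job.sh"
```

Because the image is referenced by a docker:// URI, Singularity has to fetch and convert it before the first run, so the first job may take noticeably longer to start.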

 

CPU version 

TensorFlow optimized for Intel Hardware (only available for partitions "shared" and "test") 

If you would like to work with the CPU version of TensorFlow, we suggest the version provided by Intel. Please note that this version only works on Intel hardware, so you can run it on the partitions "shared" and "test", or on any other Intel-based priority partition your lab might have access to.

Please note that the code will not run in the "general" partition or on the login nodes, as those servers feature AMD processors.
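To confirm which kind of processor a node has before running the Intel build, you can inspect the CPU vendor string. The snippet below is a small sketch; the function name cpu_vendor is a hypothetical helper, and it assumes a Linux node where /proc/cpuinfo contains a vendor_id line (typically "GenuineIntel" or "AuthenticAMD").

```shell
# Print the CPU vendor string from a cpuinfo-style file.
# Defaults to /proc/cpuinfo; the optional argument is only there
# so the helper can be tried against a sample file.
cpu_vendor() {
    awk -F': *' '/^vendor_id/ { print $2; exit }' "${1:-/proc/cpuinfo}" 2>/dev/null
}

if [ "$(cpu_vendor)" = "GenuineIntel" ]; then
    echo "Intel CPU detected: the Intel build of TensorFlow should run here."
else
    echo "Non-Intel CPU detected ($(cpu_vendor)): use an Intel-based partition instead."
fi
```

Running this at the start of a job script gives an early, readable message instead of a failure later in the Python code.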

At the time of writing, the latest version of TensorFlow released as a wheel by Intel is 1.6.0.

You can install your own copy by following these steps.

Install in a conda environment (valid both for CentOS6 and CentOS7)

#1. load Anaconda
>$ module load Anaconda3/5.0.1-fasrc01

#2. create a new environment from the Intel channel with Python 3.6 and some dependencies needed by TensorFlow
>$ conda create -n tf1.6_intel -c intel python=3.6 pip numpy

#3. activate the conda environment
>$ source activate tf1.6_intel

#4. use pip to install tensorflow
(tf1.6_intel)>$ pip install --no-cache-dir https://anaconda.org/intel/tensorflow/1.6.0/download/tensorflow-1.6.0-cp36-cp36m-linux_x86_64.whl

... Discussion on optimization considerations coming soon...

CC BY-NC 4.0 This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License. Permissions beyond the scope of this license may be available at Attribution.