
TensorFlow on Odyssey

Introduction

This page explains how to access or set up TensorFlow on the Odyssey cluster.

GPU support 

The latest software modules for TensorFlow with GPU support currently available on the cluster are:

  • Python2:
    >$ module load gcc/4.8.2-fasrc01 cuda/7.5-fasrc02 tensorflow/1.3.0-fasrc02
  • Python3:
    >$ module load gcc/4.8.2-fasrc01 cuda/7.5-fasrc02 tensorflow/1.3.0-fasrc01
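
If you switch between Python 2 and Python 3, the two module sets above can be wrapped in a small helper. The following is only a sketch: the function name tf_modules_for_python is hypothetical (it is not a cluster command), and the load step is skipped wherever the module command is not available.

```shell
# Sketch: choose the TensorFlow 1.3.0 module set by Python major version.
# tf_modules_for_python is a hypothetical helper name, not a cluster command.
tf_modules_for_python() {
    case "$1" in
        2) echo "gcc/4.8.2-fasrc01 cuda/7.5-fasrc02 tensorflow/1.3.0-fasrc02" ;;
        3) echo "gcc/4.8.2-fasrc01 cuda/7.5-fasrc02 tensorflow/1.3.0-fasrc01" ;;
        *) echo "unsupported Python major version: $1" >&2; return 1 ;;
    esac
}

# Load the modules only where the `module` command exists (i.e. on the cluster).
if command -v module >/dev/null 2>&1; then
    # Word splitting is intended here: each module name is a separate argument.
    module load $(tf_modules_for_python 3)
fi
```

Calling tf_modules_for_python 2 instead selects the Python 2 build; any other argument prints an error and returns a non-zero status.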

Both versions should work on all GPU nodes across the cluster.
Newer versions of TensorFlow require CUDA 8 or CUDA 9 (depending on the specific version). Unfortunately, we cannot currently provide a general module that is valid for the whole cluster, because the CUDA driver installations are heterogeneous across the different GPU partitions.

The CUDA driver versions will be made homogeneous once the cluster is upgraded to CentOS7, at which point all GPU nodes will run CentOS7 and the latest available CUDA driver.

For partitions already running CentOS7 (for example, seas_dgx1):

If the GPU partition your lab has access to is already on CentOS7 and CUDA 9, you can install the latest version in a Python conda environment inside your user folder, or run TensorFlow as a Singularity container.

Install in a conda environment (only valid for CentOS7)

At the time of writing, the latest release of TensorFlow is 1.7.0, and it requires the CUDA 9.0 runtime and cuDNN 7.0. You can install your own copy by following these steps.

#1. load Anaconda, cuda and cudnn 
>$ module load Anaconda3/5.0.1-fasrc01
>$ module load cuda/9.0-fasrc02 cudnn/7.0_cuda9.0-fasrc01

#2. create a new environment with the latest python3 and some dependencies needed by TensorFlow 
>$ conda create -n tf1.7_cuda9 python=3 numpy six wheel

#3. activate the conda environment
>$ source activate tf1.7_cuda9

#4. use pip to install tensorflow
(tf1.7_cuda9)>$ pip install --upgrade tensorflow-gpu 

Please note that while you can run the installation on the CentOS7 login nodes (login7.rc.fas.harvard.edu), you will not be able to use the software there, as the login servers have no GPUs.
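
Because the login nodes have no GPU, it can help to check what the current node offers before launching anything. The sketch below assumes that nvidia-smi (installed with the NVIDIA driver) is the right probe on GPU nodes; the function name has_gpu is hypothetical.

```shell
# Sketch: report whether the current node exposes an NVIDIA GPU.
# has_gpu is a hypothetical helper name used only for illustration.
has_gpu() {
    # nvidia-smi ships with the NVIDIA driver; without it, assume no GPU.
    command -v nvidia-smi >/dev/null 2>&1 || return 1
    # `nvidia-smi -L` prints one line per device, each starting with "GPU ".
    nvidia-smi -L 2>/dev/null | grep -q '^GPU '
}

if has_gpu; then
    echo "GPU detected: TensorFlow GPU jobs can run on this node."
else
    echo "No GPU detected: you can install here, but run on a GPU node."
fi
```

On a GPU node with the conda environment active, TensorFlow's own view can be checked with python -c 'import tensorflow as tf; print(tf.test.is_gpu_available())', which is part of the TensorFlow 1.x API.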

Running TensorFlow as a Singularity container (only valid for CentOS7)

Alternatively, you can run TensorFlow as a Singularity container.

>$ singularity exec --nv docker://tensorflow/tensorflow:latest-gpu python myCNN.py
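
The command above can be parameterized so that the image tag and the script name are easy to change. This is a sketch under stated assumptions: TF_IMAGE_TAG, TF_SCRIPT, DRY_RUN and run_tf_container are hypothetical names introduced for this example, and the defaults simply reproduce the command shown above.

```shell
# Sketch: build the Singularity command with the image tag and script as variables.
# TF_IMAGE_TAG, TF_SCRIPT, DRY_RUN and run_tf_container are hypothetical names.
TF_IMAGE_TAG="${TF_IMAGE_TAG:-latest-gpu}"
TF_SCRIPT="${TF_SCRIPT:-myCNN.py}"

run_tf_container() {
    image="docker://tensorflow/tensorflow:${TF_IMAGE_TAG}"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        # Dry run: only print the command that would be executed.
        echo "singularity exec --nv ${image} python ${TF_SCRIPT}"
    else
        # --nv makes the host NVIDIA driver and GPU devices usable in the container.
        singularity exec --nv "${image}" python "${TF_SCRIPT}"
    fi
}

# Print the command without running it; drop DRY_RUN=1 on a GPU node to execute.
DRY_RUN=1 run_tf_container
```

Setting TF_IMAGE_TAG to a fixed version tag, rather than a moving one such as latest-gpu, keeps the TensorFlow version from changing between runs.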

 

CPU version 

TensorFlow optimized for Intel Hardware (only available for partitions "shared" and "test") 

If you would like to experiment with the CPU version of TensorFlow, try the version provided by Intel. Please note that this version only works on Intel hardware, so you will be able to run it on the partitions "shared" and "test", or on any other Intel-based priority partition your lab might have access to. This version works on both CentOS6 and CentOS7.

Please note that the code will not run in the "general" partition or on the login nodes, as those servers feature AMD processors.
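
If you are unsure whether a node has Intel or AMD processors, the CPU vendor can be read from /proc/cpuinfo. The following is a minimal sketch, assuming a Linux node; the function name cpu_vendor is hypothetical.

```shell
# Sketch: report the CPU vendor to tell Intel nodes from AMD nodes.
# cpu_vendor is a hypothetical helper; it reads /proc/cpuinfo (Linux only)
# unless another file is given as the first argument.
cpu_vendor() {
    # The vendor_id field is "GenuineIntel" on Intel and "AuthenticAMD" on AMD.
    awk -F': *' '/^vendor_id/ { print $2; exit }' "${1:-/proc/cpuinfo}" 2>/dev/null
}

case "$(cpu_vendor)" in
    GenuineIntel) echo "Intel CPU: the Intel-optimized TensorFlow build should run here." ;;
    AuthenticAMD) echo "AMD CPU: the Intel-optimized build is not expected to run here." ;;
    *)            echo "Could not determine the CPU vendor." ;;
esac
```

Running this at the start of a job gives an early, readable message instead of a failure later when TensorFlow is imported.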

At the time of writing, the latest version of TensorFlow released by Intel as a wheel is 1.6.0.

You can install your own copy by following these steps.

Install in a conda environment (valid both for CentOS6 and CentOS7)

#1. load Anaconda
>$ module load Anaconda3/5.0.1-fasrc01

#2. create a new environment from the Intel channel with python 3.6 (to match the cp36 wheel installed in step 4) and some dependencies needed by TensorFlow
>$ conda create -n tf1.6_intel -c intel python=3.6 pip numpy

#3. activate the conda environment
>$ source activate tf1.6_intel

#4. use pip to install tensorflow
(tf1.6_intel)>$ pip install --no-cache-dir https://anaconda.org/intel/tensorflow/1.6.0/download/tensorflow-1.6.0-cp36-cp36m-linux_x86_64.whl
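
After the installation, a quick check confirms that the right environment is active and that TensorFlow imports cleanly. This is a sketch, assuming that conda exports the active environment name in CONDA_DEFAULT_ENV; the function name check_tf_env is hypothetical.

```shell
# Sketch: confirm the expected conda environment is active, then print the
# installed TensorFlow version. check_tf_env is a hypothetical helper name.
check_tf_env() {
    expected="${1:-tf1.6_intel}"
    # Assumption: conda sets CONDA_DEFAULT_ENV to the name of the active environment.
    if [ "${CONDA_DEFAULT_ENV:-}" != "$expected" ]; then
        echo "conda environment '$expected' is not active" >&2
        return 1
    fi
    # Importing TensorFlow fails here if the wheel was not installed correctly.
    python -c 'import tensorflow as tf; print(tf.__version__)'
}

check_tf_env tf1.6_intel || echo "TensorFlow check failed; see the message above."
```

With the environment from the steps above active on an Intel node, the expected output is the installed version number.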

... Discussion on optimization considerations coming soon...

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License (CC BY-NC 4.0).