TensorFlow 1.x On Ubuntu 16.04 LTS


  • ComputeCpp v0.5.1 or greater (http://developer.codeplay.com)
  • Python 2.7
  • Ubuntu 16.04 LTS
  • AMD R9 Nano / AMD FirePro GPU (AMDGPU-PRO driver)


This is a follow-up to the previous post, which described the setup on Ubuntu 14.04. The format is the same “copy-paste”, “get-it-working” style.


The assumption is that you have a vanilla 64-bit Ubuntu 16.04.3 LTS installation.


In some cases you will need to install curl (the command below also installs the linux-generic kernel meta-package):

$ sudo apt-get install curl linux-generic

Update 17Jan2018
Note: this might be relevant only to FirePro W8100 users.
It seems that AMDGPU-PRO drivers older than 17.50.511655 are not able to compile the DKMS module for the kernel installed by default on Ubuntu 16.04.3 LTS (4.13.0-26-generic).

To work around this, the kernel needs to be downgraded to 4.10.0-28-generic:

$ sudo apt-get remove linux-image-4.13.0-26-generic linux-headers-4.13.0-26
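Before removing anything, it can help to confirm which kernel is actually running. A minimal sketch; it assumes that every kernel newer than 4.10.0-28 should be treated as affected (the note above only names 4.13.0-26), and `parse_kernel_release` / `needs_downgrade` are hypothetical helpers:

```python
import re

# Kernel the note above recommends for AMDGPU-PRO drivers older than 17.50.511655.
KNOWN_GOOD_KERNEL = (4, 10, 0, 28)


def parse_kernel_release(release):
    """Turn a release string such as '4.13.0-26-generic' into (4, 13, 0, 26)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) for part in match.groups())


def needs_downgrade(release):
    """Assumption: treat any kernel newer than 4.10.0-28 as affected."""
    return parse_kernel_release(release) > KNOWN_GOOD_KERNEL
```

On the machine itself, `needs_downgrade(platform.release())` checks the running kernel, since on Linux `platform.release()` returns the same string as `uname -r`.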

Java & Bazel

$ sudo apt-get install openjdk-8-jdk
$ echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
$ curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -
$ sudo apt-get update && sudo apt-get install bazel
$ sudo apt-get upgrade bazel
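To confirm that Bazel is installed, `bazel version` prints a line beginning with `Build label:`. A small sketch that extracts it; the helper names are hypothetical and the parsing assumes only that line format:

```python
import re
import subprocess


def parse_bazel_version(output):
    """Extract the release from `bazel version` output, e.g. 'Build label: 0.9.0'."""
    match = re.search(r"^Build label:\s*(\S+)", output, re.MULTILINE)
    return match.group(1) if match else None


def installed_bazel_version():
    """Run `bazel version` and return the build label, or None if bazel is unusable."""
    try:
        output = subprocess.check_output(["bazel", "version"])
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_bazel_version(output.decode("utf-8", "replace"))
```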


The steps below install the AMDGPU-PRO driver and then ComputeCpp. Note: to install only the OpenCL parts of the AMDGPU-PRO driver, pass --compute to ./amdgpu-pro-install.

$ sudo apt-get install ocl-icd-opencl-dev opencl-headers
$ wget --referer=http://support.amd.com https://www2.ati.com/drivers/linux/ubuntu/amdgpu-pro-17.30-465504.tar.xz
$ tar -xvf amdgpu-pro-17.30-465504.tar.xz
$ cd amdgpu-pro-17.30-465504
$ sudo ./amdgpu-pro-install #or sudo ./amdgpu-pro-install --compute
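$ cd .. #directory holding the ComputeCpp archive downloaded from http://developer.codeplay.com
$ tar -xvzf Ubuntu-16.04-64bit.tar.gz
$ sudo mkdir /usr/local/computecpp
$ cd Ubuntu-16.04-64bit && sudo cp -R * /usr/local/computecpp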
$ sudo reboot
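After the reboot, a quick sanity check of the ComputeCpp copy can save a confusing failure later in `./configure`. A minimal sketch, assuming only that the install lives in /usr/local/computecpp and contains the `bin/computecpp_info` tool used further down; `computecpp_problems` is a hypothetical helper:

```python
import os

# Location used by the commands above and by COMPUTECPP_TOOLKIT_PATH later on.
COMPUTECPP_ROOT = "/usr/local/computecpp"


def computecpp_problems(root=COMPUTECPP_ROOT):
    """Return a list of problems with a ComputeCpp install under `root` (empty if OK)."""
    problems = []
    info = os.path.join(root, "bin", "computecpp_info")
    if not os.path.isdir(root):
        problems.append("missing directory: " + root)
    elif not os.path.isfile(info):
        # A plain `cp *` without -R omits sub-directories such as bin/.
        problems.append("missing tool: " + info)
    elif not os.access(info, os.X_OK):
        problems.append("not executable: " + info)
    return problems
```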

Update: 13Jan2018
In some cases (e.g. after updating Ubuntu 16.04) you might need to use the latest AMDGPU-PRO (17.50.511655):

$ wget --referer=http://support.amd.com https://www2.ati.com/drivers/linux/ubuntu/amdgpu-pro-17.50-511655.tar.xz
$ tar -xvf amdgpu-pro-17.50-511655.tar.xz
$ cd amdgpu-pro-17.50-511655
$ ./amdgpu-pro-install --headless --opencl=legacy

Update 17Jan2018
AMDGPU-PRO 17.50.511655 drivers seem either to expose a bug in the TensorFlow SYCL implementation or to have a bug of their own: when a graph's nodes are allocated to different devices (GPU and CPU), data from the GPU-placed node fails to synchronize to the CPU.

Update 11Jul2018
AMDGPU-PRO 17.40-501128 is the latest driver that seems to work with ComputeCpp 0.9.


$ sudo apt-get install python-numpy python-dev python-wheel python-mock python-psutil python-pip
$ sudo pip install --upgrade pip
$ sudo pip install py-cpuinfo portpicker numpy
$ sudo pip install --upgrade scipy
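A quick way to confirm that these packages are visible to the Python interpreter TensorFlow will use is to try importing them. A minimal sketch; the list maps the packages above to their import names (for example py-cpuinfo imports as `cpuinfo`), and `missing_modules` is a hypothetical helper:

```python
import importlib

# Import names of the apt/pip packages installed above.
REQUIRED_MODULES = ["numpy", "scipy", "wheel", "mock", "psutil", "cpuinfo", "portpicker"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of `names` that this interpreter cannot import."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing
```

Run it with the same `python` (2.7) that TensorFlow will be built against; an empty list means every module imported.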


Note: in some cases the user needs to be added to the video group (the change takes effect at the next login):

$ sudo adduser $(whoami) video

To check that the OpenCL driver is working, run:

$ /opt/amdgpu-pro/bin/clinfo

It should return something similar to:

Number of platforms:				 1
  Platform Profile:				 FULL_PROFILE
  Platform Version:				 OpenCL 2.0 AMD-APP (2442.7)
  Platform Name:				 AMD Accelerated Parallel Processing
  Platform Vendor:				 Advanced Micro Devices, Inc.
  Platform Extensions:				 cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 

  Platform Name:				 AMD Accelerated Parallel Processing
Number of devices:				 1
  Device Type:					 CL_DEVICE_TYPE_GPU
  Vendor ID:					 1002h
  Board name:					 AMD Radeon FirePro W8100
  Version:					 OpenCL 1.2 AMD-APP (2442.7)
  Extensions:					 cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event 

The cl_khr_spir extension is listed, which is a good sign: ComputeCpp needs the driver to accept SPIR binaries.
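When several devices are present it is easy to miss which one reports the extension. A minimal sketch that scans clinfo text for it, assuming the `Board name:` / `Extensions:` layout of the AMD clinfo output shown above (other clinfo builds format their output differently); `devices_with_spir` is a hypothetical helper:

```python
def devices_with_spir(clinfo_output):
    """Map each 'Board name' in clinfo output to whether its Extensions list cl_khr_spir."""
    result = {}
    current = None
    for raw_line in clinfo_output.splitlines():
        key, sep, value = raw_line.strip().partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Board name":
            current = value
            result[current] = False
        elif key == "Extensions" and current is not None:
            # Extensions are space separated, so compare whole tokens.
            result[current] = "cl_khr_spir" in value.split()
    return result
```

The input can be the captured output of /opt/amdgpu-pro/bin/clinfo, e.g. via `subprocess.check_output`.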


$ /usr/local/computecpp/bin/computecpp_info

For ComputeCpp CE 0.5.0 you should see:


ComputeCpp Info (CE 0.5.0)


Toolchain information:

GLIBC version: 2.23
GLIBCXX: 20160609
This version of libstdc++ is supported.


Device Info:

Discovered 1 devices matching:
  platform    : 
  device type : 

Device 0:

  Device is supported                     : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                          : Hawaii
  CL_DEVICE_VENDOR                        : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                       : 2442.7
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU 

If you encounter problems when using any of these OpenCL devices, please consult
this website for known issues:
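The walkthrough carries on even though the device above is reported as UNTESTED. To pull the device name and support status out of computecpp_info output programmatically, a minimal sketch follows; it assumes the `KEY : value` layout shown above, and `computecpp_devices` / `support_summary` are hypothetical helpers:

```python
KEPT_PREFIX = "CL_"
SUPPORT_KEY = "Device is supported"


def computecpp_devices(info_output):
    """Parse the 'Device N:' blocks of computecpp_info output into dicts."""
    devices = []
    for raw_line in info_output.splitlines():
        line = raw_line.strip()
        # A new block starts with a header such as "Device 0:".
        if line.startswith("Device ") and line.endswith(":") and line[7:-1].isdigit():
            devices.append({})
            continue
        key, sep, value = line.partition(":")
        key = key.strip()
        # Keep only CL_* fields and the support status; skip headers and prose.
        if devices and sep and (key.startswith(KEPT_PREFIX) or key == SUPPORT_KEY):
            devices[-1][key] = value.strip()
    return devices


def support_summary(info_output):
    """Return (CL_DEVICE_NAME, support status) for every device found."""
    return [(d.get("CL_DEVICE_NAME"), d.get(SUPPORT_KEY))
            for d in computecpp_devices(info_output)]
```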

Set Up

The changesets are being upstreamed; however, for now I would recommend using my fork of TensorFlow:

$ export TF_NEED_OPENCL=1
$ export HOST_CXX_COMPILER=/usr/bin/g++
$ export HOST_C_COMPILER=/usr/bin/gcc
$ export COMPUTECPP_TOOLKIT_PATH=/usr/local/computecpp
$ git clone https://github.com/lukeiwanski/tensorflow.git
$ cd tensorflow
$ git checkout dev/eigen_mehdi
$ ./configure

At this point, press Enter through the configuration questions to accept the defaults.
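Since these settings reach ./configure through the environment, forgetting one of the exports in a new shell is easy. A minimal pre-flight sketch, assuming the values exported above; `env_problems` is a hypothetical helper and not part of TensorFlow's configure script:

```python
import os

# Variables exported above; None means "must point at an existing path".
REQUIRED_ENV = {
    "TF_NEED_OPENCL": "1",
    "HOST_CXX_COMPILER": None,
    "HOST_C_COMPILER": None,
    "COMPUTECPP_TOOLKIT_PATH": None,
}


def env_problems(env=None):
    """Return human-readable problems with the build environment (empty if OK)."""
    env = os.environ if env is None else env
    problems = []
    for name, expected in sorted(REQUIRED_ENV.items()):
        value = env.get(name)
        if value is None:
            problems.append(name + " is not set")
        elif expected is not None and value != expected:
            problems.append("%s=%s (expected %s)" % (name, value, expected))
        elif expected is None and not os.path.exists(value):
            problems.append("%s points at missing path %s" % (name, value))
    return problems
```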

To run the tests:

$ bazel test --test_timeout 300,450,1200,3600  -c opt --config=sycl --test_tag_filters=requires-gpu,-no_gpu,-no_oss,-oss_serial,-benchmark-test  -- //tensorflow/... -//tensorflow/compiler/... -//tensorflow/contrib/distributions/... -//tensorflow/contrib/session_bundle/... -//tensorflow/go/... -//tensorflow/stream_executor/... -//tensorflow/core/distributed_runtime/... -//tensorflow/contrib/verbs/... -//tensorflow/contrib/xla_tf_graph/... -//tensorflow/java/... -//tensorflow/core/kernels/hexagon/...
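The long command above is easier to read and adjust once the exclusions are kept in a list. A sketch that rebuilds exactly the same invocation as an argument list; `sycl_test_command` is a hypothetical helper:

```python
# Sub-trees of //tensorflow excluded by the test command above.
EXCLUDED = [
    "compiler", "contrib/distributions", "contrib/session_bundle", "go",
    "stream_executor", "core/distributed_runtime", "contrib/verbs",
    "contrib/xla_tf_graph", "java", "core/kernels/hexagon",
]

TAG_FILTERS = "requires-gpu,-no_gpu,-no_oss,-oss_serial,-benchmark-test"


def sycl_test_command(excluded=EXCLUDED):
    """Assemble the SYCL `bazel test` invocation as an argument list."""
    cmd = [
        "bazel", "test",
        "--test_timeout", "300,450,1200,3600",
        "-c", "opt",
        "--config=sycl",
        "--test_tag_filters=" + TAG_FILTERS,
        "--",  # everything after this is a target pattern, not a flag
        "//tensorflow/...",
    ]
    cmd += ["-//tensorflow/%s/..." % path for path in excluded]
    return cmd
```

From the tensorflow checkout it can be launched with `subprocess.call(sycl_test_command())`. The `--` matters: it tells Bazel that the following arguments are target patterns, so the `-//tensorflow/...` exclusions are not parsed as flags.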

It is worth noting that at this point some tests still fail; we are working on resolving them.

Also worth mentioning: performance improvement is still a work in progress. The matrix multiplication benchmarks can be run with:

$ bazel build -c opt --config=sycl tensorflow/core/kernels:matmul_op_test
$ ./bazel-bin/tensorflow/core/kernels/matmul_op_test --benchmarks=all

On an AMD Radeon FirePro W8100 this gives:

Further optimisation improvements are in the pipeline.

Published by

Luke Iwanski

Senior Graphics Programmer @ CD Projekt RED

15 thoughts on “TensorFlow 1.x On Ubuntu 16.04 LTS”

    1. Nvidia – not at present.
      Intel CPU/GPU: as long as the driver reports the cl_khr_spir extension, you should be OK.

      As for the steps, nothing should change, since TF nodes are registered to the generic SYCLDevice. At runtime a very simple decision is made about which SYCL device on your system to use: https://github.com/lukeiwanski/tensorflow/blob/master/tensorflow/core/common_runtime/sycl/sycl_device.h#L36 (GPU > CPU).
      In other words, if your system has both a SYCL-capable CPU and a SYCL-capable GPU, the GPU will be selected.

      Hope that helps.
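The GPU > CPU rule described in this reply can be illustrated in a few lines. This is only an illustration of the policy, not the C++ selector in sycl_device.h; the (name, type) representation and the device names are hypothetical:

```python
# Preference order from the reply above: a SYCL-capable GPU wins over a CPU.
PREFERENCE = ("GPU", "CPU")


def pick_sycl_device(devices):
    """Return the name of the first device of the most-preferred type, or None.

    `devices` is a list of (name, type) pairs, where type is "GPU" or "CPU".
    """
    for wanted in PREFERENCE:
        for name, kind in devices:
            if kind == wanted:
                return name
    return None
```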

  1. Apologies in advance for such a general question, but:

    As part of this optimization effort, are there any plans to add half-precision floats to Tensorflow for training? Many folks are itching to take advantage of Vega’s 25 TFLOPS fp16 performance for their models.

  2. I am getting “python: can’t open file ‘external/local_config_sycl/crosstool/computecpp’: [Errno 2] No such file or directory” when running bazel test on today’s clone of the repo, following your steps.

    I am not familiar with Bazel, but it seems the rule that should create local_config_sycl/crosstool/computecpp is not running. Any idea how I can fix that?

    I have also submitted a GitHub issue.

  3. My AMD GPU is an R9 M265X with the AMDGPU-PRO 17.30 driver, and I am currently using Ubuntu 16.04.
    The bazel test is taking a very long time; it has been running for 6 hours. Is this because of my GPU model?

    1. I will need a bit more information. Could you create a GitHub issue for that, including your system details, the computecpp_info output and the log for this command?
      bazel test -c opt --config=sycl --test_output=all //tensorflow/python/kernel_tests:basic_gpu_test

      The log should show something like this:
      ==================== Test output for //tensorflow/python/kernel_tests:basic_gpu_test:
      2017-10-05 10:53:52.727745: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
      2017-10-05 10:53:53.059908: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:66] Found following OpenCL devices:
      2017-10-05 10:53:53.059926: I ./tensorflow/core/common_runtime/sycl/sycl_device.h:68] id: 0, type: GPU, name: Tonga, vendor: Advanced Micro Devices, Inc., profile: FULL_PROFILE

      and we will take it from there at github 🙂
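When checking a log like the one above, the useful part is which OpenCL devices TensorFlow actually found. A sketch that extracts those lines from a saved log; the regular expression is written against the single sample shown here and may need adjusting for other builds, and `logged_sycl_devices` is a hypothetical helper:

```python
import re

# Matches device lines such as:
# "id: 0, type: GPU, name: Tonga, vendor: Advanced Micro Devices, Inc., profile: FULL_PROFILE"
DEVICE_LINE = re.compile(
    r"id: (?P<id>\d+), type: (?P<type>\w+), name: (?P<name>.*?), "
    r"vendor: (?P<vendor>.*), profile: (?P<profile>\w+)"
)


def logged_sycl_devices(log_text):
    """Return one dict per OpenCL device line found in a TensorFlow test log."""
    return [m.groupdict() for m in DEVICE_LINE.finditer(log_text)]
```

The vendor group is greedy on purpose, so vendor names containing commas (such as "Advanced Micro Devices, Inc.") are kept whole up to the final `, profile:`.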

      1. Yes, sure. But if possible, could you tell me the average build time in most cases? Even an approximate figure is fine. Thank you so much.

  4. Thank you for your support; the build completed. In my case it showed:
    Executed 297 out of 297 tests: 46 tests pass and 251 fail locally.

  5. I have two GPUs. Could this be the reason for my unsolved problem?

    gautham@gautham-dell:~/computecpp-sdk/build$ lspci | grep VGA
    00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 09)
    03:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Venus PRO [Radeon HD 8850M / R9 M265X] (rev ff)

  6. Thanks for your efforts. Worked perfectly on my machine. Here are some results with an i7 4790k and a R9 Fury Air (sorry about formatting):

    1_512_512_false_false 13369.3M 3552.7M
    8_512_512_false_false 42507.8M 27560.7M
    16_512_512_false_false 63049.1M 53599.1M
    128_512_512_false_false 105552.4M 230505.1M
    1_1024_1024_false_false 16167.3M 8712.4M
    8_1024_1024_false_false 45071.7M 72885.2M
    16_1024_1024_false_false 62143.8M 137604.9M
    128_1024_1024_false_false 107922.2M 611910.6M
    4096_4096_4096_false_false 122394.1M 2210621.8M
    20_200_10000_false_false 46349.4M 254542.1M
    20_200_20000_false_false 47407.8M 299579.4M
