TensorFlow 1.x On Ubuntu 16.04 LTS


  • ComputeCpp v0.5.1 or greater (http://developer.codeplay.com)
  • Python 2.7
  • Ubuntu 16.04 LTS
  • AMD R9 Nano / AMD FirePro GPU  ( AMDGPU-PRO driver)


It’s a follow up from the previous post where set up on ubuntu 14.04 was described. The format is going to be the same “copy-paste” and “get-it-working” style.


The assumption is that you have a vanilla  Ubuntu64 16.04.03 LTS installed.


In some cases you will need to install curl

$ sudo apt-get install curl linux-generic

Update 17Jan2018
Note: That might be relevant only to FirePro W8100 users
It seems like old drivers previous to AMDGPU-PRO 17.50.511655 are not able to compile DKMS module for kernels installed by default on Ubuntu 16.04.3 TLS ( 4.13.0-26-generic )

In order to work this around kernel needs to be downgraded to 4.10.0-28-generic

$ sudo apt-get remove linux-image-4.13.0-26-generic linux-headers-4.13.0-26

Java & Bazel

$ sudo apt-get install openjdk-8-jdk
$ echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
$ curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -
$ sudo apt-get update && sudo apt-get install bazel
$ sudo apt-get upgrade bazel


Note: In order to install only OpenCL parts of AMDGPU-PRO driver pass: –compute to  ./amdgpu-pro-install

$ sudo apt-get install ocl-icd-opencl-dev opencl-headers
$ wget --referer=http://support.amd.com https://www2.ati.com/drivers/linux/ubuntu/amdgpu-pro-17.30-465504.tar.xz
$ tar -xvf amdgpu-pro-17.30-465504.tar.xz
$ cd amdgpu-pro-17.30-465504
$ sudo ./amdgpu-pro-install #or sudo ./amdgpu-pro-install --compute
$ tar -xvzf Ubuntu-16.04-64bit.tar.gz
$ sudo mkdir /usr/local/computecpp
$ cd Ubuntu-16.04-64bit && cp * /usr/local/computecpp 
$ sudo reboot

Update: 13Jan2018
In some cases (eg. update of the ubuntu16.04) you might need to use latest AMDGPU-PRO (17.50.511655)

$ wget --referer=http://support.amd.com https://www2.ati.com/drivers/linux/ubuntu/amdgpu-pro-17.50-511655.tar.xz
$ cd amdgpu-pro-17.50-511655
$ ./amdgpu-pro-install --headless --opencl=legacy

Update 17Jan2018
AMDGPU-PRO 17.50.511655 drivers seems to expose either in TensorFlow SYCL implementation or possibly have a bug where graph that nodes are allocated to different devices ( GPU and CPU ) fails to synchronize data from GPU placed node to CPU

Update 11Jul2018
AMDGPU-PRO 17.40-501128 is the latest driver that seems to be working with SYCL 0.9


$ sudo apt-get install python-numpy python-dev python-wheel python-mock python-psutil python-pip
$ sudo pip install --upgrade pip
$ sudo pip install py-cpuinfo portpicker numpy
$ sudo pip install --upgrade scipy


Note: In some cases user need to be added to video group via:

$ sudo adduser $(whoami) video
$ /opt/amdgpu-pro/bin/clinfo

Should return something similar to:

Number of platforms:				 1
  Platform Profile:				 FULL_PROFILE
  Platform Version:				 OpenCL 2.0 AMD-APP (2442.7)
  Platform Name:				 AMD Accelerated Parallel Processing
  Platform Vendor:				 Advanced Micro Devices, Inc.
  Platform Extensions:				 cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 

  Platform Name:				 AMD Accelerated Parallel Processing
Number of devices:				 1
  Device Type:					 CL_DEVICE_TYPE_GPU
  Vendor ID:					 1002h
  Board name:					 AMD Radeon FirePro W8100
 Version:					 OpenCL 1.2 AMD-APP (2442.7)
  Extensions:					 cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event 

There is cl_khr_spir – good sign!


$ /usr/local/computecpp/bin/computecpp_info

For ComputeCpp CE 0.5.0 you should see:


ComputeCpp Info (CE 0.5.0)


Toolchain information:

GLIBC version: 2.23
GLIBCXX: 20160609
This version of libstdc++ is supported.


Device Info:

Discovered 1 devices matching:
  platform    : 
  device type : 

Device 0:

  Device is supported                     : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                          : Hawaii
  CL_DEVICE_VENDOR                        : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                       : 2442.7
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU 

If you encounter problems when using any of these OpenCL devices, please consult
this website for known issues:

Set Up

The changesets are being upstreamed however, for now I would recommend using my fork of the TensorFlow.

$ export TF_NEED_OPENCL=1
$ export HOST_CXX_COMPILER=/usr/bin/g++
$ export HOST_C_COMPILER=/usr/bin/gcc
$ export COMPUTECPP_TOOLKIT_PATH=/usr/local/computecpp
$ git clone https://github.com/lukeiwanski/tensorflow.git
$ cd tensorflow
$ git checkout dev/eigen_mehdi
$ ./configure

At this point enter through the config questions.

In order to run tests:

$ bazel test --test_timeout 300,450,1200,3600  -c opt --config=sycl --test_tag_filters=requires-gpu,-no_gpu,-no_oss,-oss_serial,-benchmark-test  -- //tensorflow/... -//tensorflow/compiler/... -//tensorflow/contrib/distributions/... -//tensorflow/contrib/session_bundle/... -//tensorflow/go/... -//tensorflow/stream_executor/... -//tensorflow/core/distributed_runtime/... -//tensorflow/contrib/verbs/... -//tensorflow/contrib/xla_tf_graph/... -//tensorflow/java/... -//tensorflow/core/kernels/hexagon/...

Worth to note at this point there are some fails that we are working on resolving.

Other “worth to mention” is the performance improvement that is still a “work-in-progress”.

$ bazel build -c opt --config=sycl tensorflow/core/kernels:matmul_op_test
$ ./bazel-bin/tensorflow/core/kernels/matmul_op_test --benchmarks=all

On AMD Radeon FirePro W8100 gives:

Further optimisation improvements are in the pipeline.

Textures Added

Screenshot from 2013-07-26 04:33:50

So that was fun. Adding sphere surface coordinates to texture space converter. Used this “trivial” technique UV mapping.

Screenshot from 2013-07-26 04:34:39

Apart from that, I moved the code around and cleaned it slightly. that resulted in few additional frames per second. Each Sphere is semi-transparent with the same texture.

now what, normal mapping ?

[edit: 29 Jul 2013]

Apparently ther was a bug in my texturing. It is gone now 🙂

Screenshot from 2013-07-29 23:57:22

Screenshot from 2013-07-29 23:57:37

Screenshot from 2013-07-30 00:02:08

and night version,

Screenshot from 2013-07-30 00:02:59

Screenshot from 2013-07-30 00:03:25

There is a code that I used.

//intersection point, sphere
//algorithm from http://en.wikipedia.org/wiki/UV_mapping
float3 getTexelID(float3 point, sphere * ss)

float3 pole = (float3)(0.0f,1.0f,0.0f);
float3 equator = (float3)(1.0f,0.0f,0.0f);
float U=0.0f;
float V=0.0f;
float phi = 0.0f;
float theta = 0.0f;
float3 normal = point - ss->pos.xyz;

normal = normalize(normal);
phi = acos( -dot(normal, pole));
V=phi/3.141592653589793 ;

theta = acos( dot(normal, equator)/ sin( phi )) / ( 2 * 3.141592653589793 );
if ( dot((cross(pole, equator)), normal) > 0 )
U = theta;
U = 1 - theta;

float3 x = (float3)((float)V, (float)U, 0.0f);
return x;

3rd Bounce

Screenshot from 2013-07-16 22:58:57

finally I got some spare time to improve this baby.
It is a bit of improvement since the last time – 3rd bounce of rays were hacked into. Now the reflection looks far more realistic.

Screenshot from 2013-07-16 23:03:58

And here reflection of the reflection.. some artefacts here and there – but I can live with them for now.

oh and I need to fix timer for the frame – that is definietly not 0.31ms per frame – it takes about a second per frame. Still not that bad for the vanila implementation.

[Edit: late night]

Screenshot from 2013-07-17 01:38:58

Managed to add semi-transparent spheres! Rendering of the frame became extremely slow – had to reduce number of spheres on the scene.

Is it the right time to introduce optimisations?

How about spheres as voxels?

Screenshot from 2013-07-07 22:06:12
I did a litte bit of research and it kept me wondering.. how about using spheres for voxels representation this time?
Lets try it out and see how much the GPU can take.

Grid of 10^3 Spheres – there is no acceleration architecture implemented yet – Octree for spheres? that feels odd..

Oh, btw. That’s raytraced.. so far quite interactive.

It uses OpenCL on Radeon 5700 HD – only primary rays so far on ambient material. Give it some time.

[Edit: 08-Jul-2013]

Screenshot from 2013-07-08 21:15:59

Simple directional light lighting system implemented.

Screenshot from 2013-07-08 23:05:21

Secondary bounces in place (depth of bounces set to two) – little bit of hackery had to be done since there is problem with recursion in OpenCL.

Screenshot from 2013-07-08 23:09:12

Every third sphere is ambient.