TensorFlow 1.x On Ubuntu 16.04 LTS


  • ComputeCpp v0.5.1 or greater (http://developer.codeplay.com)
  • Python 2.7
  • Ubuntu 16.04 LTS
  • AMD R9 Nano / AMD FirePro GPU (AMDGPU-PRO driver)


This is a follow-up to the previous post, which described the set-up on Ubuntu 14.04. The format is the same “copy-paste”, “get-it-working” style.


The assumption is that you have a vanilla 64-bit Ubuntu 16.04.3 LTS installed.


In some cases you will need to install curl:

$ sudo apt-get install curl linux-generic

Update 17Jan2018
Note: This might be relevant only to FirePro W8100 users.
It seems that drivers older than AMDGPU-PRO 17.50.511655 are not able to compile the DKMS module for the kernel installed by default on Ubuntu 16.04.3 LTS (4.13.0-26-generic).

To work around this, the kernel needs to be downgraded to 4.10.0-28-generic:

$ sudo apt-get remove linux-image-4.13.0-26-generic linux-headers-4.13.0-26
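
After the next reboot it may be worth confirming which kernel is actually running. The following is only a sketch: it assumes the affected kernels are the 4.13 series named above, and kernel_is_affected is a made-up helper name, not part of any driver tooling.

```shell
# Hypothetical helper: succeeds when the given kernel release string
# belongs to the 4.13 series that the older AMDGPU-PRO DKMS module fails on.
kernel_is_affected() {
    case "$1" in
        4.13.*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Report on the kernel that is currently running.
if kernel_is_affected "$(uname -r)"; then
    echo "Running $(uname -r): downgrade the kernel or use a newer driver"
else
    echo "Running $(uname -r): not in the 4.13 series"
fi
```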

Java & Bazel

$ sudo apt-get install openjdk-8-jdk
$ echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
$ curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -
$ sudo apt-get update && sudo apt-get install bazel
$ sudo apt-get upgrade bazel


Note: To install only the OpenCL parts of the AMDGPU-PRO driver, pass --compute to ./amdgpu-pro-install

$ sudo apt-get install ocl-icd-opencl-dev opencl-headers
$ wget --referer=http://support.amd.com https://www2.ati.com/drivers/linux/ubuntu/amdgpu-pro-17.30-465504.tar.xz
$ tar -xvf amdgpu-pro-17.30-465504.tar.xz
$ cd amdgpu-pro-17.30-465504
$ sudo ./amdgpu-pro-install #or sudo ./amdgpu-pro-install --compute
$ tar -xvzf Ubuntu-16.04-64bit.tar.gz
$ sudo mkdir /usr/local/computecpp
$ cd Ubuntu-16.04-64bit && sudo cp -r * /usr/local/computecpp
$ sudo reboot

Update: 13Jan2018
In some cases (e.g. after updating Ubuntu 16.04) you might need to use the latest AMDGPU-PRO (17.50.511655):

$ wget --referer=http://support.amd.com https://www2.ati.com/drivers/linux/ubuntu/amdgpu-pro-17.50-511655.tar.xz
$ tar -xvf amdgpu-pro-17.50-511655.tar.xz
$ cd amdgpu-pro-17.50-511655
$ ./amdgpu-pro-install --headless --opencl=legacy

Update 17Jan2018
The AMDGPU-PRO 17.50.511655 drivers seem either to expose a bug in the TensorFlow SYCL implementation or to have a bug of their own: a graph whose nodes are allocated to different devices (GPU and CPU) fails to synchronize data from a GPU-placed node to the CPU.

Update 11Jul2018
AMDGPU-PRO 17.40-501128 is the latest driver that seems to be working with SYCL 0.9


$ sudo apt-get install python-numpy python-dev python-wheel python-mock python-psutil python-pip
$ sudo pip install --upgrade pip
$ sudo pip install py-cpuinfo portpicker numpy
$ sudo pip install --upgrade scipy


Note: In some cases the user needs to be added to the video group via:

$ sudo adduser $(whoami) video
$ /opt/amdgpu-pro/bin/clinfo

Should return something similar to:

Number of platforms:				 1
  Platform Profile:				 FULL_PROFILE
  Platform Version:				 OpenCL 2.0 AMD-APP (2442.7)
  Platform Name:				 AMD Accelerated Parallel Processing
  Platform Vendor:				 Advanced Micro Devices, Inc.
  Platform Extensions:				 cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 

  Platform Name:				 AMD Accelerated Parallel Processing
Number of devices:				 1
  Device Type:					 CL_DEVICE_TYPE_GPU
  Vendor ID:					 1002h
  Board name:					 AMD Radeon FirePro W8100
 Version:					 OpenCL 1.2 AMD-APP (2442.7)
  Extensions:					 cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event 

There is cl_khr_spir – good sign!
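
The same check can be scripted rather than read off by eye. This is a minimal sketch, assuming clinfo lives at the path used above; has_spir is a hypothetical helper name.

```shell
# Hypothetical helper: reads clinfo output on stdin and succeeds
# when cl_khr_spir is listed as a whole word among the extensions.
has_spir() {
    grep -qw 'cl_khr_spir'
}

# Path taken from the clinfo command above; skipped if the driver is not installed.
CLINFO=/opt/amdgpu-pro/bin/clinfo
if [ -x "$CLINFO" ]; then
    if "$CLINFO" | has_spir; then
        echo "cl_khr_spir found"
    else
        echo "cl_khr_spir missing"
    fi
fi
```

The -w flag keeps the match to the exact extension name, so a longer name that merely starts with cl_khr_spir does not count.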


$ /usr/local/computecpp/bin/computecpp_info

For ComputeCpp CE 0.5.0 you should see:


ComputeCpp Info (CE 0.5.0)


Toolchain information:

GLIBC version: 2.23
GLIBCXX: 20160609
This version of libstdc++ is supported.


Device Info:

Discovered 1 devices matching:
  platform    : 
  device type : 

Device 0:

  Device is supported                     : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                          : Hawaii
  CL_DEVICE_VENDOR                        : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                       : 2442.7
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU 

If you encounter problems when using any of these OpenCL devices, please consult
this website for known issues:

Set Up

The changesets are being upstreamed; however, for now I would recommend using my fork of TensorFlow.

$ export TF_NEED_OPENCL=1
$ export HOST_CXX_COMPILER=/usr/bin/g++
$ export HOST_C_COMPILER=/usr/bin/gcc
$ export COMPUTECPP_TOOLKIT_PATH=/usr/local/computecpp
$ git clone https://github.com/lukeiwanski/tensorflow.git
$ cd tensorflow
$ git checkout dev/eigen_mehdi
$ ./configure

At this point, press Enter through the configuration questions.
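
If the configure step or the build cannot find ComputeCpp, a quick check of the toolkit directory may help. This is a sketch only: it assumes the layout used earlier in this post (computecpp_info under bin/), and looks_like_computecpp is a made-up helper name.

```shell
# Hypothetical helper: succeeds when the directory passed as $1 looks like
# a ComputeCpp install, i.e. it contains an executable bin/computecpp_info.
looks_like_computecpp() {
    [ -x "$1/bin/computecpp_info" ]
}

# Fall back to the install location used in this post if the variable is unset.
if looks_like_computecpp "${COMPUTECPP_TOOLKIT_PATH:-/usr/local/computecpp}"; then
    echo "ComputeCpp toolkit found"
else
    echo "ComputeCpp toolkit not found; check COMPUTECPP_TOOLKIT_PATH"
fi
```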

In order to run tests:

$ bazel test --test_timeout 300,450,1200,3600  -c opt --config=sycl --test_tag_filters=requires-gpu,-no_gpu,-no_oss,-oss_serial,-benchmark-test  -- //tensorflow/... -//tensorflow/compiler/... -//tensorflow/contrib/distributions/... -//tensorflow/contrib/session_bundle/... -//tensorflow/go/... -//tensorflow/stream_executor/... -//tensorflow/core/distributed_runtime/... -//tensorflow/contrib/verbs/... -//tensorflow/contrib/xla_tf_graph/... -//tensorflow/java/... -//tensorflow/core/kernels/hexagon/...

It is worth noting that at this point there are some failures that we are working on resolving.

Also worth mentioning: the performance improvement is still a work in progress.

$ bazel build -c opt --config=sycl tensorflow/core/kernels:matmul_op_test
$ ./bazel-bin/tensorflow/core/kernels/matmul_op_test --benchmarks=all

On an AMD Radeon FirePro W8100 this gives:

Further optimisation improvements are in the pipeline.

Setting Up TensorFlow With OpenCL Using SYCL

New version of the set-up instructions for TensorFlow SYCL here.



This short post aims to guide you through the set-up process for TensorFlow with OpenCL support. It’s a “copy-paste” type of post. There is nothing groundbreaking in it; all the instructions can be found across other websites / forums.

The aim was to put all relevant information in one place which should make it more convenient/easier to go through the set-up process.

Global Game Jam 2014


On the weekend of 24-26 Jan, Global Game Jam was hosted at Edinburgh Napier University: 48 hours to make a game on the *cough* theme *cough*, which was:

“We don’t see things as they are,
we see them as we are.”

Ideal theme for a horror game, right? Right?!

Anyway, I wanted to play a bit more with OculusVR and make a horror game. Apparently there were more people like me:

Thanks to Floyd Chitalu (Code), Joanna Jamrozy && Malgorzata Kosek (Art)
you guys are awesome!

That took 48 hours and countless reboots of the machines at the location. Yeah, the PCs get wiped every night, a few times.
But hey, there is a warning! About 3 mins before shutdown.. Thumbs up!
So sit down and relax, because you will not be able to back up your project in time anyway.

The game after all is quite scary; buggy but scary! I jumped a few times playing it.


More about the project, plus binary ( Windows ) && Video from hardcore testing: http://globalgamejam.org/2014/games/buka


Main Menu

On the weekend of 16th-17th Nov my friend and colleague @GordonBrown589 and I proudly represented Codeplay Software at GameHack. The idea was to create a simple and fun game within the theme in under 24 hrs.

The theme was “childhood”.. well.. we still created a fun game using the awesome technology that is OculusVR.

Here are some of the screenshots.. The game looks much better with a VR set!

Fullscreen capture 24112013 192813

Fullscreen capture 24112013 192849

Fullscreen capture 24112013 192836

Fullscreen capture 24112013 192818

Textures Added

Screenshot from 2013-07-26 04:33:50

So that was fun: adding a converter from sphere surface coordinates to texture space. I used the “trivial” technique of UV mapping.

Screenshot from 2013-07-26 04:34:39

Apart from that, I moved the code around and cleaned it up slightly. That resulted in a few additional frames per second. Each sphere is semi-transparent with the same texture.

Now what, normal mapping?

[edit: 29 Jul 2013]

Apparently there was a bug in my texturing. It is gone now 🙂

Screenshot from 2013-07-29 23:57:22

Screenshot from 2013-07-29 23:57:37

Screenshot from 2013-07-30 00:02:08

And the night version:

Screenshot from 2013-07-30 00:02:59

Screenshot from 2013-07-30 00:03:25

Here is the code that I used.

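//intersection point, sphere
//algorithm from http://en.wikipedia.org/wiki/UV_mapping
float3 getTexelID(float3 point, sphere * ss)
{
    // unit vectors that fix the orientation of the texture on the sphere
    float3 pole = (float3)(0.0f, 1.0f, 0.0f);
    float3 equator = (float3)(1.0f, 0.0f, 0.0f);
    float U = 0.0f;
    float V = 0.0f;
    float phi = 0.0f;
    float theta = 0.0f;

    // unit surface normal at the intersection point
    float3 normal = normalize(point - ss->pos.xyz);

    // latitude: angle measured from the -pole direction, scaled to [0, 1]
    phi = acos(-dot(normal, pole));
    V = phi / 3.141592653589793f;

    // longitude: angle from the equator vector, scaled to [0, 0.5]
    // (sin(phi) is zero exactly at the poles, so theta is undefined there)
    theta = acos(dot(normal, equator) / sin(phi)) / (2.0f * 3.141592653589793f);

    // pick the half of the sphere the point is on to cover the full [0, 1] range
    if (dot(cross(pole, equator), normal) > 0.0f)
        U = theta;
    else
        U = 1.0f - theta;

    return (float3)(V, U, 0.0f);
}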

3rd Bounce

Screenshot from 2013-07-16 22:58:57

Finally I got some spare time to improve this baby.
It is a bit of an improvement since the last time: a 3rd bounce of rays was hacked in. Now the reflection looks far more realistic.

Screenshot from 2013-07-16 23:03:58

And here is a reflection of the reflection.. some artefacts here and there, but I can live with them for now.

Oh, and I need to fix the frame timer: that is definitely not 0.31 ms per frame, it takes about a second per frame. Still not that bad for the vanilla implementation.

[Edit: late night]

Screenshot from 2013-07-17 01:38:58

Managed to add semi-transparent spheres! Rendering of the frame became extremely slow, so I had to reduce the number of spheres in the scene.

Is it the right time to introduce optimisations?