OpenCL – Deep Beta

Requirements

ComputeCpp v0.5.1 or greater (http://developer.codeplay.com)
Python 2.7
Ubuntu 16.04 LTS
AMD R9 Nano / AMD FirePro GPU (AMDGPU-PRO driver)

Intro

This is a follow-up to the previous post, which described the setup on Ubuntu 14.04. The format is the same “copy-paste” and “get-it-working” style.

Dependencies

The assumption is that you have a vanilla 64-bit Ubuntu 16.04.3 LTS installed.

Misc

In some cases you will need to install curl (the command below also pulls in the linux-generic kernel metapackage):

$ sudo apt-get install curl linux-generic

Update 17Jan2018
Note: This might be relevant only to FirePro W8100 users.
It seems that drivers older than AMDGPU-PRO 17.50.511655 are not able to compile the DKMS module for the kernel installed by default on Ubuntu 16.04.3 LTS (4.13.0-26-generic).

To work around this, the kernel needs to be downgraded to 4.10.0-28-generic:

$ sudo apt-get remove linux-image-4.13.0-26-generic linux-headers-4.13.0-26
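
Before removing a kernel it can help to check which one is actually running. The following is a minimal sketch, not part of the original setup: it parses a release string such as the one printed by `uname -r` and flags anything newer than the 4.10 series. The helper names and the `LAST_KNOWN_GOOD` threshold are assumptions taken from the note above (4.10.0-28 built, 4.13.0-26 did not).

```python
import re

# Assumption from the note above: the 4.10 series built the DKMS module,
# the 4.13 series did not (with drivers older than 17.50.511655).
LAST_KNOWN_GOOD = (4, 10)


def kernel_version(release):
    """Turn a release string such as '4.13.0-26-generic' into (4, 13, 0)."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % release)
    return tuple(int(part) for part in match.groups())


def needs_downgrade(release):
    """True if the kernel series is newer than the last one known to work."""
    return kernel_version(release)[:2] > LAST_KNOWN_GOOD
```

To check the running system, pass in `platform.release()` from the Python standard library, which returns the same string as `uname -r`.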

Java & Bazel

$ sudo apt-get install openjdk-8-jdk
$ echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
$ curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -
$ sudo apt-get update && sudo apt-get install bazel
$ sudo apt-get upgrade bazel

OpenCL & SYCL

Note: To install only the OpenCL parts of the AMDGPU-PRO driver, pass --compute to ./amdgpu-pro-install

$ sudo apt-get install ocl-icd-opencl-dev opencl-headers
$ wget --referer=http://support.amd.com https://www2.ati.com/drivers/linux/ubuntu/amdgpu-pro-17.30-465504.tar.xz
$ tar -xvf amdgpu-pro-17.30-465504.tar.xz
$ cd amdgpu-pro-17.30-465504
$ sudo ./amdgpu-pro-install #or sudo ./amdgpu-pro-install --compute
$ tar -xvzf Ubuntu-16.04-64bit.tar.gz
$ sudo mkdir /usr/local/computecpp
$ cd Ubuntu-16.04-64bit && sudo cp -r * /usr/local/computecpp
$ sudo reboot

Update: 13Jan2018
In some cases (e.g. after updating Ubuntu 16.04) you might need to use the latest AMDGPU-PRO driver (17.50.511655):

$ wget --referer=http://support.amd.com https://www2.ati.com/drivers/linux/ubuntu/amdgpu-pro-17.50-511655.tar.xz
$ tar -xvf amdgpu-pro-17.50-511655.tar.xz
$ cd amdgpu-pro-17.50-511655
$ ./amdgpu-pro-install --headless --opencl=legacy

Update 17Jan2018
The AMDGPU-PRO 17.50.511655 driver seems either to expose a bug in the TensorFlow SYCL implementation or to have a bug of its own: a graph whose nodes are allocated to different devices (GPU and CPU) fails to synchronize data from a GPU-placed node to the CPU.

Update 11Jul2018
AMDGPU-PRO 17.40-501128 is the latest driver that seems to be working with SYCL 0.9

Python

$ sudo apt-get install python-numpy python-dev python-wheel python-mock python-psutil python-pip
$ sudo pip install --upgrade pip
$ sudo pip install py-cpuinfo portpicker numpy
$ sudo pip install --upgrade scipy

Verification

Note: In some cases the user needs to be added to the video group via:

$ sudo adduser $(whoami) video

$ /opt/amdgpu-pro/bin/clinfo

Should return something similar to:

Number of platforms:				 1
  Platform Profile:				 FULL_PROFILE
  Platform Version:				 OpenCL 2.0 AMD-APP (2442.7)
  Platform Name:				 AMD Accelerated Parallel Processing
  Platform Vendor:				 Advanced Micro Devices, Inc.
  Platform Extensions:				 cl_khr_icd cl_amd_event_callback cl_amd_offline_devices 


  Platform Name:				 AMD Accelerated Parallel Processing
Number of devices:				 1
  Device Type:					 CL_DEVICE_TYPE_GPU
  Vendor ID:					 1002h
  Board name:					 AMD Radeon FirePro W8100
...
 Version:					 OpenCL 1.2 AMD-APP (2442.7)
  Extensions:					 cl_khr_fp64 cl_amd_fp64 cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_khr_int64_base_atomics cl_khr_int64_extended_atomics cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_gl_sharing cl_amd_device_attribute_query cl_amd_vec3 cl_amd_printf cl_amd_media_ops cl_amd_media_ops2 cl_amd_popcnt cl_khr_image2d_from_buffer cl_khr_spir cl_khr_gl_event

There is cl_khr_spir – good sign!
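
Scanning that long extension list by eye is error-prone, so here is a small sketch that does the check programmatically. It is an illustration only: the function names are made up for this example, and it assumes the "Extensions:" line layout shown in the output above.

```python
import subprocess


def run_clinfo(path="/opt/amdgpu-pro/bin/clinfo"):
    """Run clinfo (path as used above) and return its output as text."""
    return subprocess.check_output([path]).decode("utf-8", "replace")


def device_extensions(clinfo_text):
    """Collect the names listed on every device-level 'Extensions:' line."""
    found = set()
    for line in clinfo_text.splitlines():
        label, sep, value = line.partition(":")
        # 'Platform Extensions:' lines are skipped on purpose.
        if sep and label.strip() == "Extensions":
            found.update(value.split())
    return found


def supports_spir(clinfo_text):
    """True if any device reports the cl_khr_spir extension."""
    return "cl_khr_spir" in device_extensions(clinfo_text)
```

On a machine with the driver installed, `supports_spir(run_clinfo())` should be true if the output looks like the listing above.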

Now:

$ /usr/local/computecpp/bin/computecpp_info

For ComputeCpp CE 0.5.0 you should see:

********************************************************************************

ComputeCpp Info (CE 0.5.0)

********************************************************************************

Toolchain information:

GLIBC version: 2.23
GLIBCXX: 20160609
This version of libstdc++ is supported.

********************************************************************************


Device Info:

Discovered 1 devices matching:
  platform    : 
  device type : 

--------------------------------------------------------------------------------
Device 0:

  Device is supported                     : UNTESTED - Vendor not tested on this OS
  CL_DEVICE_NAME                          : Hawaii
  CL_DEVICE_VENDOR                        : Advanced Micro Devices, Inc.
  CL_DRIVER_VERSION                       : 2442.7
  CL_DEVICE_TYPE                          : CL_DEVICE_TYPE_GPU 

If you encounter problems when using any of these OpenCL devices, please consult
this website for known issues:
https://computecpp.codeplay.com/releases/v0.3.1/platform-support-notes

Set Up

The changesets are being upstreamed; for now, however, I would recommend using my fork of TensorFlow.

$ export TF_NEED_OPENCL=1
$ export HOST_CXX_COMPILER=/usr/bin/g++
$ export HOST_C_COMPILER=/usr/bin/gcc
$ export COMPUTECPP_TOOLKIT_PATH=/usr/local/computecpp
$ git clone https://github.com/lukeiwanski/tensorflow.git
$ cd tensorflow
$ git checkout dev/eigen_mehdi
$ ./configure

At this point, press Enter through the configuration questions to accept the defaults.
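
If ./configure does not pick up the SYCL settings, a forgotten export is a likely cause. The sketch below is a hypothetical helper, not part of TensorFlow or ComputeCpp: it checks that the four variables exported above are set and that the ones holding paths point at something that exists.

```python
import os

# Variables exported in the steps above; True marks values that are paths.
EXPECTED = {
    "TF_NEED_OPENCL": False,
    "HOST_CXX_COMPILER": True,
    "HOST_C_COMPILER": True,
    "COMPUTECPP_TOOLKIT_PATH": True,
}


def check_environment(env, path_exists=os.path.exists):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for name in sorted(EXPECTED):
        value = env.get(name)
        if not value:
            problems.append("%s is not set" % name)
        elif EXPECTED[name] and not path_exists(value):
            problems.append("%s points to a missing path: %s" % (name, value))
    return problems
```

Calling `check_environment(os.environ)` in the same shell session that will run ./configure lists anything that is missing.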

In order to run tests:

$ bazel test --test_timeout 300,450,1200,3600  -c opt --config=sycl --test_tag_filters=requires-gpu,-no_gpu,-no_oss,-oss_serial,-benchmark-test  -- //tensorflow/... -//tensorflow/compiler/... -//tensorflow/contrib/distributions/... -//tensorflow/contrib/session_bundle/... -//tensorflow/go/... -//tensorflow/stream_executor/... -//tensorflow/core/distributed_runtime/... -//tensorflow/contrib/verbs/... -//tensorflow/contrib/xla_tf_graph/... -//tensorflow/java/... -//tensorflow/core/kernels/hexagon/...

It is worth noting that at this point some tests still fail; we are working on resolving them.

Also worth mentioning: performance tuning is still a work in progress.

$ bazel build -c opt --config=sycl tensorflow/core/kernels:matmul_op_test
$ ./bazel-bin/tensorflow/core/kernels/matmul_op_test --benchmarks=all

On AMD Radeon FirePro W8100 gives:

Further optimisation improvements are in the pipeline.

So that was fun: I added a converter from sphere surface coordinates to texture space, using the “trivial” UV mapping technique.

Apart from that, I moved the code around and cleaned it up slightly, which resulted in a few additional frames per second. Each sphere is semi-transparent and uses the same texture.

Now what, normal mapping?

[edit: 29 Jul 2013]

Apparently there was a bug in my texturing. It is gone now 🙂

And the night version:

Here is the code that I used.

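// Maps an intersection point on a sphere to texture coordinates.
// Algorithm from http://en.wikipedia.org/wiki/UV_mapping
// Note: the result is packed as (V, U, 0).
float3 getTexelID(float3 point, sphere * ss)
{
    const float3 pole = (float3)(0.0f, 1.0f, 0.0f);
    const float3 equator = (float3)(1.0f, 0.0f, 0.0f);

    // Unit normal at the intersection point.
    float3 normal = normalize(point - ss->pos.xyz);

    // Angle from the -pole direction, mapped to V in [0, 1].
    // The clamp keeps rounding errors from pushing acos out of its domain.
    float phi = acos(clamp(-dot(normal, pole), -1.0f, 1.0f));
    float V = phi / M_PI_F;

    // Angle around the pole axis, mapped to [0, 0.5].
    // At the poles sin(phi) is 0 and U is arbitrary, so guard the division.
    float sinPhi = sin(phi);
    float theta = 0.0f;
    if (sinPhi > 1e-6f)
        theta = acos(clamp(dot(normal, equator) / sinPhi, -1.0f, 1.0f)) / (2.0f * M_PI_F);

    // The side of the pole/equator plane decides which half of the texture is used.
    float U = (dot(cross(pole, equator), normal) > 0.0f) ? theta : 1.0f - theta;

    return (float3)(V, U, 0.0f);
}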
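
To sanity-check the mapping without a GPU, the same math can be reproduced on the host. The following Python sketch is an illustration only: `texel_uv` is a made-up name, it hard-codes the same pole (0, 1, 0) and equator (1, 0, 0) as the kernel, and it returns (u, v) rather than the kernel's (V, U, 0) packing. It also guards the division at the poles, where sin(phi) is zero and U is arbitrary.

```python
import math


def texel_uv(point, center):
    """Return (u, v) in [0, 1] for a point on a sphere centred at `center`."""
    nx, ny, nz = (p - c for p, c in zip(point, center))
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    nx, ny, nz = nx / length, ny / length, nz / length

    # dot(normal, pole) is simply ny for pole = (0, 1, 0).
    phi = math.acos(max(-1.0, min(1.0, -ny)))
    v = phi / math.pi

    # dot(normal, equator) is nx for equator = (1, 0, 0).
    sin_phi = math.sin(phi)
    theta = 0.0
    if sin_phi > 1e-6:
        cos_theta = max(-1.0, min(1.0, nx / sin_phi))
        theta = math.acos(cos_theta) / (2.0 * math.pi)

    # cross(pole, equator) = (0, 0, -1), so the dot product with the normal is -nz.
    u = theta if -nz > 0.0 else 1.0 - theta
    return u, v
```

Points on the equator of the sphere give v = 0.5, and walking around the equator moves u through the full [0, 1] range, which is a quick way to spot a flipped or mirrored texture.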

