Using OffloadCL to compile C++ AMP code for OpenCL

C++ AMP (Accelerated Massive Parallelism) is a GPGPU API (an STL-like library) implemented by Microsoft in C++11.

Recently I had the pleasure of using an alternative tool-kit that is not limited to DirectX.
It can be used on any device that can run OpenCL, and it not only matches the performance offered by C++ AMP but in some cases even exceeds it.

More information about the OffloadCL tool-kit. Note that the OffloadCL tool-kit does not implement C++ AMP; rather, its flexible design makes it possible to get C++ AMP code working with it. See the example below.

The showcase I want to present in this post is a Binomial Option Pricing Model (BOPM). My objective was to port the code from this blog post so that it uses the OffloadCL tool-kit.

More information about BOPM.

After a few hours of setting up OffloadCL and a few “why is that not working?” moments, I was ready to start my first application using the OffloadCL tool-kit. Well, almost.

I had no idea which methods and approaches the tool-kit uses, and there is no official documentation yet. However, a couple of ready-made examples and a header file came to the rescue!

A few minutes later I was good to go.

When I started reading the C++ AMP source code, it looked pretty ordinary: small functions that do the job, a few #defines, and so on.

// Sequential (CPU) binomial option calculation
void binomial_options_cpu()
{
    const unsigned data_size = MAX_OPTIONS;

    // this loop body is like the GPU kernel - where we have the meat
    for (unsigned i = 0; i < data_size; i++)
    {
        // one slot per node at the current level of the tree
        float call[NUM_STEPS + 1];

        // Compute values at expiration date:
        // call option value at period end is V(T) = S(T) - X
        // if S(T) is greater than X, or zero otherwise.
        // The computation is similar for put options.
        for (int j = 0; j <= NUM_STEPS; j++)
            call[j] = expiry_call_value(V_S[i], V_X[i], V_VDT[i], j);

        // Walk backwards up binomial tree
        for (int j = NUM_STEPS; j > 0; j--)
            for (int k = 0; k <= j - 1; k++)
                call[k] = V_PU_BY_DF[i] * call[k + 1] + V_PD_BY_DF[i] * call[k];

        // the root of the tree holds the option value today
        CALL_VALUE_CPU[i] = call[0];
    }
}

Code from BinomialOptions.cpp

The code above is the CPU version of the binomial calculation, and it is fairly simple.
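To make the two loops above easier to follow, here is a minimal, self-contained sketch that prices a single call option the same way. It is not taken from BinomialOptions.cpp: the name `binomial_call`, the use of `double`, the extra `num_steps` argument to `expiry_call_value`, and the way the per-option constants are derived (the standard Cox-Ross-Rubinstein formulas) are all my assumptions, so the original sample may compute them differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper (assumed shape): call payoff at expiry for the tree
// node reached by j up-moves out of num_steps, where vdt = volatility * sqrt(dt).
double expiry_call_value(double s, double x, double vdt, int j, int num_steps) {
    const double price = s * std::exp(vdt * (2.0 * j - num_steps));
    return std::max(price - x, 0.0);
}

// Price one European call with a binomial tree.
// Assumption: Cox-Ross-Rubinstein parameters; the names mirror the
// V_VDT / V_PU_BY_DF / V_PD_BY_DF arrays used in the listing above.
double binomial_call(double s, double x, double t, double rate,
                     double volatility, int num_steps) {
    assert(num_steps > 0 && t > 0.0 && volatility > 0.0);

    const double dt = t / num_steps;
    const double vdt = volatility * std::sqrt(dt);
    const double growth = std::exp(rate * dt);        // one-step risk-free growth
    const double up = std::exp(vdt);                  // up-move factor
    const double down = 1.0 / up;                     // down-move factor
    const double pu = (growth - down) / (up - down);  // risk-neutral up probability
    const double pd = 1.0 - pu;
    const double pu_by_df = pu / growth;              // probability times discount factor
    const double pd_by_df = pd / growth;

    // Option values at the expiration date, one per leaf of the tree.
    std::vector<double> call(num_steps + 1);
    for (int j = 0; j <= num_steps; j++)
        call[j] = expiry_call_value(s, x, vdt, j, num_steps);

    // Walk backwards up the tree: each node is the discounted
    // expectation of its two children.
    for (int j = num_steps; j > 0; j--)
        for (int k = 0; k <= j - 1; k++)
            call[k] = pu_by_df * call[k + 1] + pd_by_df * call[k];

    return call[0];
}
```

With enough steps, a call such as `binomial_call(100.0, 100.0, 1.0, 0.05, 0.2, 1000)` should land close to the Black-Scholes price for the same inputs, which is a handy sanity check when porting the loop to another back end.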

However, the GPU version of this function is divided in two.

One method prepares all the buffers, and the kernel method invoked through the parallel_for_each function does all the calculations. For the source code, please download from here.
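The split between a host method and a kernel can be sketched in plain C++ as follows. This is neither the original C++ AMP code nor OffloadCL code: `for_each_index` is a hypothetical stand-in that simply runs the kernel serially where a real `parallel_for_each` would dispatch it to the device, and `binomial_options_host`, the buffer parameters and the value of `NUM_STEPS` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

const int NUM_STEPS = 256;  // assumed tree depth, not taken from the sample

// Hypothetical stand-in for a parallel dispatch: calls kernel(i) for every
// index, one after another. A GPU back end would run these concurrently.
template <typename Kernel>
void for_each_index(std::size_t count, Kernel kernel) {
    for (std::size_t i = 0; i < count; i++)
        kernel(i);
}

// Host side: prepares the input and output buffers, then launches the kernel.
std::vector<float> binomial_options_host(const std::vector<float>& s,
                                         const std::vector<float>& x,
                                         const std::vector<float>& vdt,
                                         const std::vector<float>& pu_by_df,
                                         const std::vector<float>& pd_by_df) {
    assert(x.size() == s.size() && vdt.size() == s.size());
    assert(pu_by_df.size() == s.size() && pd_by_df.size() == s.size());

    std::vector<float> result(s.size());

    // Raw pointers play the role of device buffers captured by the kernel.
    const float* ps = s.data();
    const float* px = x.data();
    const float* pvdt = vdt.data();
    const float* ppu = pu_by_df.data();
    const float* ppd = pd_by_df.data();
    float* out = result.data();

    // Kernel side: everything inside the lambda depends only on index i,
    // so each option can be priced independently of the others.
    for_each_index(s.size(), [=](std::size_t i) {
        float call[NUM_STEPS + 1];

        // Option values at the expiration date.
        for (int j = 0; j <= NUM_STEPS; j++) {
            const float d = ps[i] * std::exp(pvdt[i] * (2.0f * j - NUM_STEPS)) - px[i];
            call[j] = d > 0.0f ? d : 0.0f;
        }

        // Walk backwards up the binomial tree.
        for (int j = NUM_STEPS; j > 0; j--)
            for (int k = 0; k < j; k++)
                call[k] = ppu[i] * call[k + 1] + ppd[i] * call[k];

        out[i] = call[0];
    });

    return result;
}
```

The point of the split is that the host code owns the memory and the launch, while the kernel body touches only its own index, which is what lets a GPU run many options at once.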

This is where things got complicated. I had little experience with parallel programming, so I expected to spend a lot of time on research and learning, but the OffloadCL tool-kit made my work very easy.

What I had to do was integrate the Offload compiler into the solution for the latest VS11; there will be proper integration in the future.

The next step was to write some code in that file; a small modification of the C++ code was enough to make the example work.

Get the source files for this post.

The OffloadCL tool-kit is still being improved, but I like the way it works at the moment: the code is C-like, simple and efficient.
I hope it will stay this way.

Porting C++ AMP code to OpenCL devices with the OffloadCL compiler was very easy and surprisingly straightforward. I am looking forward to developing more with it!