C++ AMP (Accelerated Massive Parallelism) is a GPGPU API (an STL-like library) implemented by Microsoft in C++11.

Recently I had the pleasure of using an alternative tool-kit that is not limited to DirectX.

It is a technology that can be used on any device that can run OpenCL, and in some cases it not only matches the performance offered by C++ AMP but exceeds it.

More information about the OffloadCL tool-kit. Note that the OffloadCL tool-kit does not implement C++ AMP; its flexible design makes it possible to make it work with **C++ AMP code**; see the example below.

The showcase I want to present in this post is a Binomial Option Pricing Model (BOPM). My objective was to “port” the code from this blog post so that it uses the OffloadCL tool-kit.

More information about BOPM.

After a few hours of setting up OffloadCL and a few “why is that not working?” moments, I was ready to start my first application using the OffloadCL tool-kit. Well, almost.

I had no idea which methods and approaches the tool-kit uses. There is no official documentation yet, but a couple of ready-made examples and a header file came to the rescue!

A few minutes later I was good to go.

When I started reading the C++ AMP source code, it looked pretty ordinary: small functions that do the job, a few #define macros, and so on.

//----------------------------------------------------------------------------
// Sequential(CPU) binomial option calculation
//----------------------------------------------------------------------------
void binomial_options_cpu()
{
    const unsigned data_size = MAX_OPTIONS;
    // this is like GPU kernel - where we have the meat
    for (unsigned i = 0; i < data_size; i++)
    {
        float call[NUM_STEPS + 1];
        // Compute values at expiration date:
        // call option value at period end is V(T) = S(T) - X
        // if S(T) is greater than X, or zero otherwise.
        // The computation is similar for put options.
        for (int j = 0; j <= NUM_STEPS; j++)
            call[j] = expiry_call_value(V_S[i], V_X[i], V_VDT[i], j);
        // Walk backwards up binomial tree
        for (int j = NUM_STEPS; j > 0; j--)
            for (int k = 0; k <= j - 1; k++)
                call[k] = V_PU_BY_DF[i] * call[k + 1] + V_PD_BY_DF[i] * call[k];
        CALL_VALUE_CPU[i] = call[0];
    }
}

*Code from BinomialOptions.cpp*

The code above is the CPU version of the binomial calculation, and it is fairly simple.
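
The listing relies on helpers and per-option arrays that are defined elsewhere in BinomialOptions.cpp and not shown here. To make the calculation concrete, here is a minimal, self-contained sketch that prices a single call option. It assumes a Cox-Ross-Rubinstein tree; the `Option` struct, the `binomial_call` function and the way the up/down probabilities are derived are my own illustration, not the original source.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical per-option inputs (names are illustrative only).
struct Option {
    float S;  // spot price
    float X;  // strike price
    float T;  // years to expiry
    float R;  // risk-free rate
    float V;  // volatility
};

// Price a European call on a binomial tree with `steps` periods.
// Assumes Cox-Ross-Rubinstein parameters: u = exp(V*sqrt(dt)), d = 1/u.
float binomial_call(const Option& o, int steps) {
    assert(steps > 0);
    const float dt = o.T / steps;
    const float vdt = o.V * std::sqrt(dt);    // log-price move per step
    const float u = std::exp(vdt);
    const float d = std::exp(-vdt);
    const float growth = std::exp(o.R * dt);  // one-step risk-free growth
    const float pu = (growth - d) / (u - d);  // risk-neutral "up" probability
    const float pu_by_df = pu / growth;       // probabilities pre-multiplied
    const float pd_by_df = (1.0f - pu) / growth;  // by the discount factor

    // Values at expiry: node j has j up-moves and (steps - j) down-moves.
    std::vector<float> call(steps + 1);
    for (int j = 0; j <= steps; ++j) {
        const float price = o.S * std::exp(vdt * (2.0f * j - steps));
        call[j] = std::max(price - o.X, 0.0f);
    }
    // Walk backwards up the tree, discounting one step at a time.
    for (int j = steps; j > 0; --j)
        for (int k = 0; k <= j - 1; ++k)
            call[k] = pu_by_df * call[k + 1] + pd_by_df * call[k];
    return call[0];
}
```

With S = 100, X = 100, T = 1, R = 0.05 and V = 0.2, calling `binomial_call` with 256 steps should land close to the Black-Scholes value of about 10.45.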

However, the GPU version of this function is divided in two.

One method prepares all the buffers; inside the parallel_for_each call, a kernel method does all the calculations. The source code can be downloaded from here.
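
To illustrate that two-part shape without depending on C++ AMP or OffloadCL, here is a rough sketch in plain C++. Everything in it is an assumption made for illustration: the `Buffers` struct, `prepare_buffers`, `kernel`, `price_all`, the value of `NUM_STEPS`, and the Cox-Ross-Rubinstein formulas. The serial `for_each_index` helper only stands in for parallel_for_each, and the real GPU kernel is organised differently; the point is simply that each index writes to its own output slot, which is what makes the loop safe to run in parallel.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr int NUM_STEPS = 64;  // assumed tree depth, not the original value

// Per-option buffers, loosely named after the arrays in the CPU listing.
struct Buffers {
    std::vector<float> s, x, vdt, pu_by_df, pd_by_df, call_value;
};

// Part 1 (host side): derive the per-option constants once, up front.
// t, r and v (expiry, rate, volatility) are hypothetical shared inputs.
Buffers prepare_buffers(const std::vector<float>& s, const std::vector<float>& x,
                        float t, float r, float v) {
    assert(s.size() == x.size());
    Buffers b;
    b.s = s;
    b.x = x;
    const float dt = t / NUM_STEPS;
    const float vdt = v * std::sqrt(dt);
    const float growth = std::exp(r * dt);
    const float pu = (growth - std::exp(-vdt)) / (std::exp(vdt) - std::exp(-vdt));
    b.vdt.assign(s.size(), vdt);
    b.pu_by_df.assign(s.size(), pu / growth);
    b.pd_by_df.assign(s.size(), (1.0f - pu) / growth);
    b.call_value.assign(s.size(), 0.0f);
    return b;
}

// Part 2 (kernel): the work for one option; it writes only to slot i.
void kernel(Buffers& b, std::size_t i) {
    float call[NUM_STEPS + 1];
    for (int j = 0; j <= NUM_STEPS; ++j) {
        const float d = b.s[i] * std::exp(b.vdt[i] * (2.0f * j - NUM_STEPS)) - b.x[i];
        call[j] = d > 0.0f ? d : 0.0f;
    }
    for (int j = NUM_STEPS; j > 0; --j)
        for (int k = 0; k <= j - 1; ++k)
            call[k] = b.pu_by_df[i] * call[k + 1] + b.pd_by_df[i] * call[k];
    b.call_value[i] = call[0];
}

// Serial stand-in for parallel_for_each: runs f once per independent index.
template <typename F>
void for_each_index(std::size_t n, F f) {
    for (std::size_t i = 0; i < n; ++i) f(i);
}

// Glue: prepare the buffers, launch the kernel per option, return the results.
std::vector<float> price_all(const std::vector<float>& s, const std::vector<float>& x,
                             float t, float r, float v) {
    Buffers b = prepare_buffers(s, x, t, r, v);
    for_each_index(s.size(), [&b](std::size_t i) { kernel(b, i); });
    return b.call_value;
}
```

Because the kernel never reads or writes another option's slot, swapping the serial loop for a real parallel launch would not change the per-option results.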

This is where things got complicated. I did not have much experience with parallel programming, so I thought I would need to spend a lot of time on research and learning, but the OffloadCL tool-kit made my work very easy.

What I had to do was integrate the Offload compiler into the solution for the latest VS11; proper integration is planned for the future.

The next step was to write some code in that file; a small modification of the C++ code was enough to make the example work.

Get the source files for this post.

The OffloadCL tool-kit is still being improved, but I like the way it works at the moment: the code is C-like, simple and efficient.

I hope it will stay this way.

*It was very easy and surprisingly straightforward to port C++ AMP code to OpenCL devices using the OffloadCL compiler. I am looking forward to developing more with it!*