Deep Beta – Page 3 – Compute / Game Dev Blog.

Octree in Place

Grid 64x64x64 and 107 average FPS. — Step forward 🙂

The last two nights were all about the Octree, and this morning the last bug was finally hunted down.
Now I need to write the structure of the report, wish me luck 😐

We can edit or remove existing blocks – adding new ones will be introduced after the optimisation stage.
The current version supports 4 standard blocks (Blue, Green, Yellow, Brown) and 1 point light (Blue with a circle in the middle).

In this iteration of the “boxel engine”, the regenerateBuffers method updates the nearest block that collides with a ray fired from the camera POV within worldGrid, a 3-dimensional table. After that, the buffers that contain the shadow and light powers are recalculated, and at the end two OpenGL methods are called (for each buffer):

 glBindBuffer(GL_ARRAY_BUFFER, vertexbuffer[sectorID]);
 glBufferSubData(GL_ARRAY_BUFFER, 0, frame_vertices[sectorID].size()*sizeof(glm::vec3),  &frame_vertices[sectorID][0][0]);

First the buffer is bound, and then its values are updated, starting from index 0 and rewriting the whole buffer.
The optimisation I will be introducing soon is to update only the changed values, which will reduce the amount of data that has to be sent to the graphics card.
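As a rough sketch of that planned optimisation (not the engine's actual code), the changed region of a sector can be found by comparing the old and new vertex data, and only that byte range uploaded. `Vec3`, `DirtyRange` and `findDirtyRange` are hypothetical names, and `Vec3` is a stand-in for glm::vec3 so the snippet compiles without GLM or an OpenGL context.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for glm::vec3 (assumption: three tightly packed floats).
struct Vec3 { float x, y, z; };

inline bool sameVertex(const Vec3& a, const Vec3& b) {
    return a.x == b.x && a.y == b.y && a.z == b.z;
}

// Byte range of the buffer that differs between two versions of the data.
struct DirtyRange {
    std::size_t offsetBytes; // where the changed region starts
    std::size_t sizeBytes;   // how many bytes to upload (0 = nothing changed)
};

// Hypothetical helper: smallest contiguous span that covers every changed vertex.
DirtyRange findDirtyRange(const std::vector<Vec3>& before,
                          const std::vector<Vec3>& after) {
    assert(before.size() == after.size()); // glBufferSubData cannot resize a buffer
    std::size_t first = 0;
    std::size_t last = after.size();
    while (first < last && sameVertex(before[first], after[first])) ++first;
    while (last > first && sameVertex(before[last - 1], after[last - 1])) --last;
    return { first * sizeof(Vec3), (last - first) * sizeof(Vec3) };
}

// With a live OpenGL context the upload would then look roughly like:
//   glBindBuffer(GL_ARRAY_BUFFER, vertexbuffer[sectorID]);
//   glBufferSubData(GL_ARRAY_BUFFER, r.offsetBytes, r.sizeBytes,
//                   reinterpret_cast<const char*>(after.data()) + r.offsetBytes);
```

A single contiguous range is the simplest choice; if edits land far apart in the buffer, several smaller ranges would upload less data at the cost of more calls.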

Note:

Grid 64 recalculation takes a while :S

Metrics will be added soon.

100 will be about: Shadows and some point Light(s)

Cubes 262144 Mem 642832K Mem Peak 744068K FPS 40 avg. FPS 42.1503 07012012 165612 — Shadows added - calculated on GPU but still works fine!

The shadow value for each cube is stored in a separate grid; the values are sent to the shader, which alters its output depending on the value.
The value is calculated from the number of obstacles on the way between the light and the current cube.
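One way to picture the obstacle counting, as a minimal sketch under assumptions: the segment from the light to the cube is sampled once per cell along its longest axis, and every solid cell strictly between the two counts as an obstacle. `Grid`, `countObstacles`, `shadowValue` and the 0.25 darkening per obstacle are all made up for illustration; the post does not say how the engine walks the grid or maps the count to a shadow value.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical cubic grid: true = solid block, false = empty.
struct Grid {
    int n;
    std::vector<bool> solid;
    explicit Grid(int size)
        : n(size), solid(static_cast<std::size_t>(size) * size * size, false) {}
    std::size_t index(int x, int y, int z) const {
        return (static_cast<std::size_t>(z) * n + y) * n + x;
    }
    bool at(int x, int y, int z) const { return solid[index(x, y, z)]; }
    void set(int x, int y, int z, bool v) { solid[index(x, y, z)] = v; }
};

// Count solid cells strictly between the light (lx,ly,lz) and the cube (cx,cy,cz).
// One sample per cell along the longest axis, so no cell is counted twice.
int countObstacles(const Grid& g, int lx, int ly, int lz, int cx, int cy, int cz) {
    const int steps = std::max({std::abs(cx - lx), std::abs(cy - ly), std::abs(cz - lz)});
    int hits = 0;
    for (int s = 1; s < steps; ++s) {
        const double t = static_cast<double>(s) / steps;
        const int x = static_cast<int>(std::lround(lx + t * (cx - lx)));
        const int y = static_cast<int>(std::lround(ly + t * (cy - ly)));
        const int z = static_cast<int>(std::lround(lz + t * (cz - lz)));
        if (g.at(x, y, z)) ++hits;
    }
    return hits;
}

// Assumed mapping: each obstacle removes a fixed share of the light, clamped at 0.
float shadowValue(int obstacles, float perObstacle = 0.25f) {
    return std::max(0.0f, 1.0f - perObstacle * obstacles);
}
```

The result of `shadowValue` would be what gets written into the separate shadow grid for that cube before it is sent to the shader.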
Below you can find some metrics:

Earlier I introduced a point light to the engine; an example of one point light is shown below.

Cubes 262144 Mem 688628K Mem Peak 790600K FPS 34 avg. FPS 35.0108 07012012 170448 — Point lights introduced.. performance drops.. but that's good.

Below: 510 point lights on a 64x64x64 cube grid.

Cubes 262144 Lights 510 Mem 689088K Mem Peak 790536K FPS 36 avg. FPS 36.3553 07012012 222015 — Point lights on 64 grid - 36 FPS.. pretty Ok.

And its metrics..

512 Point Lights took 2 Friends Episodes to load in

Metrics for a grid of 32x32x32 cubes are below:

Why are the results for point lights almost the same?
All the calculations are precomputed on the CPU, so we are dealing with a fixed buffer of light values – which is fair enough. However, I need to optimise the loading process so that lights can be added in real time.
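The reasoning above can be illustrated with a small sketch: the load-time cost grows with cubes × lights, but the result is always one value per cube, so the per-frame work does not depend on how many lights there are. `PointLight`, `precomputeLight` and the quadratic falloff are assumptions for illustration, not the engine's actual lighting model.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical point light: position in grid coordinates plus a reach.
struct PointLight { float x, y, z, radius; };

// Load-time step, O(cubes * lights): bake every light into one float per cube.
std::vector<float> precomputeLight(int n, const std::vector<PointLight>& lights) {
    std::vector<float> light(static_cast<std::size_t>(n) * n * n, 0.0f);
    for (int z = 0; z < n; ++z)
        for (int y = 0; y < n; ++y)
            for (int x = 0; x < n; ++x) {
                float sum = 0.0f;
                for (const PointLight& l : lights) {
                    const float dx = x - l.x, dy = y - l.y, dz = z - l.z;
                    const float d2 = dx * dx + dy * dy + dz * dz;
                    const float r2 = l.radius * l.radius;
                    if (d2 < r2) sum += 1.0f - d2 / r2; // assumed falloff
                }
                light[(static_cast<std::size_t>(z) * n + y) * n + x] =
                    sum > 1.0f ? 1.0f : sum;
            }
    return light;
}
```

Whether there is 1 light or 510, the buffer handed to the renderer has the same size, which matches the nearly identical frame rates; only the loading time grows.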

Next step, editable terrain..

Directional Light performance impact on the scene

Cubes 262144 Mem 540892K Mem Peak 641888K FPS 33 avg. FPS 33.3537 22122011 030404 — Directional Lighted "boxel" scene

A very simple calculation of directional light had a noticeable impact on performance; the table below shows it:

chart_1 — Performance impact for different number of boxes.

And yeah, SW:TOR is pretty OK, but nothing beats “coding hunger”.

Finally.. textures with metrics

Cubes 262144 Mem 399688K Mem Peak 502988K FPS 551 avg. FPS 125.403 20122011 013809 — Textured 262144 cubes

This evening the code hunger got me badly.. tomorrow is the big launch of SW:TOR and let's be honest about it.. I will do no more honours project related work over this Christmas.

Anyway, in order to evaluate the software I have collected the FPS and memory usage of the application.
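For the FPS side of those metrics, a minimal sketch of a counter that reports both the current and the average frame rate could look like this. `FpsCounter` is a hypothetical name, and the frame time is passed in by the caller rather than read from a clock so the snippet stays self-contained; memory usage is platform specific and not covered here.

```cpp
#include <cassert>

// Hypothetical frame-rate tracker fed with the duration of each frame in seconds.
class FpsCounter {
public:
    void addFrame(double dtSeconds) {
        assert(dtSeconds > 0.0);
        elapsed_ += dtSeconds;
        lastDt_ = dtSeconds;
        ++frames_;
    }
    // Instantaneous FPS, based on the most recent frame only.
    double currentFps() const { return lastDt_ > 0.0 ? 1.0 / lastDt_ : 0.0; }
    // Average FPS: total frames divided by total time.
    double averageFps() const { return elapsed_ > 0.0 ? frames_ / elapsed_ : 0.0; }

private:
    double elapsed_ = 0.0;
    double lastDt_ = 0.0;
    long frames_ = 0;
};
```

Dividing total frames by total time weights slow frames properly; averaging the instantaneous FPS values instead would overstate the result whenever a few frames are very fast.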

Cart-textured — Metrics for above project

Base metrics are like this:

Funnily enough, while integrating textures I made some optimisations: the frame rate went up at a slight cost in memory.. I am happy, I can deal with that.

Using OffloadCL to compile C++ AMP code for OpenCL

C++ AMP (Accelerated Massive Parallelism) is a GPGPU API (an STL-like library) implemented by Microsoft in C++11.

Lately, I had the pleasure of using an alternative tool-kit that is not limited to DirectX: a technology that can be used on any device that can run OpenCL, and that not only reaches the performance offered by C++ AMP but, in some cases, even goes beyond it.

More information about the OffloadCL tool-kit. Note that the OffloadCL tool-kit does not implement C++ AMP; its flexible design makes it possible to get C++ AMP code working with it, see the example below.

The showcase I want to present in this post is a Binomial Option Pricing Model (BOPM) – my objective was to “port” the code from this blog post so that it uses the OffloadCL tool-kit.

More information about BOPM.

After a few hours of setting up OffloadCL and a few “why is that not working?” moments, I was ready to start my first application using the OffloadCL tool-kit.. Well, almost..

I had no idea what methods and approaches the tool-kit uses – there is no official documentation yet – however, there were a couple of ready examples and a header file to the rescue!

A few minutes later I was good to go.

When I started reading the C++ AMP source code, it looked pretty usual – small functions that do the job, a few #defines, etc.

//----------------------------------------------------------------------------
// Sequential(CPU) binomial option calculation
//----------------------------------------------------------------------------
void binomial_options_cpu()
{
    const unsigned data_size = MAX_OPTIONS;

    // this is like GPU kernel - where we have the meat
    for (unsigned i = 0; i < data_size; i++)
    {
        float call[NUM_STEPS + 1];

        // Compute values at expiration date:
        // call option value at period end is V(T) = S(T) - X
        // if S(T) is greater than X, or zero otherwise.
        // The computation is similar for put options.
        for(int j = 0; j <= NUM_STEPS; j++)
           call[j] = expiry_call_value(V_S[i], V_X[i], V_VDT[i], j);

        // Walk backwards up binomial tree
        for(int j = NUM_STEPS; j > 0; j--)
            for(int k = 0; k <= j - 1; k++)
                call[k] = V_PU_BY_DF[i] * call[k + 1] + V_PD_BY_DF[i] * call[k];

        CALL_VALUE_CPU[i] = call[0];
    }
}

Code from BinomialOptions.cpp

The code above is the CPU version of the binomial calculation, and it is fairly simple.

However, the GPU version of this function is divided in two.

One method prepares all the buffers, and the kernel passed to the parallel_for_each function does all the calculations; for the source code, please download it from here.
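To make that two-part split concrete, here is a plain C++ sketch, not C++ AMP or OffloadCL code: `prepare` fills per-option inputs on the host, `kernel` prices exactly one option, and an ordinary loop stands in for parallel_for_each. The `Option` and `Prepared` structs, the value of `NUM_STEPS`, and the Cox-Ross-Rubinstein style parameters are assumptions chosen to mirror names such as V_VDT and V_PU_BY_DF in the listing above; the original sample may derive them differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

constexpr int NUM_STEPS = 512; // assumed tree depth

// Hypothetical inputs: spot, strike, years to expiry, risk-free rate, volatility.
struct Option { float S, X, T, R, V; };

// Per-option values derived once on the host (the "prepare buffers" half).
struct Prepared { float S, X, vDt, puByDf, pdByDf; };

Prepared prepare(const Option& o) {
    const double dt = static_cast<double>(o.T) / NUM_STEPS;
    const double vDt = o.V * std::sqrt(dt);
    const double growth = std::exp(o.R * dt);   // one-step growth factor
    const double df = 1.0 / growth;             // one-step discount factor
    const double u = std::exp(vDt), d = std::exp(-vDt);
    const double pu = (growth - d) / (u - d);   // risk-neutral up probability
    return { o.S, o.X, static_cast<float>(vDt),
             static_cast<float>(pu * df), static_cast<float>((1.0 - pu) * df) };
}

// The "kernel" half: prices one option and touches only its own data.
float kernel(const Prepared& p) {
    std::vector<float> call(NUM_STEPS + 1);
    for (int j = 0; j <= NUM_STEPS; ++j) {      // payoff at expiry
        const float payoff = p.S * std::exp(p.vDt * (2.0f * j - NUM_STEPS)) - p.X;
        call[j] = payoff > 0.0f ? payoff : 0.0f;
    }
    for (int j = NUM_STEPS; j > 0; --j)         // walk backwards up the tree
        for (int k = 0; k <= j - 1; ++k)
            call[k] = p.puByDf * call[k + 1] + p.pdByDf * call[k];
    return call[0];
}

std::vector<float> priceAll(const std::vector<Option>& options) {
    std::vector<Prepared> in;
    in.reserve(options.size());
    for (const Option& o : options) in.push_back(prepare(o));
    std::vector<float> out(options.size());
    for (std::size_t i = 0; i < in.size(); ++i) // stand-in for parallel_for_each
        out[i] = kernel(in[i]);
    return out;
}
```

Because each call to `kernel` reads and writes only its own option, the final loop is the part a GPU API can run once per thread.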

This is where things got complicated.. I did not have much experience with parallel programming, so I thought I would need to spend a lot of time on research and learning, but the OffloadCL tool-kit made my work very easy..

What I had to do was integrate the Offload compiler into the solution for the latest VS11 – there will be proper integration in the future.

The next step was to write some code in that file; a small modification of the C++ code was enough to make the example work.

Get the source files for this post.

The OffloadCL tool-kit is still being improved, but I like the way it works at the moment: the code is C-like, simple and efficient.
I hope it will stay this way.

It was very easy and surprisingly straightforward to port C++ AMP code to OpenCL devices using the OffloadCL compiler. Looking forward to developing more with it!

Introduction to graphics programming submitted.

Some screenshots from the assembly instruction project; n-Gine saved me again.. above is the result of one night's work..