silven.nu

About me

My name is Mikael Silvén.
I was born in '89, I work in IT and this is my website where I occasionally write about stuff.

Disclaimer

Any opinions expressed on this blog are mine alone and do not (necessarily) represent those of my employer or co-workers.

Introduction to OpenCL

December 10, 2011 at 21:56

There's been a lot of buzz lately about GPGPU, or General-Purpose computation on Graphics Processing Units, and rightly so. I've just finished a course on the subject at my university and thought I'd share some code I came up with for the OpenCL part.

The first thing you need is the OpenCL SDK and runtime for your platform. I provide the following links: Intel CPUs; AMD APUs, CPUs and GPUs; Nvidia GPUs.

Once whichever of these you need is set up, you are good to start developing OpenCL-accelerated software. To make it easier to play around with this, I created a skeleton that lets me write short programs invoked by something like this:

#include "opencl_suit.h"
#include "bitonic_sort.h"

int main(int argc, char** argv) {
    CLHost clProgram;
    clProgram.loadProgram("bitonic.cl", NULL);
    
    CustomApp clApp(&clProgram, "bitonic_sort");
    clApp.run();
}

This very naive start creates what I call a CLHost, which takes care of the context and the runtime compilation of the CL source code specified by the parameters to the loadProgram member function. The CustomApp then runs a single kernel, with its arguments and everything else specified in the implementation of the class. In this case it runs a parallel multi-pass bitonic sort.
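To make the "multi-pass" part concrete, here is a minimal CPU-only sketch of how a host could drive the bitonic_sort kernel that appears in the program output further down. It is not the actual CustomApp code: bitonic_step and bitonic_sort_host are hypothetical names, the work-items run in a plain loop instead of on a device, and I assume the array length is a power of two.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// CPU stand-in for one work-item of the bitonic_sort kernel: same logic,
// with the global id passed in as `i` instead of read via get_global_id(0).
void bitonic_step(std::vector<int>& data, unsigned int i,
                  unsigned int j, unsigned int k) {
    unsigned int ixj = i ^ j;           // partner index for this pass
    if (ixj > i) {                      // only the lower index of a pair does the work
        bool ascending = (i & k) == 0;  // direction of the block this pair sits in
        if ((ascending && data[i] > data[ixj]) ||
            (!ascending && data[i] < data[ixj])) {
            std::swap(data[i], data[ixj]);
        }
    }
}

// Hypothetical host-side driver: one "kernel launch" per (k, j) pair.
// Assumption: data.size() is a power of two.
void bitonic_sort_host(std::vector<int>& data) {
    const unsigned int n = static_cast<unsigned int>(data.size());
    assert(n != 0 && (n & (n - 1)) == 0 && "length must be a power of two");
    for (unsigned int k = 2; k <= n; k <<= 1) {          // size of the bitonic blocks
        for (unsigned int j = k >> 1; j > 0; j >>= 1) {  // compare distance in this pass
            // On a device, these n iterations would be n parallel work-items.
            for (unsigned int i = 0; i < n; ++i) {
                bitonic_step(data, i, j, k);
            }
        }
    }
}
```

Calling bitonic_sort_host on a vector of 8 ints leaves it sorted in ascending order after 6 passes; in general, n = 2^m elements take m(m+1)/2 passes. Within a pass every index belongs to exactly one pair, which is why the work-items can run independently of each other.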

A complete run of the program can look like this.

mikael@afrodite(bitonic)> ./bitonic
Initializing OpenCL Context...
clGetPlatformIDs: CL_SUCCESS
clGetDeviceIDs (get number of devices): CL_SUCCESS (answer: 1)
clGetDeviceIDs (create device list): CL_SUCCESS
clCreateContext: CL_SUCCESS
Loading the program from file: "/home/mikael/programmering/bitonic/bitonic.cl"...

== LOADED FOLLOWING PROGRAM ==
__kernel void bitonic_sort(__global int* data, const unsigned int j, const unsigned int k) {
    unsigned int i = get_global_id(0);
    unsigned int ixj = i^j;
    int datai = data[i];
    int dataixj = data[ixj];

    if ((ixj)>i) {
        if ((i&k)==0 && datai > dataixj) {
            data[i] = dataixj;
            data[ixj] = datai;
        }
        if ((i&k)!=0 && datai < dataixj) {
            data[i]=dataixj;
            data[ixj]=datai;
        }
    }
}

== END OF PROGRAM ==
clCreateProgramWithSource: CL_SUCCESS
Building the program...
clBuildProgram: CL_SUCCESS
clGetProgramBuildInfo: CL_SUCCESS
clGetProgramBuildInfo: CL_SUCCESS
== BUILD LOG ==
Build started
Kernel <bitonic_sort> was successfully vectorized
Done.
== END OF LOG ==
...Done building.
clCreateKernel: CL_SUCCESS
clCreateBuffer: CL_SUCCESS
clSetKernelArg: CL_SUCCESS
Running the kernel...
Time spent executing according to OpenCL: 145.26 ms.
clEnqueueReadBuffer: CL_SUCCESS
The array was: sorted.
Releasing OpenCL App Memory
Done releasing OpenCL App Memory
Releasing OpenCL Host memory
OpenCL Host memory released

This is, like I said, a rather naive first implementation with a couple of flaws, including but not limited to the following. Every CustomApp runs only one kernel. The separation of what is actually specific to each app when it comes to initialization isn't perfect. There is no good way to return results apart from void pointers. The OOP style of the suit doesn't mix well with things like GLUT. These are things I might fix when I figure out how and when the need arises.

You can find the complete code here.

Tags: programming, opencl