I thought I'd just give you a quick update on how things are going with the project: I've gotten some very basic functionality to work. I can do some basic buffer manipulation and am currently working on image support. Speaking of image support, I wasted like 2 hours today trying to fix an error caused by an arbitrary limitation.
CommandQueue::enqueueNDRangeKernel returned CL_OUT_OF_RESOURCES when I tried to run my kernel. How the hell? I have 1 GB of VRAM on my laptop. It's a Fermi with full image capabilities and everything, how can it be out of resources? So I tried smaller images. And even smaller images. And ridiculously small images. But even 4x4 gave me the same error. I even tried with an empty kernel. Nothing. So I started to google like I was possessed. After a while I found a similar example, and the only difference I could see was that they were using CL_RGBA for their channel order, while I was using CL_RGB. I'm simply not interested in the alpha. HOWEVER, this is where the limitation comes in. According to the cl_image_format specification (a document I've read at least twice before), CL_RGB can only be used with certain channel data types: CL_UNORM_SHORT_565, CL_UNORM_SHORT_555 and CL_UNORM_INT_101010. This is where I facepalm. I added an "A" to the end, and suddenly everything worked.
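To make the rule concrete, here is a minimal sketch of a check that could run before an image is ever created. It is an illustration under assumptions, not part of the wrapper described in this post: Order, Type, isPacked and isValidPair are hypothetical names, and the enums are stand-ins for the real cl_channel_order and cl_channel_type constants so that the snippet compiles without the OpenCL headers. It only encodes the rule as the OpenCL 1.1 specification states it for these two channel orders: CL_RGB needs one of the three packed data types, and those packed data types in turn need an RGB-style order.

```cpp
// Stand-in enums so the sketch compiles without OpenCL headers (assumption:
// real code would use the cl_channel_order / cl_channel_type constants
// named in the comments instead).
enum class Order { RGB, RGBA };  // CL_RGB, CL_RGBA

enum class Type {
    UnormInt8,       // CL_UNORM_INT8
    Float,           // CL_FLOAT
    UnormShort565,   // CL_UNORM_SHORT_565
    UnormShort555,   // CL_UNORM_SHORT_555
    UnormInt101010   // CL_UNORM_INT_101010
};

// True for the three packed channel data types.
constexpr bool isPacked(Type t) {
    return t == Type::UnormShort565 || t == Type::UnormShort555 ||
           t == Type::UnormInt101010;
}

// CL_RGB requires a packed type, and a packed type requires an RGB-style
// order. For the two orders modelled here, a pair is therefore valid
// exactly when "order is RGB" and "type is packed" agree.
constexpr bool isValidPair(Order o, Type t) {
    return (o == Order::RGB) == isPacked(t);
}

// Checked at compile time: CL_RGB with a plain float channel is rejected,
// while CL_RGBA with the same channel type is accepted.
static_assert(!isValidPair(Order::RGB, Type::Float), "RGB + float is invalid");
static_assert(isValidPair(Order::RGBA, Type::Float), "RGBA + float is valid");
```

A wrapper could call something like isValidPair before creating the image and fail with a readable message, instead of letting the mismatch surface later as an unrelated-looking error at enqueue time.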
I do not understand this limitation, but I can accept it. What is absolutely ludicrous, on the other hand, is that it results in a CL_OUT_OF_RESOURCES error from enqueueNDRangeKernel... How about a CL_INVALID_ARG_SOMETHING? Why not? Why CL_OUT_OF_RESOURCES? It might as well have been PC LOAD LETTER, it makes zero sense. Like I said, this cost me about 2 hours. If you know why, feel free to drop me a line explaining it, or point me to where it is explained.
I also realized I really have to learn more template metaprogramming. I know there are more elegant solutions than some of the hacks I've done.
On the more positive side, while I was looking around the internet today I noticed several other people that have been writing code very much like my own. So I guess that means I'm on to something? That what I'm doing is fairly intuitive and the obvious solution? Like I said in my previous post, it's hard to try to wrap OpenCL without taking away its flexibility. But I've hit some middle ground now with both my Environment class and a Device class, which handles the command queue. So now I can write stuff like this:
    cl::Buffer cBuffer = env.createBuffer<float>(CL_MEM_READ_WRITE, 1024);
    addKernel.setArgs(aBuffer, bBuffer, cBuffer);
    cl::Event event = dev.runKernel(addKernel);
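For illustration, this is one way a variadic setArgs like the one above could be built with a bit of template metaprogramming. It is a sketch under assumptions and not necessarily how the wrapper in this post does it: RecordingKernel is a hypothetical stand-in for cl::Kernel that only records what was bound at each index, so the snippet compiles without OpenCL, and setArgsFrom and expectedArgs are made-up names. The idea is simply to peel off one argument at a time and forward it to a per-index setArg(index, value), which is the shape cl::Kernel::setArg has.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for cl::Kernel: records the value bound at each index.
struct RecordingKernel {
    std::size_t expectedArgs;        // how many arguments the kernel declares
    std::vector<std::string> bound;  // bound[i] describes argument i

    explicit RecordingKernel(std::size_t n) : expectedArgs(n) {}

    // Same shape as cl::Kernel::setArg(index, value).
    void setArg(std::size_t index, const std::string& value) {
        if (bound.size() <= index) bound.resize(index + 1);
        bound[index] = value;
    }
};

// Base case: no arguments left to bind.
inline void setArgsFrom(RecordingKernel&, std::size_t) {}

// Recursive case: bind the first argument at `index`, then handle the rest.
template <typename First, typename... Rest>
void setArgsFrom(RecordingKernel& kernel, std::size_t index,
                 const First& first, const Rest&... rest) {
    kernel.setArg(index, first);
    setArgsFrom(kernel, index + 1, rest...);
}

// Entry point: binds args to indices 0, 1, 2, ... in the order given.
template <typename... Args>
void setArgs(RecordingKernel& kernel, const Args&... args) {
    // Debug-time check that the caller passed as many arguments as the
    // kernel declares (a real wrapper could query CL_KERNEL_NUM_ARGS).
    assert(sizeof...(Args) == kernel.expectedArgs);
    setArgsFrom(kernel, 0, args...);
}
```

With this in place, setArgs(kernel, "aBuffer", "bBuffer", "cBuffer") on a RecordingKernel constructed with 3 binds the three names to indices 0, 1 and 2. The recursion is resolved at compile time, so the call costs no more than three hand-written setArg calls.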
And that feels pretty good. However, I'm a little confused by the fact that buffers are created from the context without any particular device. When you use CL_MEM_USE_HOST_PTR or CL_MEM_COPY_HOST_PTR, the buffer is implicitly uploaded to the device when you enqueue a kernel that takes it as an argument. This can make it hard to keep track of what data is where. Maybe it's a good idea to write classes wrapping Buffers/Images as well. I'll have to think about it.
Wow, this post grew fast. There is, however, one more thing I have to say: I'll be going away for two weeks. That means fewer streams, more sun, and less work done. Though it also gives me time to apply Hammock Driven Development and think about how to structure the rest of the program.