This week I've been slowly starting on my ISOC project, codename Ocius.
I've learnt basic CMake to enable cross-platform builds. I've set up a Git repo and configured my Twitch account to automagically tweet whenever I start a broadcast. So head over and follow me on Twitter if you want a notification when I start to stream. I don't usually use Twitter for CS stuff, but maybe it's time to start using it instead of Facebook for those kinds of things.
I've also been doing some reading. A LOT of reading, actually. I've been going through the C++ Wrapper API and the AMD APP Guide, and I've been reading up on Thrust and various other things. Furthermore, I've been reading bits of my various books: C++ Concurrency in Action, OpenCL in Action, Professional C++ and The Art of Multiprocessor Programming. I realize how limited my proficiency with C++ is when I have to look up things like how to return an array from a function, or when I get confused by template syntax. I have also been enjoying the sun while it has been so kind as to shine these past few days, so I admit I haven't been 100% productive. I also got stuck in a rather long game of Warlock.
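For anyone else who has to look up the array question: a minimal sketch of the two usual answers in modern C++, assuming C++11 or later. The function names `first_squares` and `squares` are made up for illustration.

```cpp
#include <array>
#include <cstddef>
#include <vector>

// Size known at compile time: return a std::array by value.
// Unlike a raw C array, std::array can be returned from a function.
std::array<int, 4> first_squares() {
    return {0, 1, 4, 9};
}

// Size only known at run time: return a std::vector by value.
// The vector is moved out (or the copy is elided), so the elements
// are not copied one by one on return.
std::vector<int> squares(std::size_t count) {
    std::vector<int> result;
    result.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        result.push_back(static_cast<int>(i * i));
    }
    return result;
}
```

Calling `squares(5)` gives a vector holding the first five squares, and the caller owns it outright, so there is no pointer or manual cleanup to worry about.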
I've been thinking a lot about how to structure this code. At first I was thinking of a rather heavy object called an Environment, which would contain the OpenCL platform, the context, all the devices and the program/kernel information. Doing this with the C++ wrapper API was harder than I thought, or at least clumsier, because a machine can have multiple platforms and multiple devices within a platform. Luckily for me, you cannot create a context with devices from different platforms, so that makes it a little easier.
For those of you who do not know, a platform is a specific OpenCL implementation. Take Intel, for example: they only provide an implementation for their CPUs, because that's the only thing they make. AMD, however, has one platform covering both their CPU and GPU devices, and Nvidia only covers their GPUs as far as I'm aware. I haven't had the chance to develop for a Tegra device yet.
So yeah, I don't know how to make such a class easy to use and flexible at the same time. My initial thought was that this one class would do all the heavy lifting, such as setting up command queues. But then I learnt that every command queue only goes to a single device anyway. I also realized there is no point in running kernels on multiple devices, because it's not possible to coordinate that work in any reasonable way. So perhaps it's more feasible to create a smaller object which handles only a single device, which is the most common configuration anyway. However, this depends on whether or not an OpenCL context is a lightweight object. Perhaps it is, perhaps it is not. On the other hand, if you have multiple GPUs you might be looking to roll your own solution anyway, and with a separate object per device it would be easier to coordinate them.
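To make the "smaller object" idea a bit more concrete, here is a rough sketch of one environment per device. It is not the actual Ocius code: `Device`, `Platform`, `DeviceEnvironment`, `make_environments` and `example_machine` are all hypothetical names, and the first two are plain stand-ins for the real OpenCL wrapper types so that the sketch compiles without an OpenCL SDK installed.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the OpenCL device and platform types.
struct Device {
    std::string name;
    bool is_gpu;
};

struct Platform {
    std::string vendor;
    std::vector<Device> devices;
};

// One small environment per device. In a real version this is where
// the context and the single command queue for `device` would live.
class DeviceEnvironment {
public:
    DeviceEnvironment(std::string platform_vendor, Device device)
        : platform_vendor_(std::move(platform_vendor)),
          device_(std::move(device)) {}

    const std::string& platform_vendor() const { return platform_vendor_; }
    const Device& device() const { return device_; }

private:
    std::string platform_vendor_;
    Device device_;
};

// Build one environment per device. Each environment belongs to exactly
// one platform, so nothing ever tries to mix devices across platforms.
std::vector<DeviceEnvironment> make_environments(
    const std::vector<Platform>& platforms) {
    std::vector<DeviceEnvironment> environments;
    for (const Platform& platform : platforms) {
        for (const Device& device : platform.devices) {
            environments.emplace_back(platform.vendor, device);
        }
    }
    return environments;
}

// Made-up machine for illustration: one platform exposing a CPU and a
// GPU, and a second platform exposing only a GPU.
std::vector<Platform> example_machine() {
    return {
        Platform{"AMD", {Device{"CPU", false}, Device{"GPU", true}}},
        Platform{"Nvidia", {Device{"GPU", true}}},
    };
}
```

With this layout, `make_environments(example_machine())` yields three independent environments, and any coordination between them is left to the caller. Whether that is affordable in practice still hinges on how heavy a real context turns out to be.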
What do you guys think? Maybe you can drop a line below if you have any experience with multi-GPU setups. Considering I want to support and focus on CL-GL interop, I think smaller compute environments might be the way to go after all, because unless you're on an AMD-AMD system, having multiple contexts is mandatory anyway.