I've been bothered by the state of GPGPU programming lately. And what better trigger for a blog post than the announcement of OpenCL 3.0?
There are a few companies in the world that I genuinely dislike, MathWorks being one of them. I hate the thought that Matlab has a monopoly on the market of "quickly write fast stuff".
For a variety of reasons, I've been looking for alternatives. First, there's numpy. From my understanding, numpy has pretty much replaced Matlab in academia. A great achievement, but I fear it still has a long way to go in terms of viability for production use. That aside, the bigger issue, I think, is poor OpenCL support. It appears that to enjoy numpy fully, you must stick to CUDA.
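To show what I mean by "quickly write fast stuff", here's a toy example of my own (not from any benchmark): a million-element computation as one vectorized expression, no explicit loops, no compilation step.

```python
import numpy as np

# Toy illustration of numpy's appeal: compute the root-mean-square of
# sin(x) over a full period, one million samples, in a single line.
x = np.linspace(0.0, 2.0 * np.pi, 1_000_000)
rms = np.sqrt(np.mean(np.sin(x) ** 2))  # analytically, this tends to 1/sqrt(2)
print(round(rms, 3))  # → 0.707
```

The catch, as noted above, is that the GPU-accelerated continuations of this style tend to assume CUDA.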
Next on the list was Julia, the language I dismissed in 2012 because it looked like Ruby. I thought it would have had loads of time to mature by now. However, my findings indicate that it suffers the same drawbacks as numpy: no ability to ship binaries, and... a strong bias towards CUDA. Why on earth is this? Why can't there be a scripting language that allows for quick iteration, but with the ability to ship protected versions of the code and to accelerate computations using OpenCL? Is this because OpenCL is terrible? Is it the support that is poor? Unpredictable performance? I'd love to know.
Another option is SYCL. Is that what one is supposed to be using? Perhaps that's what I should be looking into. I am an unrelenting supporter of open standards. CUDA needs to go away.
But I don't think OpenCL is perfect either. Please note that I am not a GPGPU developer day-to-day, so I really have no foundation for any accusations. But when I read "greater flexibility" and "optional functionality", I read "fragmentation". If vendors have the option, they will skip it, and developers can't count on anything. This is the situation that killed the Kinect, probably Microsoft's coolest product ever. It also ruined lines in WebGL. It is my opinion that optional features are bad: they split the community. There will be developers who know exactly what hardware they will be running on; they might invest in investigating and utilizing it. But developers who don't know what hardware their code will run on will be doomed to target the lowest common denominator.
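To make the worry concrete, here is a hypothetical sketch of my own (in Python for brevity). The feature names are real OpenCL C 3.0 feature macros; the kernel names and dispatch function are invented for illustration. This is the shape portable code takes once functionality is optional:

```python
# Hypothetical sketch of feature-gated kernel dispatch. The feature-macro
# names ("__opencl_c_fp64", "__opencl_c_subgroups") are genuine OpenCL C 3.0
# optional features; the kernel names are made up.

def pick_kernel(device_features):
    """Pick the best kernel the device admits, falling back to the
    lowest common denominator when optional features are missing."""
    if {"__opencl_c_fp64", "__opencl_c_subgroups"} <= device_features:
        return "reduce_fp64_subgroup"   # fast path: needs two optional features
    if "__opencl_c_fp64" in device_features:
        return "reduce_fp64"            # slower: no subgroup cooperation
    return "reduce_fp32_emulated"       # baseline every device must support

# A developer who controls the hardware can assume the fast path; one who
# ships to unknown devices must write, test, and maintain all three.
print(pick_kernel({"__opencl_c_fp64", "__opencl_c_subgroups"}))  # fast path
print(pick_kernel(set()))                                        # fallback
```

Every optional feature multiplies the paths to maintain, which is exactly the lowest-common-denominator trap described above.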
I would love to be proved wrong. Perhaps the exact opposite is true. Perhaps OpenCL versions prior to 3.0 put such high demands on vendors that they couldn't implement them fully anyway, and that is what has led to such poor OpenCL support in languages and runtimes.