I have done many acceleration and optimisation jobs on various platforms. These include FPGA RTL-level design, CUDA programming and parallel/distributed CPU cluster design.
It usually takes me weeks, if not months, to optimise a design at the implementation level. I also need to know the hardware platform very well. This learning time can be considered a one-off cost for each platform. Finally, the optimised implementation needs constant upgrades to fully utilise the ever-advancing computing platforms (CPU, GPU, compilers?!).
The question I keep asking myself is:
Is it wise to spend so much time and effort on optimisation or acceleration?
My answer is: It is wise if your application fits in (at least) one of the following categories:
- Time is money.
Literally, time can be translated directly into currency. In some businesses, an advantage of a few seconds can easily pay off the salary of the engineering team. I have seen real-life examples, and some of my friends work in exactly these areas (high-frequency trading, oil/gas exploration, etc.). As far as I know, both the bankers and the engineers are happy about the outcome. And the business is somehow forced to go in this direction: if somebody else is making money faster in the same market, it usually means that you are getting less at the same time.
But the value of time also quickly renders the results worthless. When it is so critical to produce a usable result before the raw data becomes outdated, latency becomes the bottleneck. Conventional high performance computing (HPC) methodologies, which emphasise throughput, are not applicable here. And physical limitations (e.g. the speed of light) will eventually stall the race to the lowest latency.
My opinion: It is a fast growing area but it may grow to its end sooner than we thought.
- Repetitively running jobs.
Considering all the overheads in a real-world application, including disk I/O, memory copies, process synchronisation and data preparation, we seldom see over 10x speedup in overall execution time. It's simple math (Amdahl's law): if the portion of the application that can be accelerated, in terms of execution time, is less than 90%, the maximum achievable speedup is already less than 10x, no matter how fast that portion becomes. If the resulting application is run once per month, most people won't care whether it takes 2 hours or 20 hours.
But if the resulting application is run every hour by every member of a reasonably sized team, even a humble 1.5x improvement will save a significant number of man-months for the business. Also, the target end users in this category are easily satisfied by that 1.5x speedup, for a very long time. (I would be crazily happy if Xilinx sped up the place-and-route process by 1.5x.) This category also includes the software industry, where copies of a single product are sold in the tens of thousands.
My opinion: It is worth doing but you don't usually see big excitement in it.
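To put rough numbers on the two points above, here is a minimal Python sketch. It assumes the usual Amdahl's law model, where only a fixed fraction of the run time can be accelerated. The function names and every figure in the demo (team size, run length, runs per month) are hypothetical and chosen only for illustration, not taken from any real project.

```python
def amdahl_speedup(accelerated_fraction, kernel_speedup):
    """Overall speedup when only part of the run time is accelerated.

    accelerated_fraction: share of the original run time that can be sped up (0..1).
    kernel_speedup: how many times faster that share becomes (> 0).
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    # New run time, as a fraction of the old one: untouched part + accelerated part.
    new_time = (1.0 - accelerated_fraction) + accelerated_fraction / kernel_speedup
    return 1.0 / new_time


def max_speedup(accelerated_fraction):
    """Upper bound on the overall speedup, even with an infinitely fast kernel."""
    if accelerated_fraction >= 1.0:
        return float("inf")
    return 1.0 / (1.0 - accelerated_fraction)


def hours_saved_per_month(run_hours, runs_per_month, overall_speedup):
    """Wall-clock hours saved per month for a job that is run repeatedly."""
    return run_hours * runs_per_month * (1.0 - 1.0 / overall_speedup)


if __name__ == "__main__":
    # With 90% of the run time accelerable, the ceiling is about 10x.
    print(f"ceiling at 90%: {max_speedup(0.9):.2f}x")
    print(f"90% accelerated by 100x: {amdahl_speedup(0.9, 100):.2f}x overall")

    # Hypothetical monthly job: 20 hours, run once, sped up 10x.
    monthly = hours_saved_per_month(run_hours=20, runs_per_month=1, overall_speedup=10)
    # Hypothetical team job: 0.5 hours, 20 people x 160 runs each, sped up 1.5x.
    team = hours_saved_per_month(run_hours=0.5, runs_per_month=20 * 160, overall_speedup=1.5)
    print(f"monthly job saves {monthly:.0f} h, team job saves {team:.0f} h")
```

Under these made-up figures, the humble 1.5x speedup on the frequently run job saves roughly 533 hours a month, against 18 hours for the 10x speedup on the monthly job. The exact numbers matter much less than the pattern: how often the job runs dominates the payoff.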
- Framework development.
This is for developers or consultants who are planning to make a living in application acceleration. It is worth planning well for each platform and creating a framework that you can reuse in later projects. Again, it is a larger up-front payment for repetitive (development) jobs.
Apart from these, I don't see why one should spend weeks on acceleration. Paying for a good compiler, using an optimised library, and playing with the compilation options will easily give a big improvement.