sverx:
I'm counting the cycles taken to do the work with both the original and the optimized code, and I calculate the speed increase with the following formula: (float)((cycles_original/cycles_optimized)-1)*100
So, for example, if cycles_optimized is half of cycles_original, the formula gives 100. "114% faster" means that cycles_optimized is even less than half of cycles_original.
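
To make the arithmetic concrete, here is a minimal sketch of that formula in C. The function name speedup_percent and the uint32_t cycle counts are assumptions for illustration only, not the code actually used for the measurements. The cast is applied before the division, because if both counts were integers, dividing first would truncate the ratio.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: the variable names mirror the formula in the post,
 * but the function itself and the integer types are assumptions. */
static float speedup_percent(uint32_t cycles_original, uint32_t cycles_optimized)
{
    /* A zero cycle count would make the ratio meaningless. */
    assert(cycles_optimized != 0);

    /* Convert to float before dividing so the ratio keeps its fraction. */
    return ((float)cycles_original / (float)cycles_optimized - 1.0f) * 100.0f;
}
```

With these assumptions, speedup_percent(2000, 1000) gives 100, matching the "half the cycles" example. A result of 114 corresponds to a ratio of 2.14, i.e. the optimized code takes about 1/2.14, roughly 47%, of the original cycle count.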

PypeBros:
I'm trying to figure out what you measure to claim that you got "114% faster"... that clearly can't be the amount of time needed to complete a task. The number of frames/pixels you can render within a given amount of time, maybe?