During my vacation in May 2011 I was stuck for 4 days in Iceland, of course, between an unexpected and incredible snowstorm on one side and the eruption of the Grímsvötn volcano on the other. Well... I had a lot of time and very little to do, so I spent some of it trying to figure out the fastest method of calculating a weighted average of two RGB colors, a and b, so that the result would be (3a + b)/4.
What for? Because I had already become interested in DSx86, a PC emulator for the Nintendo DS. If you've never tried this amazing homebrew, I suggest that you do so as soon as possible. In his May 15 blog post, DSx86 author 'Pate' was asking for suggestions on how to perform a faster weighted average between two colors. His method at the time was to run a normal average twice to achieve a weighted one: tmp = (a+b)/2, then avg = (a+tmp)/2.
So what's the reason why I'm writing this post now? Well... time passes and memories start to fade, so I wanted to write down my thoughts and share them before they are gone completely. You know, I'm growing older ;)
For two single bits a and b, we can define a + b = (a ^ b) + ((a & b) << 1), as it appears in the following truth table:
a b a+b
0 0 00
0 1 01
1 0 01
1 1 10
then the average formula will be
(a + b)/2 = ((a ^ b)>>1) + (a & b).
Our colors are halfwords (16 bits) where 5 bits are reserved for each RGB component, laid out as xBBBBBGGGGGRRRRR. A plain right shift would make the least significant bit of the blue and green components fall into the bits reserved for the green and red components respectively, so we have to mask off the least significant bit of each component in the (a ^ b) result before shifting. Thus we obtain
(a + b)/2 = (((a ^ b) & ~0x421) >>1) + (a & b)
which is an accurate average of two RGB colors, obtained without having to calculate the average of each component separately (please read the very interesting Quick colour averaging article on the CompuPhase web site).
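To make the masked expression concrete, here is a minimal sketch in C. The function names are made up for this illustration and do not come from DSx86, and the sketch assumes the unused top bit x is the same in both pixels. A per-channel reference is included only to show that both approaches give the same result.

```c
#include <assert.h>
#include <stdint.h>

/* Average two RGB555 pixels (xBBBBBGGGGGRRRRR) in one pass.
 * Assumption: the unused top bit x is identical in both inputs;
 * otherwise it would be shifted into the blue channel. */
static uint16_t rgb555_avg(uint16_t a, uint16_t b)
{
    /* ~0x421 clears bit 0 of each channel so that nothing leaks
     * into the neighbouring channel when shifting right. */
    return (uint16_t)((((a ^ b) & ~0x421) >> 1) + (a & b));
}

/* Reference version: average each 5-bit channel separately. */
static uint16_t rgb555_avg_per_channel(uint16_t a, uint16_t b)
{
    unsigned out = 0;
    for (int shift = 0; shift <= 10; shift += 5) {
        unsigned ca = (a >> shift) & 0x1Fu;
        unsigned cb = (b >> shift) & 0x1Fu;
        out |= ((ca + cb) / 2) << shift;
    }
    return (uint16_t)out;
}

/* Small self-check on a few sample pixels. */
static void rgb555_avg_demo(void)
{
    /* Red 31 averaged with black gives red 15. */
    assert(rgb555_avg(0x001F, 0x0000) == 0x000F);
    assert(rgb555_avg(0x7FFF, 0x0000) == rgb555_avg_per_channel(0x7FFF, 0x0000));
    assert(rgb555_avg(0x03E3, 0x7C00) == rgb555_avg_per_channel(0x03E3, 0x7C00));
}
```

Each channel result is rounded down, exactly as an integer division by 2 of that channel would be.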
Similarly, we can define 3a + b as
a b 3a+b
0 0 000
0 1 001
1 0 011
1 1 100
which can be expressed as (a ^ b) + ((a & ~b)<<1) + ((a & b)<<2). To obtain the weighted average, we still have to divide it by 4, which results in
(3a + b)/4 = ((a ^ b)>>2) + ((a & ~b)>>1) + (a & b)
Again, the shifts here would make the least significant bits fall into the other components, so we have to clear the least significant bit of each component for the 1-bit right shift and the two least significant bits of each component for the 2-bit right shift. Finally, we get
(3a + b)/4 = (((a ^ b) & ~0xC63) >>2) + (((a & ~b) & ~0x421) >>1) + (a & b)
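The final expression can be sketched in C as follows. The function names are hypothetical, chosen for this illustration only, and the sketch again assumes that the unused top bit is the same in both pixels. Note that each shifted term is truncated on its own, so a channel can come out 1 lower than the exact (3a + b)/4 rounded down; the self-check shows such a case.

```c
#include <assert.h>
#include <stdint.h>

/* Weighted average (3a + b) / 4 of two RGB555 pixels
 * (xBBBBBGGGGGRRRRR), all three channels at once.
 * Assumption: the unused top bit is identical in a and b. */
static uint16_t rgb555_wavg31(uint16_t a, uint16_t b)
{
    unsigned x = a ^ b;              /* bits set in exactly one input: weight 1 */
    unsigned y = a & ~(unsigned)b;   /* bits set only in a: a further weight 2 */
    unsigned z = a & b;              /* bits set in both: weight 4 */

    /* ~0xC63 clears the two low bits of each channel before the
     * 2-bit shift, ~0x421 the lowest bit before the 1-bit shift. */
    return (uint16_t)(((x & ~0xC63u) >> 2) + ((y & ~0x421u) >> 1) + z);
}

/* Small self-check on a few sample pixels. */
static void rgb555_wavg31_demo(void)
{
    /* Identical inputs come back unchanged. */
    assert(rgb555_wavg31(0x001F, 0x001F) == 0x001F);
    /* Red 4 weighted with black: (3*4 + 0) / 4 = 3, exact. */
    assert(rgb555_wavg31(0x0004, 0x0000) == 0x0003);
    /* Red 3 weighted with black: exact result is 9/4 = 2, but the
     * separately truncated terms give 1. */
    assert(rgb555_wavg31(0x0003, 0x0000) == 0x0001);
    /* The channels do not disturb each other. */
    assert(rgb555_wavg31(0x03E3, 0x7C00) == 0x1EC1);
}
```

Whether an occasional error of one level per channel matters is a design choice; for scaling a screen it trades a little precision for fewer instructions.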
The normal average was implemented using 4 ARM assembler instructions, and had to be done twice. By contrast, the weighted average calculated with my expression can be coded in only 7 ARM instructions, which saves 1 cycle per weighted average. Not bad if you consider that all 200 lines of the VGA screen, each 320 pixels wide, have to be converted into 256-pixel-wide lines to fit the DS screen up to 60 times per second. To do this, you need to perform two weighted averages every 5 pixels.
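To give an idea of where the two weighted averages per 5 pixels could fit, here is a rough sketch in C of shrinking one line from 320 to 256 pixels. The pixel mapping is an assumption made for this illustration (weighted averages at both ends of each 5-pixel group, normal averages in the middle) and is not taken from the DSx86 source; all function names are hypothetical too.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* (a + b) / 2 for RGB555 pixels; top bit assumed equal in a and b. */
static uint16_t avg11(uint16_t a, uint16_t b)
{
    return (uint16_t)((((a ^ b) & ~0x421) >> 1) + (a & b));
}

/* (3a + b) / 4 for RGB555 pixels; top bit assumed equal in a and b. */
static uint16_t avg31(uint16_t a, uint16_t b)
{
    unsigned x = a ^ b;
    unsigned y = a & ~(unsigned)b;
    return (uint16_t)(((x & ~0xC63u) >> 2) + ((y & ~0x421u) >> 1) + (a & b));
}

/* Shrink one 320-pixel line to 256 pixels: every 5 source pixels
 * become 4 output pixels.
 * ASSUMED mapping, for illustration only:
 *   out0 = (3*p0 + p1) / 4
 *   out1 = (p1 + p2) / 2
 *   out2 = (p2 + p3) / 2
 *   out3 = (p3 + 3*p4) / 4 */
static void scale_line_320_to_256(const uint16_t *src, uint16_t *dst)
{
    for (size_t i = 0; i < 320; i += 5, src += 5, dst += 4) {
        dst[0] = avg31(src[0], src[1]);
        dst[1] = avg11(src[1], src[2]);
        dst[2] = avg11(src[2], src[3]);
        dst[3] = avg31(src[4], src[3]);
    }
}

/* Small self-check: a line of one solid color keeps that color. */
static void scale_line_demo(void)
{
    uint16_t src[320], dst[256];
    for (size_t i = 0; i < 320; i++)
        src[i] = 0x1234;
    scale_line_320_to_256(src, dst);
    for (size_t i = 0; i < 256; i++)
        assert(dst[i] == 0x1234);
}
```

With this mapping each group of 5 pixels costs two weighted and two normal averages, which matches the count mentioned above, but other mappings are possible.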
There are some other nice tricks I used to speed things up even more... but I'll detail those in the next post, because I'd prefer to keep this one focused on quick color averaging only.