Originally Posted by jmseight
Actually, Python is running a little slow. I tried to optimize the code as much as I could, using shifts instead of multiplies and divides, etc.
I am now researching how to put C code into Python in my Win 7 environment.
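One standard route for calling C from Python (no compiler wrappers needed) is the stdlib ctypes module, which loads a shared library — a DLL on Windows. Here is a minimal sketch that calls the C runtime's abs(); the msvcrt/None loading split and the abs() example are illustrative choices, not anything from the thread:

```python
import ctypes
import sys

# Load the C runtime: msvcrt.dll on Windows (e.g. Win 7),
# otherwise dlopen(NULL), which exposes libc symbols on POSIX.
if sys.platform == "win32":
    libc = ctypes.CDLL("msvcrt")
else:
    libc = ctypes.CDLL(None)

# Declare the C signature so ctypes marshals arguments correctly.
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int

print(libc.abs(-42))  # calls the C function directly
```

The same pattern works for your own compiled DLL of dithering routines: build it with a C compiler, load it with `ctypes.CDLL`, and declare argtypes/restype for each exported function.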
While the grayscale images (wxPython and pygtk) are limited by the file transfer to the SHD (3300~3400 ms/frame for a 70~80 kB JPEG), I believe the dithered image (pygtk) is computation-limited (~2000 ms/frame for a 25~30 kB PNG).
I tried to add dithering to the wxPython code, but it is incredibly slow due to the computation.
On modern computers shifts are SLOWER than multiply or divide, and table lookups are slower than long integer computations (like the geekmaster formula 42). Those "old-school" optimizations actually slow things down, sometimes by a LOT.
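In CPython specifically, interpreter dispatch dwarfs the cost of either arithmetic operation, so the shift-for-multiply substitution buys nothing there either. A quick, hedged way to check this on your own machine with the stdlib timeit module (the numbers below will vary; no winner is asserted):

```python
import timeit

# Both expressions compute the same value.
assert (12345 << 3) == (12345 * 8)

# Time a million iterations of each; in CPython the difference is
# typically noise, because bytecode dispatch dominates the arithmetic.
t_shift = timeit.timeit("x << 3", globals={"x": 12345}, number=1_000_000)
t_mul = timeit.timeit("x * 8", globals={"x": 12345}, number=1_000_000)
print(f"shift: {t_shift:.3f}s  multiply: {t_mul:.3f}s")
```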
Read Agner Fog's optimization manuals (modern) and the older material by Michael Abrash (especially the graphics books). In particular, study branch-free and cache-oblivious algorithms, and lock-free queues.
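As a taste of what "branch-free" means, here is the classic conditional-free minimum trick (it appears in collections like Aggregate Magic), sketched in Python for readability — note the payoff is in compiled code, where it avoids a mispredictable branch; in Python it is purely illustrative:

```python
def branch_free_min(x: int, y: int) -> int:
    """Minimum of two integers with no conditional branch.

    -(x < y) evaluates to -1 (an all-ones mask) when x < y, else 0,
    so the AND selects either (x ^ y) or 0, and the outer XOR
    yields x in the first case and y in the second.
    """
    return y ^ ((x ^ y) & -(x < y))
```

In C the same expression compiles to a handful of straight-line ALU instructions, which is the point: no branch, so no branch misprediction penalty in a tight pixel loop.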
P.S. Please read the above links, especially "The Aggregate Magic Algorithms". Although some of this material is targeted at Intel CPUs, the ideas and techniques apply to embedded computing as well.
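For a concrete example of the Aggregate Magic style, here is the well-known SWAR population count (counting set bits in a 32-bit word with no loop and no table), again transcribed into Python as a sketch of the C technique:

```python
def popcount32(v: int) -> int:
    """Count set bits in a 32-bit value using parallel (SWAR) adds.

    Each step sums bit counts in progressively wider fields:
    pairs, then nibbles, then bytes; the final multiply folds the
    four byte-counts into the top byte.
    """
    v = v - ((v >> 1) & 0x55555555)
    v = (v & 0x33333333) + ((v >> 2) & 0x33333333)
    v = (v + (v >> 4)) & 0x0F0F0F0F
    return (v * 0x01010101 >> 24) & 0xFF

print(popcount32(0b1011))  # → 3
```

This is exactly the kind of table-free, branch-free long-integer computation the post above recommends over lookup tables.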