Actually, Python is running a little slow. I tried to optimize the code as much as I could, using shifts instead of multiplies and divides, etc.
I am now researching how to call C code from Python in my Win 7 environment.
The grayscale images (wxPython and pygtk) are limited by the time it takes to transfer the file to the SHD: 3300~3400 msec/frame for a 70~80 kB JPEG image. The dithered image (pygtk), on the other hand, I believe is computation limited: ~2000 msec/frame for a 25~30 kB PNG.
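As a rough sanity check on those numbers, here is a small sketch that estimates the link rate from the JPEG figures and then how long a PNG of the quoted size would take to transfer at that rate. It assumes transfer time scales linearly with file size and uses the midpoints of the ranges above; the function name is just for illustration.

```python
# Rough check, assuming transfer time scales linearly with file size.
def throughput_kb_per_s(size_kb, frame_ms):
    """Effective link rate in kB/s for one frame (hypothetical helper)."""
    return size_kb / (frame_ms / 1000.0)

# Grayscale JPEG: 70~80 kB at 3300~3400 msec/frame (midpoints used here).
rate = throughput_kb_per_s(75.0, 3350.0)

# Dithered PNG: 25~30 kB; estimated transfer-only time at the same rate.
png_transfer_ms = 27.5 / rate * 1000.0

print("link rate: %.1f kB/s" % rate)
print("estimated PNG transfer: %.0f msec" % png_transfer_ms)
```

Under that assumption the PNG transfer alone would take roughly 1200 msec, so the remaining time in the ~2000 msec/frame would be spent on computation, which fits the guess that the dithered path is computation limited.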
I tried to add dithering to the wxPython code, but it is incredibly slow due to the computation.
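To show where the time goes, here is a minimal sketch of a per-pixel dither loop in pure Python. It assumes Floyd-Steinberg error diffusion (the algorithm actually used is not stated above) and a flat list of 0..255 gray values; the name dither_fs is hypothetical. The divide by 16 is done with a shift, in the same spirit as the optimizations mentioned earlier.

```python
def dither_fs(pixels, width, height):
    """Floyd-Steinberg dither of a flat list of 0..255 gray values to 0/255.

    Assumed algorithm, for illustration only.
    """
    buf = list(pixels)            # working copy that accumulates diffused error
    out = [0] * (width * height)
    for y in range(height):
        row = y * width
        for x in range(width):
            i = row + x
            old = buf[i]
            new = 255 if old >= 128 else 0
            out[i] = new
            err = old - new
            # Weights 7/16, 3/16, 5/16, 1/16; ">> 4" replaces "/ 16".
            # Python's >> floors negative values, which is fine here.
            if x + 1 < width:
                buf[i + 1] += (err * 7) >> 4
            if y + 1 < height:
                j = i + width
                if x > 0:
                    buf[j - 1] += (err * 3) >> 4
                buf[j] += (err * 5) >> 4
                if x + 1 < width:
                    buf[j + 1] += err >> 4
    return out

# Example: dither a 4x4 block of mid-gray.
result = dither_fs([128] * 16, 4, 4)
```

Even with shifts, the inner loop runs once per pixel in the interpreter, so a loop like this is a natural candidate for moving into C.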