geekmaster dither formula demystified?

geekmaster · 01-04-2014, 02:24 PM

There have been requests, and I have posted, that when I get some time, I will write a bit about how my dither code works. This thread is the beginning of such an effort. There will be more to come later...

As published in the geekmaster kindle video player thread, here is the "raw2gmv" kindle video transcoder:

PHP Code:


			
//====================================================
// raw2gmv 1.0a - raw to geekmaster video transcoder
// Copyright (C) 2012 by geekmaster, with MIT license:
// http://www.opensource.org/licenses/mit-license.php
//----------------------------------------------------
#include <stdio.h>  // stdin,stdout
typedef unsigned char u8; typedef unsigned int u32;
int main(void) {
    u8 o,to,tb,wb0[800*600]; u32 x,y,xi,yi,c=250,b=120; // c=contrast, b=brightness
    while (fread(wb0,800*600,1,stdin)) for (y=0;y<800;y++) { xi=y; tb=0;
        for (x=0;x<600;x++) { yi=599-x; o=x^y;
            to=(y>>2&1|o>>1&2|y<<1&4|o<<2&8|y<<4&16|o<<5&32)-(wb0[yi*800+xi]*63+b)/c>>8;
            tb=(tb>>1)|(to&128); if (7==(x&7)) { fwrite(&tb,1,1,stdout); tb=0; }
        }
    } return 0;
}

It takes raw 800x600 8bpp video piped into STDIN, rotates it to 600x800 for kindle portrait mode, and dithers it while computing the dither table on-the-fly, then outputs it to STDOUT. Such STDIO piping allows chaining multiple programs together on the command line. The output can be redirected to a file to be played later, or it can be piped directly into the video player program, or it can be piped into netcat and sent over a network.

It uses an in-line formula with no subroutines, to be branch-free (no if-else statements) which makes it cache-friendly.

The while statement reads a series of single 800x600 video frames (one byte per pixel).

The outer for loop processes entire 800-pixel rows from the current video frame, and the inner for loop processes individual pixels (8-bit grayscale) from the current video row.

Inside this outer loop we set xi=y, to convert input rows to output columns, as needed for portrait mode output.

The body of code inside the for loops swaps x and y coordinates to rotate the video into portrait mode, and computes a dither table threshold for the current framebuffer pixel position, then computes a single output bit (black or white pixel). Finally, it packs the pixel bit into a byte, which when full (eight sequential pixel bits), it writes that full byte to STDOUT.

Here is the loop body (with added white space):

PHP Code:


			
    yi=599-x;
    o=x^y;
    to=(y>>2&1|o>>1&2|y<<1&4|o<<2&8|y<<4&16|o<<5&32)-(wb0[yi*800+xi]*63+b)/c>>8;
    tb=(tb>>1)|(to&128);
    if (7==(x&7)) {
        fwrite(&tb,1,1,stdout);
        tb=0;
    }

Notice that yi traverses from 599 to 0, depending on x, to prevent a mirror image when rotating our input video from 800x600 landscape mode to kindle 600x800 portrait mode. We are building the OUTPUT video in portait mode, so we need to raster-scan columns from input to output in each frame buffer to build rows of output video.

The next statement sets variable o (order) in correspondence with a checkerboard pattern in the framebuffer, representing the output pixel bits needed for a 50-percent grayscale at each resolution step.

Then we get a bit complicated, while computing the to variable. The key idea is that some variables are used in the logical expression in a way that enables or disables portions of the overall formula, much like how if-else statements could also be used to do the same thing.

The first part of the expression, in parentheses, computes the dither threshold for the current pixel position. Notice how it contains a series of values in a binary progression (&1, &2, &4, &8, &16, &32). These bits are conditionally combined into a 6-bit dither value (0-63). The condition that determines whether these bits remain a shifted one-bit or become a zero-bit is taken from either a shifted y-position or a shifted o value (x xor y), which modified the 50-percent ordered dither mask (in o) to increase or decrease the value depending on its position in the framebuffer. This effectively reproduces the common ordered dither table that is found in most dither routines, without needing a dither table to be precomputed, and without memory lookups (when memory access is very slow compared to real-time computation on modern cached processors). The best way to see how this dither formula works is to fill a table with the values computed by using it, and compare that to a traditional dither table such as in my formula-42 dither logic. The whole point of a dither table is to evenly distribute all values 0 to 63, across a square texture tile, and use that to tile the entire framebuffer. Then each input value is compared against these values, used as a threshold, to determine whether the output pixel is black or white.

wb0 is our work-buffer (in-memory version of a framebuffer such as /dev/fb0). wb0[yi*800+xi] is the current input pixel, an 8-bit value. Using scaled-integer arithmetic, we do a brightness and contrast adjustment using predefined values for c=250 and b=120 (as determined empirically for good video image quality). Essentially, we multiple the 8-bit pixel by 63, add 120 (default brightness), divide by 250 (default contrast).

Instead of comparing each input value against a dither threshold, we subtract the scaled input value from the dither threshold value so that the result is positive or negative depending on threshold result, then we shift the sign bit with >>8 to extend it down into bottom byte of the computation, making the sign-bit either 0 or 255 (pure black or pure white).

And last, we mask the sign bit with &128, and shift it into the output byte. And yes, I did cheat a little in that an if statement determines when to write out that output byte. However, there are no branches while computing each pixel bit. The reason for using a logic expression is that traditional if-else statements will run much slower if it confuses CPU cache branch prediction (which is very simplistic in RISC embedded processors). Some compilers may automatically convert if-else statements into more efficient code, however, that cannot be relied on (especially when using tcc while compiling in the kindle itself).

I hope that helps understand this code a little bit better. Notice that this only shows the DITHER step, and does not include extra logic in the gmplay program, needed to deal with variations between different kindle models, and between main and diags modes. You may wish to compare and contrast the gmplay code against raw2gmv, or you can wait until I take time to document that a bit better too. However, until then, read the "theory of operation" details by clicking the appropriate button in that the geekmaster kindle video player first post.

geekmaster · 01-04-2014, 02:43 PM

(reserved for more content)

For now, here is my older dither logic that uses a dither table, but does take kindle variations into account:

PHP Code:


			
// global vars
u8 *fb0=NULL;   // framebuffer pointer
int fdFB=0;    // fb0 file descriptor
u32 fs=0;     // fb0 stride
u32 MX=0;    // xres (visible)
u32 MY=0;   // yres (visible)
u8 blk=0;  // black
u8 wht=0; // white
u8 pb=0; // pixel bits

//========================================
// setpx - draw pixel using ordered dither
// x,y: screen coordinates, c: color(0-64).
// (This works on all eink kindle models.)
//----------------------------------------
void inline setpx(int x,int y,int c) {
    static int dt[64] = { 1,33,9,41,3,35,11,43,49,17,57,25,51,19,59,27,13,45,5,
    37,15,47,7,39,61,29,53,21,63,31,55,23,4,36,12,44,2,34,10,42,52,20,60,28,50,
    18,58,26,16,48,8,40,14,46,6,38,64,32,56,24,62,30,54,22 }; // dither table
    fb0[pb*x/8+fs*y]=((128&(c-dt[(7&x)+8*(7&y)]))/128*(blk&(240*(1&~x)|
        15*(1&x)|fb0[pb*x/8+fs*y])))|((128&(dt[(7&x)+8*(7&y)]-c))/128*wht|
        (blk&((240*(1&x)|15*(1&~x))&fb0[pb*x/8+fs*y]))); // geekmaster formula 42
}

I will let you meditate on that until I add more here later...

geekmaster · 01-04-2014, 02:44 PM

And here is the relevant code that takes the pre-dithered content piped in from raw2gmv or from a pre-dithered video (gmv) file, and converts it to whatever the framebuffer requirements dictate for the kindle models and boot modes supported (all of them, as of the PW).

PHP Code:


			
//----- gmlib global vars ----- 
enum GMLIB_op {GMLIB_INIT,GMLIB_CLOSE,GMLIB_UPDATE,GMLIB_VSYNC}; 
u8 *fb0=NULL;     // framebuffer pointer 
int fdFB=0;      // fb0 file descriptor 
int teu=0;      // eink update time 
u32 fs=0;      // fb0 stride 
u32 MX=0;     // xres (visible) 
u32 MY=0;    // yres (visible) 
u32 VY=0;   // (VY>MY): mxcfb driver 
u8 ppb=0;  // pixels per byte 
u32 fc=0; // frame counter 
#define FBSIZE (600/8*800) 
//================================== 
// gmplay4 - play video on 4-bit fb0 
//---------------------------------- 
void gmplay4(void) { 
    u32 i,x,y,b,p,off=(MY/2-400)*fs+MX/4-150,fbsize=FBSIZE; u8 fbt[FBSIZE]; 
    while (fread(fbt,fbsize,1,stdin)) { teu+=130; // teu: next update time 
        if (getmsec()>teu+1000) continue; // drop frame if > 1 sec behind 
        gmlib(GMLIB_VSYNC); // wait for fb0 ready 
        for (y=0;y<800;y++) for (x=0;x<600;x+=8) { 
            b=fbt[600/8*y+x/8]; i=y*fs+x/2+off; 
            p=(b&1)*240; b>>=1; fb0[i]=p|(b&1)*15; b>>=1; 
            p=(b&1)*240; b>>=1; fb0[i+1]=p|(b&1)*15; b>>=1; 
            p=(b&1)*240; b>>=1; fb0[i+2]=p|(b&1)*15; b>>=1; 
            p=(b&1)*240; b>>=1; fb0[i+3]=p|(b&1)*15; 
        } fc++; gmlib(GMLIB_UPDATE); 
    } 
} 
//================================== 
// gmplay8 - play video on 8-bit fb0 
//---------------------------------- 
void gmplay8(void) { 
    u32 i,x,y,b,fbsize=FBSIZE; u8 fbt[FBSIZE]; 
    while (fread(fbt,fbsize,1,stdin)) { teu+=130; // teu: next update time 
        if (getmsec()>teu+1000) continue; // drop frame if > 1 sec behind 
        gmlib(GMLIB_VSYNC); // wait for fb0 ready 
        for (y=0;y<800;y++) for (x=0;x<600;x+=8) { 
            b=fbt[600/8*y+x/8]; i=y*fs+x; 
            fb0[i]=(b&1)*255; b>>=1; fb0[i+1]=(b&1)*255; b>>=1; 
            fb0[i+2]=(b&1)*255; b>>=1; fb0[i+3]=(b&1)*255; b>>=1; 
            fb0[i+4]=(b&1)*255; b>>=1; fb0[i+5]=(b&1)*255; b>>=1; 
            fb0[i+6]=(b&1)*255; b>>=1; fb0[i+7]=(b&1)*255; 
        } fc++; gmlib(GMLIB_UPDATE); 
    } 
}

I will add more details about how it works here, but again, there are already some details in the relevant posts:
https://www.mobileread.com/forums/sho...d.php?t=177455

geekmaster · 01-04-2014, 03:51 PM

(reserved for 256-color to 16-color dither from newtrix demo)

https://www.mobileread.com/forums/sho...d.php?t=176802

This code in particular will be a good starting point for a "generic" dither routine to be used with any (native mode) app. You feed it a 256-color framebuffer, which does not need to be pre-dithered (unlike real framebuffer content on a K5), and this routine converts it to whatever is needed to be in the real framebuffer of any eink kindle model in any boot mode (probably needs updates to work on PW or PW2).

This code uses dither tables, so I will want to convert that to logical formulas as shown in the first post in this thread.

geekmaster · 01-04-2014, 04:00 PM

Comments, corrections, or requests for more info are welcome. Please post your thoughts on this subject. Thanks.

hawhill · 01-05-2014, 08:29 AM

Thank you very much for this! I'll need to take some time (and probably a few steps back) to digest this, I guess, but it will be well worth it.

geekmaster · 01-05-2014, 09:55 AM

Here is the wikipedia page on ordered dither, as used here:
http://en.wikipedia.org/wiki/Ordered_dithering

Here is an 8x8 dither table like we use here, from that wikipedia page:

Note that it contains 64 values, from 1 to 64. In order to simplify the comparison into a simple subtract, giving the threshold result in the sign bit, I subtracted one from these dither table values giving 0 to 63. Technically, an ordered dither really gives an extra value when you take zero into account (hence the 1/65 shown to the left of the image), and I prefer to skip a value in the center of the range so I can still have pure black and pure white area fills. In my case, I handle this with the contrast and brightness values to center and stretch the range for best image quality (which can result in more or less bright and dark areas than in the input image, to be adjusted for best eink compatibility as I did for the default values).

In this code the dither threshold values are used to determine whether an output pixel will be pure black or pure white. In my 256-color to 16-color dither code as used in the newtrix demo, the threshold values are used to determine when to increment a 16-bit output to the next brighter value to give the appearance of 256 different shades of gray and to prevent "Mach banding" in gradients such as are common in a cloudy sky image.

As I mentioned previously, the important things about a dither table are that it contains a single instance of each value in the supported range (0-63 in my case) and that these values are evenly distributed through the table so that an particular grayscale value will consist of evenly distributed black and white pixels in the output image, in a ratio that gives an average value equal to the value of that pixel in the input image.

You can think of a dither table as an 8x8 texture image that can be tiled to fill the framebuffer. In fact, modern video graphics cards have shaders that can do dithering by using just such a dithered texture image, corresponding exactly with our dither table.

As you can see in the code snippets posted earlier in this thread, I have code that contains a real dither table, and code that computes the table values as-needed where needed using a logical formula.

I tested other dither methods, and I decided that a simple ordered dither like this looks the best for video on the kindle eink displays. One of the demos in newtrix also does a spatio-temporal random dither, which has some interesting artifacts causing severe "ringing" that appear as a trail behind moving objects that oscillates between too bright and too dark while it fades, with tail length depending on kindle model and boot mode. Ordered dither is more appealing and more consistent across all the eink kindle models, IMHO.

NiLuJe · 01-05-2014, 11:06 AM

I'm pretty sure this was linked from here or IRC, so it's probably not news to anyone, but that looked neat too: http://pippin.gimp.org/a_dither/

geekmaster · 01-05-2014, 12:19 PM

Quote:

Originally Posted by NiLuJe

I'm pretty sure this was linked from here or IRC, so it's probably not news to anyone, but that looked neat too: http://pippin.gimp.org/a_dither/

Thanks for the link, and it may indeed help people to understand dithering in a more visual manner.

Dither methods other than ordered dither can look better for static images, but my experimentation shows that for eink displays, it is important to not change pixels too often, which causes temporal aliasing with the eink drivers, with artifacts that vary between kindle models. The result as mentioned previously is a fading trail that pulsates and can be very annoying for movies or video games. However it does look interesting when that effect is used for artistic purposes, like in one of the demo effects used in my newtrix demo.

As it turns out, the ordered dither matrix remains anchored to the display, assuring that pixels only change when they really must.

However, for displays that do not have such requirements, such as LCD or OLED, adding a temporal component to the dither (pixels not anchored to screen coordinates) can improve the image. The non-anchored sparkle gets averaged out by the eye, and in fact most modern LCD displays actually do some degree of temporal dither to achieve more effective colors than the hardware could do otherwise.

But we are working with eink here, where an anchored dither has positive perceptual significance in how it interacts with the eink display hardware and device drivers, hence my choice of dither method.

EDIT: Ordered dither is also known as "Bayer dither", as it is called in the linked page above (which also shows that it provides best compression, due to fewer pixels changing between sequential video frames, besides being more "eink compatible" due to less "pulsating" temporal artifacts).

geekmaster · 01-05-2014, 01:15 PM

Regarding a "karma comment" request (twobob) that we need an "ALL DEVICE" dither wrapper, I agree, but it really needs multiple functions. Besides my setpx() function that sets individual pixels in the dithered screen for highest speed updates of small areas of the screen, I think we need to be able to set 256-color pixels in the work buffer and then call a page update function that dithers the full frame just before calling the eink update function (as can be seen in one of the demos in newtrix). And perhaps a 256-color setpx() type of function that sets 256-color pixels in the work buffer as well.

Actually, I had hoped that people could use my dither routines pretty much as-is (black box style) by lifting them out of my demos, but perhaps they do need to be packaged better for more widespread adoption by this community.

I have been doing stuff like this for decades, so it seems obvious to me, but we all come from different backgrounds and cultures, so it is not really fair for me to apply such expectations or desires on this community as a whole. I need to make my stuff more accessible, though I really prefer to spend the time it takes to do all the "finishing touches" working on new stuff and new ideas, and leave the repackaging details to others.

This thread is an attempt to begin to bridge that gap.

islewind · 01-06-2014, 09:23 AM

Quote:

Originally Posted by geekmaster

Thanks for the link, and it may indeed help people to understand dithering in a more visual manner.

But we are working with eink here, where an anchored dither has positive perceptual significance in how it interacts with the eink display hardware and device drivers, hence my choice of dither method.

EDIT: Ordered dither is also known as "Bayer dither", as it is called in the linked page above (which also shows that it provides best compression, due to fewer pixels changing between sequential video frames, besides being more "eink compatible" due to less "pulsating" temporal artifacts).

The linked page is about dither algorithm developed for use for video / gif animation / ui animations on eink devices; GIF animations shares similar constraints with dithering for eink - which is why the page focuses on that for its illustrations. My kobo mini easily runs such dithered full screen updates at ~10fps, using double buffering (with a full deph buffer backing the animation).

All the GIFs on that page; apart from the rightmost which is using error-diffusion are using ordered dithers. The "a dither" methods have a wider range of display'able colors, most of the code on the page is there to let people play around with the formulas; I expect coders to be able to lift the core of it out of the switch statement yielding something close to 6-7 instructions/pixel. (sadly though; on most archs; a LUT will still beat this procedural mask). For higher perceptual quality from an ordred dither; using "void and cluster" to generate the LUT would be among the best options; but that is computationally a _lot_ more complex than the simple pseudo-random patterns used in the methods summed up on my page.

/pippin

geekmaster · 01-06-2014, 02:55 PM

Quote:

Originally Posted by islewind

The linked page is about dither algorithm developed for use for video / gif animation / ui animations on eink devices; GIF animations shares similar constraints with dithering for eink - which is why the page focuses on that for its illustrations. My kobo mini easily runs such dithered full screen updates at ~10fps, using double buffering (with a full deph buffer backing the animation).

All the GIFs on that page; apart from the rightmost which is using error-diffusion are using ordered dithers. The "a dither" methods have a wider range of display'able colors, most of the code on the page is there to let people play around with the formulas; I expect coders to be able to lift the core of it out of the switch statement yielding something close to 6-7 instructions/pixel. (sadly though; on most archs; a LUT will still beat this procedural mask). For higher perceptual quality from an ordred dither; using "void and cluster" to generate the LUT would be among the best options; but that is computationally a _lot_ more complex than the simple pseudo-random patterns used in the methods summed up on my page.

/pippin

I can see that the pixels in the "a dither" are also anchored in place like the bayer dither, and the forumulas for computing them are effectively similar, differing by use "magic number" multiplication instead of bitwise shift and or. The magic number method may be faster on modern processors, and COULD be faster than a table lookup, depending on a number of factors (mostly RAM access penalty, cache layout, and algorithmic access ordering). Also, the linked page uses floating point instead of integer, which may also incur a speed penalty depending on CPU, and on choice of compiler and optimization settings. Of course for modern GPU shaders in a desktop graphics card, float may be better, but in lesser devices like the (ARM based) Raspberry Pi (and perhaps eink kindles), integer can be much faster, even in shaders.

In my case, I am supporting older kindles including the DX and DXG here, so relying on magic number multiplication and especially floating point may make it run slower. Of course, the only way to tell for sure is to code it both ways and MEASURE the speeds, in addition to examining the assembler language output from the compiler.

The Bayer dither has a lot of research behind it. Reordering the dither threshold elements may make some images (or video) look better, but may also have other image examples where it falls short.

A table lookup can be faster if it can all fit in CPU cache and remain resident, but that depends a lot on a number of other factors. A 64-byte 8x8 dither table is certainly small enough if it does not need to continuously keep getting reloaded (such as every time another byte is written to STDOUT with a function call).

But for the update rate that all eink kindle models support (about 7.7 FPS), the dither function optimization is not all that critical. I did it the way I did it mostly because I enjoyed doing it that way, which is all that really matters for recreational programming such as this. For others to choose one dither method over another makes no difference really, as long as they select one that does the job, and they enjoy doing it.

islewind · 01-06-2014, 04:59 PM

The computation does not rely on floating point, the patch on the same page for ffmpeg uses only integer arithmetic; the javascript version is merely an interactive play-thing. Showing how to use these short procedural masks both for high bitdepth -> 1bit, as well as -> 4bit or 8bit. Meaning the same formula is fit for driving both high fps 1bit dithers; as well as high fidelity grayscale stills using slower waveforms. "A dither" started out as recreational programming as well; but turnd out interesting enough that I spent too much time optimizing and tuning it's visual characteristics. Fun to find something computationally cheap; that is equvilanet to the much more expensive computation of "void and dither" which creates masks that yield specific green/blue noise frequency distributions (using a, eew, patented algorithm).

IIRC on my x86 laptop it was less than ~50% slower than using a LUT. This is sufficient for my current prototype hacks replacing it with a 2d LUT/mask is among low hanging fruits I'll care about at a later stage when/if I care about battery life or need to do more computations per frame.

Use it if you want to, (I should probably wipe some dust of the tiny little framebuffer abstraction "library" I am using and put it out there, with a bit of help static binaries I build with the toolchain I use should work on kobo's as well as some kindles.)

twobob · 01-06-2014, 08:35 PM

Many thanks to everyone for their input on this issue.

This work, put in the right place, would open a new world of eink possibilities.

staylo · 01-07-2014, 01:04 PM

Hi geekmaster,

I picked up a DXG recently and I'm still trying to get up to speed on the limitations of the device. In a previous post (https://www.mobileread.com/forums/sho...d.php?t=177455) you mentioned that > 1.5fps was possible on the DX / DXG - any further details on what tricks were used to achieve that, and the update rate obtainable? (apologies if this isn't relevant enough to this thread)

01-04-2014, 02:24 PM	#1
geekmaster Carpe diem, c'est la vie. Posts: 6,433 Karma: 10773670 Join Date: Nov 2011 Location: Multiverse 6627A Device: K1 to PW3	geekmaster dither formula demystified? There have been requests, and I have posted, that when I get some time, I will write a bit about how my dither code works. This thread is the beginning of such an effort. There will be more to come later... As published in the geekmaster kindle video player thread, here is the "raw2gmv" kindle video transcoder: PHP Code: //==================================================== // raw2gmv 1.0a - raw to geekmaster video transcoder // Copyright (C) 2012 by geekmaster, with MIT license: // http://www.opensource.org/licenses/mit-license.php //---------------------------------------------------- #include <stdio.h> // stdin,stdout typedef unsigned char u8; typedef unsigned int u32; int main(void) { u8 o,to,tb,wb0[800600]; u32 x,y,xi,yi,c=250,b=120; // c=contrast, b=brightness while (fread(wb0,800600,1,stdin)) for (y=0;y<800;y++) { xi=y; tb=0; for (x=0;x<600;x++) { yi=599-x; o=x^y; to=(y>>2&1\|o>>1&2\|y<<1&4\|o<<2&8\|y<<4&16\|o<<5&32)-(wb0[yi800+xi]63+b)/c>>8; tb=(tb>>1)\|(to&128); if (7==(x&7)) { fwrite(&tb,1,1,stdout); tb=0; } } } return 0; } It takes raw 800x600 8bpp video piped into STDIN, rotates it to 600x800 for kindle portrait mode, and dithers it while computing the dither table on-the-fly, then outputs it to STDOUT. Such STDIO piping allows chaining multiple programs together on the command line. The output can be redirected to a file to be played later, or it can be piped directly into the video player program, or it can be piped into netcat and sent over a network. It uses an in-line formula with no subroutines, to be branch-free (no if-else statements) which makes it cache-friendly. The while statement reads a series of single 800x600 video frames (one byte per pixel). The outer for loop processes entire 800-pixel rows from the current video frame, and the inner for loop processes individual pixels (8-bit grayscale) from the current video row. Inside this outer loop we set xi=y, to convert input rows to output columns, as needed for portrait mode output. The body of code inside the for loops swaps x and y coordinates to rotate the video into portrait mode, and computes a dither table threshold for the current framebuffer pixel position, then computes a single output bit (black or white pixel). Finally, it packs the pixel bit into a byte, which when full (eight sequential pixel bits), it writes that full byte to STDOUT. Here is the loop body (with added white space): PHP Code: `yi=599-x; o=x^y; to=(y>>2&1\|o>>1&2\|y<<1&4\|o<<2&8\|y<<4&16\|o<<5&32)-(wb0[yi800+xi]63+b)/c>>8; tb=(tb>>1)\|(to&128); if (7==(x&7)) { fwrite(&tb,1,1,stdout); tb=0; }` Notice that yi traverses from 599 to 0, depending on x, to prevent a mirror image when rotating our input video from 800x600 landscape mode to kindle 600x800 portrait mode. We are building the OUTPUT video in portait mode, so we need to raster-scan columns from input to output in each frame buffer to build rows of output video. The next statement sets variable o (order) in correspondence with a checkerboard pattern in the framebuffer, representing the output pixel bits needed for a 50-percent grayscale at each resolution step. Then we get a bit complicated, while computing the to variable. The key idea is that some variables are used in the logical expression in a way that enables or disables portions of the overall formula, much like how if-else statements could also be used to do the same thing. The first part of the expression, in parentheses, computes the dither threshold for the current pixel position. Notice how it contains a series of values in a binary progression (&1, &2, &4, &8, &16, &32). These bits are conditionally combined into a 6-bit dither value (0-63). The condition that determines whether these bits remain a shifted one-bit or become a zero-bit is taken from either a shifted y-position or a shifted o value (x xor y), which modified the 50-percent ordered dither mask (in o) to increase or decrease the value depending on its position in the framebuffer. This effectively reproduces the common ordered dither table that is found in most dither routines, without needing a dither table to be precomputed, and without memory lookups (when memory access is very slow compared to real-time computation on modern cached processors). The best way to see how this dither formula works is to fill a table with the values computed by using it, and compare that to a traditional dither table such as in my formula-42 dither logic. The whole point of a dither table is to evenly distribute all values 0 to 63, across a square texture tile, and use that to tile the entire framebuffer. Then each input value is compared against these values, used as a threshold, to determine whether the output pixel is black or white. wb0 is our work-buffer (in-memory version of a framebuffer such as /dev/fb0). wb0[yi800+xi] is the current input pixel, an 8-bit value. Using scaled-integer arithmetic, we do a brightness and contrast adjustment using predefined values for c=250 and b=120 (as determined empirically for good video image quality). Essentially, we multiple the 8-bit pixel by 63, add 120 (default brightness), divide by 250 (default contrast). Instead of comparing each input value against a dither threshold, we subtract the scaled input value from the dither threshold value so that the result is positive or negative depending on threshold result, then we shift the sign bit with >>8 to extend it down into bottom byte of the computation, making the sign-bit either 0 or 255 (pure black or pure white). And last, we mask the sign bit with &128, and shift it into the output byte. And yes, I did cheat a little in that an if statement determines when to write out that output byte. However, there are no branches while computing each pixel bit. The reason for using a logic expression is that traditional if-else statements will run much slower if it confuses CPU cache branch prediction (which is very simplistic in RISC embedded processors). Some compilers may automatically convert if-else statements into more efficient code, however, that cannot be relied on (especially when using tcc while compiling in the kindle itself). I hope that helps understand this code a little bit better. Notice that this only shows the DITHER step, and does not include extra logic in the gmplay program, needed to deal with variations between different kindle models, and between main and diags modes. You may wish to compare and contrast the gmplay code against raw2gmv, or you can wait until I take time to document that a bit better too. However, until then, read the "theory of operation" details by clicking the appropriate button in that the geekmaster kindle video player first post. Last edited by geekmaster; 01-05-2014 at 09:45 PM.*

01-04-2014, 02:43 PM	#2
geekmaster Carpe diem, c'est la vie. Posts: 6,433 Karma: 10773670 Join Date: Nov 2011 Location: Multiverse 6627A Device: K1 to PW3	(reserved for more content) For now, here is my older dither logic that uses a dither table, but does take kindle variations into account: PHP Code: // global vars u8 fb0=NULL; // framebuffer pointer int fdFB=0; // fb0 file descriptor u32 fs=0; // fb0 stride u32 MX=0; // xres (visible) u32 MY=0; // yres (visible) u8 blk=0; // black u8 wht=0; // white u8 pb=0; // pixel bits //======================================== // setpx - draw pixel using ordered dither // x,y: screen coordinates, c: color(0-64). // (This works on all eink kindle models.) //---------------------------------------- void inline setpx(int x,int y,int c) { static int dt[64] = { 1,33,9,41,3,35,11,43,49,17,57,25,51,19,59,27,13,45,5, 37,15,47,7,39,61,29,53,21,63,31,55,23,4,36,12,44,2,34,10,42,52,20,60,28,50, 18,58,26,16,48,8,40,14,46,6,38,64,32,56,24,62,30,54,22 }; // dither table fb0[pbx/8+fsy]=((128&(c-dt[(7&x)+8(7&y)]))/128(blk&(240(1&~x)\| 15(1&x)\|fb0[pbx/8+fsy])))\|((128&(dt[(7&x)+8(7&y)]-c))/128wht\| (blk&((240(1&x)\|15(1&~x))&fb0[pbx/8+fsy]))); // geekmaster formula 42 } I will let you meditate on that until I add more here later... Last edited by geekmaster; 01-04-2014 at 02:56 PM.*

01-04-2014, 02:44 PM	#3
geekmaster Carpe diem, c'est la vie. Posts: 6,433 Karma: 10773670 Join Date: Nov 2011 Location: Multiverse 6627A Device: K1 to PW3	And here is the relevant code that takes the pre-dithered content piped in from raw2gmv or from a pre-dithered video (gmv) file, and converts it to whatever the framebuffer requirements dictate for the kindle models and boot modes supported (all of them, as of the PW). PHP Code: //----- gmlib global vars ----- enum GMLIB_op {GMLIB_INIT,GMLIB_CLOSE,GMLIB_UPDATE,GMLIB_VSYNC}; u8 fb0=NULL; // framebuffer pointer int fdFB=0; // fb0 file descriptor int teu=0; // eink update time u32 fs=0; // fb0 stride u32 MX=0; // xres (visible) u32 MY=0; // yres (visible) u32 VY=0; // (VY>MY): mxcfb driver u8 ppb=0; // pixels per byte u32 fc=0; // frame counter #define FBSIZE (600/8800) //================================== // gmplay4 - play video on 4-bit fb0 //---------------------------------- void gmplay4(void) { u32 i,x,y,b,p,off=(MY/2-400)fs+MX/4-150,fbsize=FBSIZE; u8 fbt[FBSIZE]; while (fread(fbt,fbsize,1,stdin)) { teu+=130; // teu: next update time if (getmsec()>teu+1000) continue; // drop frame if > 1 sec behind gmlib(GMLIB_VSYNC); // wait for fb0 ready for (y=0;y<800;y++) for (x=0;x<600;x+=8) { b=fbt[600/8y+x/8]; i=yfs+x/2+off; p=(b&1)240; b>>=1; fb0[i]=p\|(b&1)15; b>>=1; p=(b&1)240; b>>=1; fb0[i+1]=p\|(b&1)15; b>>=1; p=(b&1)240; b>>=1; fb0[i+2]=p\|(b&1)15; b>>=1; p=(b&1)240; b>>=1; fb0[i+3]=p\|(b&1)15; } fc++; gmlib(GMLIB_UPDATE); } } //================================== // gmplay8 - play video on 8-bit fb0 //---------------------------------- void gmplay8(void) { u32 i,x,y,b,fbsize=FBSIZE; u8 fbt[FBSIZE]; while (fread(fbt,fbsize,1,stdin)) { teu+=130; // teu: next update time if (getmsec()>teu+1000) continue; // drop frame if > 1 sec behind gmlib(GMLIB_VSYNC); // wait for fb0 ready for (y=0;y<800;y++) for (x=0;x<600;x+=8) { b=fbt[600/8y+x/8]; i=yfs+x; fb0[i]=(b&1)255; b>>=1; fb0[i+1]=(b&1)255; b>>=1; fb0[i+2]=(b&1)255; b>>=1; fb0[i+3]=(b&1)255; b>>=1; fb0[i+4]=(b&1)255; b>>=1; fb0[i+5]=(b&1)255; b>>=1; fb0[i+6]=(b&1)255; b>>=1; fb0[i+7]=(b&1)255; } fc++; gmlib(GMLIB_UPDATE); } } I will add more details about how it works here, but again, there are already some details in the relevant posts: https://www.mobileread.com/forums/sho...d.php?t=177455 Last edited by geekmaster; 01-04-2014 at 03:07 PM.*

01-04-2014, 03:51 PM	#4
geekmaster Carpe diem, c'est la vie. Posts: 6,433 Karma: 10773670 Join Date: Nov 2011 Location: Multiverse 6627A Device: K1 to PW3	(reserved for 256-color to 16-color dither from newtrix demo) https://www.mobileread.com/forums/sho...d.php?t=176802 This code in particular will be a good starting point for a "generic" dither routine to be used with any (native mode) app. You feed it a 256-color framebuffer, which does not need to be pre-dithered (unlike real framebuffer content on a K5), and this routine converts it to whatever is needed to be in the real framebuffer of any eink kindle model in any boot mode (probably needs updates to work on PW or PW2). This code uses dither tables, so I will want to convert that to logical formulas as shown in the first post in this thread. Last edited by geekmaster; 01-04-2014 at 03:59 PM.

01-05-2014, 09:55 AM	#7
geekmaster Carpe diem, c'est la vie. Posts: 6,433 Karma: 10773670 Join Date: Nov 2011 Location: Multiverse 6627A Device: K1 to PW3	Here is the wikipedia page on ordered dither, as used here: http://en.wikipedia.org/wiki/Ordered_dithering Here is an 8x8 dither table like we use here, from that wikipedia page: Note that it contains 64 values, from 1 to 64. In order to simplify the comparison into a simple subtract, giving the threshold result in the sign bit, I subtracted one from these dither table values giving 0 to 63. Technically, an ordered dither really gives an extra value when you take zero into account (hence the 1/65 shown to the left of the image), and I prefer to skip a value in the center of the range so I can still have pure black and pure white area fills. In my case, I handle this with the contrast and brightness values to center and stretch the range for best image quality (which can result in more or less bright and dark areas than in the input image, to be adjusted for best eink compatibility as I did for the default values). In this code the dither threshold values are used to determine whether an output pixel will be pure black or pure white. In my 256-color to 16-color dither code as used in the newtrix demo, the threshold values are used to determine when to increment a 16-bit output to the next brighter value to give the appearance of 256 different shades of gray and to prevent "Mach banding" in gradients such as are common in a cloudy sky image. As I mentioned previously, the important things about a dither table are that it contains a single instance of each value in the supported range (0-63 in my case) and that these values are evenly distributed through the table so that an particular grayscale value will consist of evenly distributed black and white pixels in the output image, in a ratio that gives an average value equal to the value of that pixel in the input image. You can think of a dither table as an 8x8 texture image that can be tiled to fill the framebuffer. In fact, modern video graphics cards have shaders that can do dithering by using just such a dithered texture image, corresponding exactly with our dither table. As you can see in the code snippets posted earlier in this thread, I have code that contains a real dither table, and code that computes the table values as-needed where needed using a logical formula. I tested other dither methods, and I decided that a simple ordered dither like this looks the best for video on the kindle eink displays. One of the demos in newtrix also does a spatio-temporal random dither, which has some interesting artifacts causing severe "ringing" that appear as a trail behind moving objects that oscillates between too bright and too dark while it fades, with tail length depending on kindle model and boot mode. Ordered dither is more appealing and more consistent across all the eink kindle models, IMHO. Last edited by geekmaster; 01-05-2014 at 10:16 AM.

01-04-2014, 04:00 PM	#5
geekmaster Carpe diem, c'est la vie. Posts: 6,433 Karma: 10773670 Join Date: Nov 2011 Location: Multiverse 6627A Device: K1 to PW3	Comments, corrections, or requests for more info are welcome. Please post your thoughts on this subject. Thanks.

01-05-2014, 08:29 AM	#6
hawhill Wizard Posts: 1,379 Karma: 2155307 Join Date: Nov 2010 Location: Goettingen, Germany Device: Kindle Paperwhite, Kobo Mini	Thank you very much for this! I'll need to take some time (and probably a few steps back) to digest this, I guess, but it will be well worth it.

01-05-2014, 11:06 AM	#8
NiLuJe BLAM! Posts: 13,506 Karma: 26047202 Join Date: Jun 2010 Location: Paris, France Device: Kindle 2i, 3g, 4, 5w, PW, PW2, PW5; Kobo H2O, Forma, Elipsa, Sage, C2E	I'm pretty sure this was linked from here or IRC, so it's probably not news to anyone, but that looked neat too: http://pippin.gimp.org/a_dither/

01-05-2014, 01:15 PM	#10
geekmaster Carpe diem, c'est la vie. Posts: 6,433 Karma: 10773670 Join Date: Nov 2011 Location: Multiverse 6627A Device: K1 to PW3	Regarding a "karma comment" request (twobob) that we need an "ALL DEVICE" dither wrapper, I agree, but it really needs multiple functions. Besides my setpx() function that sets individual pixels in the dithered screen for highest speed updates of small areas of the screen, I think we need to be able to set 256-color pixels in the work buffer and then call a page update function that dithers the full frame just before calling the eink update function (as can be seen in one of the demos in newtrix). And perhaps a 256-color setpx() type of function that sets 256-color pixels in the work buffer as well. Actually, I had hoped that people could use my dither routines pretty much as-is (black box style) by lifting them out of my demos, but perhaps they do need to be packaged better for more widespread adoption by this community. I have been doing stuff like this for decades, so it seems obvious to me, but we all come from different backgrounds and cultures, so it is not really fair for me to apply such expectations or desires on this community as a whole. I need to make my stuff more accessible, though I really prefer to spend the time it takes to do all the "finishing touches" working on new stuff and new ideas, and leave the repackaging details to others. This thread is an attempt to begin to bridge that gap. Last edited by geekmaster; 01-05-2014 at 01:39 PM.

01-06-2014, 04:59 PM	#13
islewind Junior Member Posts: 2 Karma: 34484 Join Date: Jul 2013 Device: kobo	The computation does not rely on floating point, the patch on the same page for ffmpeg uses only integer arithmetic; the javascript version is merely an interactive play-thing. Showing how to use these short procedural masks both for high bitdepth -> 1bit, as well as -> 4bit or 8bit. Meaning the same formula is fit for driving both high fps 1bit dithers; as well as high fidelity grayscale stills using slower waveforms. "A dither" started out as recreational programming as well; but turnd out interesting enough that I spent too much time optimizing and tuning it's visual characteristics. Fun to find something computationally cheap; that is equvilanet to the much more expensive computation of "void and dither" which creates masks that yield specific green/blue noise frequency distributions (using a, eew, patented algorithm). IIRC on my x86 laptop it was less than ~50% slower than using a LUT. This is sufficient for my current prototype hacks replacing it with a 2d LUT/mask is among low hanging fruits I'll care about at a later stage when/if I care about battery life or need to do more computations per frame. Use it if you want to, (I should probably wipe some dust of the tiny little framebuffer abstraction "library" I am using and put it out there, with a bit of help static binaries I build with the toolchain I use should work on kobo's as well as some kindles.)

01-06-2014, 08:35 PM	#14
twobob ( ͡° ͜ʖ ͡°){ʇlnɐɟ ƃǝs}Týr Posts: 6,586 Karma: 6299993 Join Date: Jun 2012 Location: uti gratia usura (Yao ying da ying; Mo ying da yieng) Device: PW-WIFI\|K5-3G+WIFI\| K4\|K3-3G\|DXG\|K2\| Rooted Nook Touch	Many thanks to everyone for their input on this issue. This work, put in the right place, would open a new world of eink possibilities.

01-07-2014, 01:04 PM	#15
staylo Member Posts: 11 Karma: 34114 Join Date: Mar 2011 Device: PRS-350	Hi geekmaster, I picked up a DXG recently and I'm still trying to get up to speed on the limitations of the device. In a previous post (https://www.mobileread.com/forums/sho...d.php?t=177455) you mentioned that > 1.5fps was possible on the DX / DXG - any further details on what tricks were used to achieve that, and the update rate obtainable? (apologies if this isn't relevant enough to this thread)

Similar Threads
Thread	Thread Starter	Forum	Replies	Last Post
geekmaster formula 42 (the dithermatron demo)	geekmaster	Kindle Developer's Corner	65	03-17-2017 08:51 AM
Formula Plots	PuxyYunm	General Discussions	9	05-15-2011 04:19 AM
.CSS [CoolReader] Programming Demystified	Dr. Drib	Astak EZReader	6	03-18-2010 01:27 AM
Python Unicode Demystified	ahi	Workshop	2	09-18-2009 12:45 PM

Advert

Advert