Old 10-07-2010, 05:05 AM   #3
luite
Connoisseur
Posts: 82
Karma: 256
Join Date: Feb 2010
Location: Netherlands
Device: dr1000
Hi fekhner, good to see that you like my plugin. Unfortunately I can't use it anymore, since my own DR1000S is broken. I will however update it if the plugin interfaces change.

I bought an iPad to replace the broken DR1000, and I have to agree with your observations. The screen is less readable in many situations, and most applications are closed source and not free (as in beer). While I don't mind paying a few euros for a good application (I bought some), I hate not being able to improve them. Most PDF viewers use Apple's PDF renderer, which has some severe limitations (see below), and it's not possible to change this without the source code.

Anyway, I've been working on a DJVU to PDF conversion program that tries to retain the good compression of the DJVU file, without much quality loss. It also converts the text and table of contents (if present).

Making a small PDF file from a DJVU file is a challenge, especially for Apple's PDF renderer, and I haven't found any program (free or commercial) that does a good job at this.

First some basic information about DJVU: Every DJVU page can contain a few layers:
- background, typically a medium-resolution color image, contains the paper color and color images.
- mask, marks where the foreground layer is visible over the background, typically a high-resolution bitonal (1 bit, black/white) JB2 image containing the shapes of the letters and line drawings
- foreground layer, contains the color of the letters, typically low resolution

See the attached image for an example of how djvu segmentation works.
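
To make the layer model concrete, here is a minimal Python sketch of how the three layers combine into one page image. The function name, the nested-list image representation and the integer scale factors are assumptions made for illustration; a real decoder works on decoded IW44 and JB2 data and scales more carefully.

```python
def compose_page(background, mask, foreground, bg_scale, fg_scale):
    """Combine DjVu-style layers into one full-resolution image.

    mask: rows of 0/1 at full resolution (1 = foreground visible).
    background, foreground: rows of (r, g, b) tuples at lower resolution.
    One background pixel covers bg_scale x bg_scale mask pixels, and one
    foreground pixel covers fg_scale x fg_scale (nearest-neighbour scaling).
    """
    page = []
    for y, mask_row in enumerate(mask):
        row = []
        for x, bit in enumerate(mask_row):
            if bit:
                # Mask set: take the colour from the foreground layer.
                row.append(foreground[y // fg_scale][x // fg_scale])
            else:
                # Mask clear: the background shows through.
                row.append(background[y // bg_scale][x // bg_scale])
        page.append(row)
    return page


WHITE, BLACK = (255, 255, 255), (0, 0, 0)
mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
    [0, 0, 0, 0],
]
background = [[WHITE, WHITE], [WHITE, WHITE]]  # 2x2 image, scale factor 2
foreground = [[BLACK]]                          # 1x1 image, scale factor 4
page = compose_page(background, mask, foreground, bg_scale=2, fg_scale=4)
```

The shapes come from the full-resolution mask, while the colours come from the much smaller foreground and background images.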

The key to retaining good compression in the conversion is converting all these layers separately. PDF supports image masks and multiple images per page, so we should be good, right? Unfortunately things are not so good for Apple users, since their PDF renderer has two major limitations:
- It does not support JPEG2000 images (so we have to fall back to the inferior JPEG format for the foreground and background layers)
- Applying a high resolution mask (the mask layer) to a low resolution image (the foreground layer) results in a low resolution image, often with unreadable text.

Fortunately, I found a way around the second issue: if you apply the image mask to a soft mask loaded in the graphics state, instead of applying it directly to the image, you can get a high resolution result. There are still some issues with this: the colors in Adobe Reader are slightly different from the original colors. I don't know what causes this, but I want to resolve it before I release the tool and the source code. Perhaps it's some color space or color management issue.
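
As a rough illustration of the soft mask idea in PDF terms, the Python sketch below only builds the text of the relevant PDF objects. It is one possible arrangement and not the tool's actual output: the resource names /GS0, /Mask and /Fg, the object references, and the choice of an alpha soft mask are all assumptions.

```python
def softmask_sketch(width, height, mask_form_ref="12 0 R", mask_image_ref="11 0 R"):
    """Return PDF fragments (as text) for drawing an image through a soft mask.

    All resource names and object references here are hypothetical.
    """
    # Form XObject that paints the 1-bit stencil mask (an image XObject with
    # /ImageMask true). The /Group entry makes the form a transparency group,
    # which a soft-mask dictionary needs for its /G entry.
    mask_form_dict = (
        "<< /Type /XObject /Subtype /Form\n"
        f"   /BBox [0 0 {width} {height}]\n"
        "   /Group << /S /Transparency >>\n"
        f"   /Resources << /XObject << /Mask {mask_image_ref} >> >> >>"
    )
    # Content of that form: scale the unit square to the page, paint the mask.
    mask_form_stream = f"q\n{width} 0 0 {height} 0 0 cm\n/Mask Do\nQ\n"
    # Graphics state whose soft mask is taken from the alpha of the form above.
    extgstate = (
        "<< /Type /ExtGState\n"
        f"   /SMask << /Type /Mask /S /Alpha /G {mask_form_ref} >> >>"
    )
    # Page content: load the soft mask first, then draw the low resolution
    # foreground image scaled to the page.
    page_content = f"q\n/GS0 gs\n{width} 0 0 {height} 0 0 cm\n/Fg Do\nQ\n"
    return {
        "mask_form_dict": mask_form_dict,
        "mask_form_stream": mask_form_stream,
        "extgstate": extgstate,
        "page_content": page_content,
    }


parts = softmask_sketch(612, 792)
print(parts["page_content"])
```

The point of the arrangement is that the mask is no longer attached to the foreground image itself; it sits in the graphics state, and the image is drawn through it.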

So what my conversion program does for each page is the following:
- convert background layer, if present, from IW44 to JPEG (remove background layer if it's nearly white)
- convert foreground layer from IW44 to JPEG (remove it if it's nearly all black)
- convert mask layer to a high compression JBIG2 image (with the JBIG2 symbol table shared between multiple pages for better compression)
- render the foreground layer over the background layer with the mask applied
- render the hidden text from the OCR layer
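
The per-page decisions above can be sketched in Python as follows. This is only an outline under assumptions: the function name, the thresholds for "nearly white" and "nearly black", and the representation of a layer as a flat list of (r, g, b) tuples are made up for illustration, and the actual conversions are just named, not performed.

```python
def plan_page(background, foreground, white_level=240, black_level=30):
    """Return the list of steps to run for one page.

    background / foreground: list of (r, g, b) tuples, or None if absent.
    white_level / black_level: hypothetical thresholds on a 0-255 scale.
    """
    steps = []
    # Keep the background only if some pixel is darker than "nearly white".
    if background is not None and not all(min(p) >= white_level for p in background):
        steps.append("background: IW44 -> JPEG")
    # Keep the foreground only if some pixel is lighter than "nearly black".
    if foreground is not None and not all(max(p) <= black_level for p in foreground):
        steps.append("foreground: IW44 -> JPEG")
    steps.append("mask: JB2 -> JBIG2 (shared symbol table)")
    steps.append("render layers with mask applied")
    steps.append("render hidden OCR text")
    return steps


plain_page = plan_page([(255, 255, 255)] * 4, [(0, 0, 0)] * 4)
color_page = plan_page([(200, 180, 150)] * 4, [(120, 0, 0)] * 4)
```

For a plain black-on-white page both image layers drop out and only the JBIG2 mask and the text remain, which is where most of the size saving would come from.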

Anyway, here are some results. Take this book, for example:
http://www.archive.org/details/happy...ndot00wilduoft
- DJVU file: 6.3MB
- PDF file: 42MB
(I count a MB as 1 million bytes)

The PDF file contains many JPEG2000 images, so programs that use Apple's PDF renderer on the iPad (this includes GoodReader and iAnnotate PDF) cannot render them, resulting in many blank pages.

Output from my program, converted from the DJVU file:
http://94.124.88.78/~luite/happyprin...00wilduoft.pdf
- PDF 10MB

It's not quite as small as the original DJVU file, mainly due to the worse JPEG compression (and I could optimize the rendering of the hidden text layer a bit more), but it is fully iPad compatible. (Note: it doesn't set the DPI of the page properly yet, so viewing it at 100% will probably result in huge pages.)
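
For reference, the page size calculation behind the DPI note can be sketched like this. PDF user space defaults to 72 units per inch; the 300 dpi US Letter scan is a made-up example, and the guess that the huge pages come from treating every pixel as one PDF unit is an assumption.

```python
def page_size_points(width_px, height_px, dpi):
    """Convert a scan's pixel dimensions to PDF points (1/72 inch)."""
    return (width_px * 72.0 / dpi, height_px * 72.0 / dpi)


# Hypothetical 300 dpi scan of a US Letter page (8.5 x 11 inches).
width_pt, height_pt = page_size_points(2550, 3300, 300)

# If every pixel were treated as one point instead, the same page would be
# 2550 / 72 inches wide at 100% zoom.
naive_width_in = 2550 / 72
```

With the DPI applied the page comes out at 612 x 792 points, the usual Letter size; without it the page would be more than 35 inches wide.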

Once the color conversion issue is resolved, I'll post the source code here (or maybe in the Apple Devices or PDF forum). I'm also trying to improve the compression and PDF rendering speed by identifying more cases where the foreground layer can be removed (by using multiple JBIG2 images, one for each color).

I hope other DJVU users will find this useful.
Attached image: djvu-segmentation.png (38.6 KB)

Last edited by luite; 10-07-2010 at 05:58 AM.