Thread: Bug report
Old 01-17-2011, 05:30 PM   #136
review
Addict
Posts: 315
Karma: 6448
Join Date: Nov 2010
Device: 903
Quote:
Originally Posted by teofrast
Hello GeoffC,

review,
thank you for your kind and detailed answer. I discovered djvu just one week ago when, after years of intensive reading on LCD screens, I decided I had to find something easier on my eyes and started considering ereaders. It seems a very clever format and, where possible, I'll definitely move to it from now on. The problem is my existing library, as I wouldn't like to waste reading time converting gigabytes of data. What's more, my area of interest is very far from mainstream literature, so it's mostly simple scans with no editing that are quite hard to run through OCR. Is there a way to edit the problematic colour pdf scans and make them b/w?
As a new PocketBook customer, I wholeheartedly support your request to PocketBook to fix the FW and make it fully compliant with the formats declared in the advertising. I bought the pb902 solely to read pdf and djvu files from Gallica, Google Books and archive.org, assuming from their statement that those formats were supported.
OK, I'm not entirely sure I understood you correctly, or that my own points were clear, so let me rephrase a few of them:

It is very difficult to make general statements, so I wouldn't go so far as to claim that a book stored in djvu is always a better choice than the same book in pdf. It depends strongly on how the book was scanned. I have many pdf files from archive.org which are excellent to read. You can often tell this from the preview on the right: clean white frames usually indicate that the background has already been removed, and for those books I usually download only the pdf. If, however, the preview shows that brownish colour, I usually download the pdf, the b/w pdf and the djvu (if available), upload all of them to the reader and decide there which one is best.

You ask if there is a way to make the djvu files b/w. Well, this is exactly what I meant with the script I wrote. You simply point my program at the djvu files it should enhance. Then you can go and get a coffee, and when you come back it will have transformed each djvu file into one that is much easier to read on the device. So yes, it is possible, and for most files it works very well.
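For illustration only (this is not the actual script): a batch job of this kind could be sketched as below, assuming it drives the DjVuLibre command-line tools. The sketch only builds the command lines; it pulls the bitonal mask layer of one page out with ddjvu, re-encodes it with cjb2, and bundles the pages with djvm. The function names, the file naming scheme and the choice of the mask layer are assumptions made for the example.

```python
def page_commands(src, page, workdir):
    """Build the commands that turn one page of `src` into a b/w DjVu page.

    Assumption: the DjVuLibre tools ddjvu and cjb2 are installed.
    The page file names used here are hypothetical.
    """
    pbm = f"{workdir}/page{page:04d}.pbm"
    out = f"{workdir}/page{page:04d}.djvu"
    # Extract only the bitonal mask layer (the text) of the requested page.
    extract = ["ddjvu", "-format=pbm", "-mode=mask", f"-page={page}", src, pbm]
    # Re-encode the bitonal image as a single-page DjVu file.
    encode = ["cjb2", pbm, out]
    return extract, encode, out


def bundle_command(page_files, dest):
    """Build the command that bundles single-page files into one document."""
    return ["djvm", "-c", dest] + list(page_files)


extract, encode, out = page_commands("book.djvu", 3, "tmp")
bundle = bundle_command([out], "book_bw.djvu")
```

Each list could then be handed to subprocess.run(cmd, check=True), looping over all pages of the book before running the final bundle command.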

I'm sure one can write a similar program to transform coloured pdfs. One would probably convert all files to PostScript and then use Ghostscript to enhance each page, maybe by setting a threshold for what counts as background and what counts as foreground. But all this is unnecessary if you use the djvu format, as that separation has already been done for you. My script doesn't really process the file contents; it just separates the layers and bundles them back together. This means it is also much faster than running extensive format conversions on each file. My script takes about one second to process a single page (and if you keep in mind that this is on my old laptop with 600 MHz and little RAM, you can imagine how quickly a more decent computer would get through many gigabytes of files).
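The threshold idea for a coloured scan can be sketched as follows. This is only a minimal illustration on a tiny hand-made grayscale "page" (0 = black, 255 = white); rendering the pdf pages to pixels, for example with Ghostscript, is not shown, and the cutoff of 128 is an arbitrary assumption that would need tuning per book.

```python
def threshold_page(gray_rows, cutoff=128):
    """Map every grayscale pixel to pure black (0) or pure white (255).

    Pixels darker than `cutoff` are treated as foreground (ink),
    everything else as background (paper). The default cutoff is
    an assumption, not a recommended value.
    """
    return [[0 if px < cutoff else 255 for px in row] for row in gray_rows]


# A brownish/greyish page: background around 220-230, ink around 30-40.
page = [
    [230, 225, 40, 228],
    [222, 35, 30, 231],
]
bw = threshold_page(page, cutoff=128)
```

On a real scan the background is rarely uniform, so a single global cutoff is the simplest possible choice rather than the best one.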

As for OCR: almost all files on archive.org (pdf or djvu) are already OCR'd, so you don't need to worry about OCR at all.