03-04-2012, 08:02 PM | #1 |
Junior Member
Posts: 5
Karma: 10
Join Date: Mar 2012
Device: onyx boox m92
|
PDF compatibility
Hi,
As I am new to the forum: thanks for all the useful information here! I am using my new Onyx mostly for reading PDFs, so I was wondering if someone has an idea why some PDFs don't display on the Onyx, for instance this one: http://www.archive.org/details/suicidestudyinso00durk Of course, I could send it through some virtual PDF printer like my dear Quartz (Mac user), but if I do, the document gains some 600GB against its original 10MB ... And no, I'd rather have the page layout than use an ebook format; that was the reason I got an M92 ... Thanks for any help! Best, Johan |
03-04-2012, 10:43 PM | #2 |
Wizard
Posts: 3,144
Karma: 8426142
Join Date: Jun 2008
Location: Chicago, IL
Device: Kindle PW2, Kindle Voyage, Kindle DXG, Boox M90, Kobo Aura HD
|
Try opening the PDF in Preview, and then saving it as a PDF. It shouldn't add any file size, and when you add the new file to the Boox, it should open correctly.
|
03-05-2012, 03:32 AM | #3 |
Junior Member
Posts: 5
Karma: 10
Join Date: Mar 2012
Device: onyx boox m92
|
Thanks, but unfortunately, even if it shouldn't change the file size, it does: instead of 14 MB, I get 899 MB ...
|
03-05-2012, 08:24 AM | #4 |
Guru
Posts: 629
Karma: 3526
Join Date: Jun 2011
Device: Kobo Touch, Nook Touch, EEE 800 Note, Entourage PE, finally M92
|
Wow, that is a really bad PDF. I tried a couple of tricks on it and nothing worked. I had to give up (no time for now) ... will try again later. |
03-05-2012, 09:08 AM | #5 |
Addict
Posts: 320
Karma: 99999
Join Date: Oct 2011
Location: Germany
Device: Onyx Boox M92, Icarus Illumina E653
|
This pdf consists of pictures with multiple layers. One layer for the text (~800KB per page if saved directly from Evince as png or jpg in decent quality), one for the background (and two additional layers I don't understand) - really good work has been done here in extracting the text layer after scanning - I wonder whether you could even remove the yellowish background layer without losing any quality of reading/information? Then of course you have the plain text information from OCR which doesn't amount to a significant part of the file size.
By the way, showing this pdf in Evince is really slow on my notebook with 2.4 GHz Core2Duo with 4GB RAM - so I'm not surprised it's kind of a challenge for the M92. Printing it with cups-pdf is slow and returns a file of appx 1GB (only printed the first 20 pages for testing) that doesn't really contain what you'd expect. Extracting all images with command "pdfimages" yields 3 ppm files and 1 pbm file (image format with only two different colors) per page. All images together amount to more than 8GB (I estimate). If you only keep the pbm files, which contain the text information in appx 2000x3000 pixels, it's about 320 MB. Convert those pbm files to png and you have appx 30 KB per page, so all in all 30*400=12000KB=12MB for the text layer in the whole PDF extracted as PNG. Last edited by tuxor; 03-05-2012 at 09:10 AM. |
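The size estimates in the post above can be rechecked with a few lines of shell arithmetic. This is only a sketch: the helper names `pbm_bytes` and `estimate_total_kb` are made up for illustration, the page count of 400 and the 2000x3000 and 30 KB figures are the approximate values quoted in the post, and the PBM calculation assumes the raw one-bit-per-pixel variant with rows padded to whole bytes (header ignored).

```shell
#!/bin/bash
# Hypothetical helpers for a back-of-the-envelope size check.

# Raw PBM payload: one bit per pixel, each row padded to a whole byte.
pbm_bytes() {
    local width=$1 height=$2
    echo $(( (width + 7) / 8 * height ))
}

# Total size in KB for a given average page size (KB) and page count.
estimate_total_kb() {
    local per_page_kb=$1 pages=$2
    echo $(( per_page_kb * pages ))
}

pages=400                                   # approximate count used in the post
per_page=$(pbm_bytes 2000 3000)             # 750000 bytes per page
echo "pbm text layer: $(( per_page * pages / 1000000 )) MB"

png_kb=$(estimate_total_kb 30 "$pages")     # 12000 KB
echo "png text layer: $(( png_kb / 1000 )) MB"
```

With these inputs the PBM figure comes out at about 300 MB, in the same range as the roughly 320 MB reported, and the PNG figure at 12 MB.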
03-05-2012, 09:43 AM | #6 |
Guru
Posts: 629
Karma: 3526
Join Date: Jun 2011
Device: Kobo Touch, Nook Touch, EEE 800 Note, Entourage PE, finally M92
|
Adobe Acrobat doesn't see any layers there... are you sure those are layers?
I do see objects overlaid in the Content pane.
Edit: I tried to delete the image object underlying the text and the text disappeared. The text object was there, but there is something wrong with the font (not embedded?) or with the text cassette visibility ... it beats me what it is. If you do not plan to copy text from this document, just find the version without OCR. Actually, the M92 seems to have a problem with the image layers, since all the pages seemed to be blank. Now I realize that the text must have been there but I could not see it. The other way to solve the problem (if you insist on reading the file as PDF) is to get the epub file from the same page and convert it into a PDF with calibre or something else. Last edited by PF4Mobile; 03-05-2012 at 09:51 AM. |
03-05-2012, 09:59 AM | #7 |
Addict
Posts: 320
Karma: 99999
Join Date: Oct 2011
Location: Germany
Device: Onyx Boox M92, Icarus Illumina E653
|
Well, I don't even have Adobe Acrobat - I don't need it, it's too expensive and it doesn't run on Linux... ;-) I was just looking at what the command "pdfimages" returned and what I got when exporting images from inside the document with evince.
Unfortunately there are many pages with annotations in that document. They amount to more than 150KB each when exported as png, so that's more than 60MB in the end :-/ |
03-05-2012, 10:08 AM | #8 |
Guru
Posts: 629
Karma: 3526
Join Date: Jun 2011
Device: Kobo Touch, Nook Touch, EEE 800 Note, Entourage PE, finally M92
|
Those commands seem to be misleading, since the layers you mentioned seem not to be there. That is, unless Adobe Acrobat is wrong.
Other PDF viewers that I tried do not seem to see them either. |
03-05-2012, 10:42 AM | #9 |
Booxtor
Posts: 1,126
Karma: 2305664
Join Date: Jun 2011
Location: Germany
Device: a lot of..
|
I have tried to open that PDF document on all my PDF-supporting ereaders (Pocketbook 903, Sony PRS650); they don't display this file properly either. It must be something special with the PDFs from those archive pages.
|
03-05-2012, 10:46 AM | #10 |
Addict
Posts: 320
Karma: 99999
Join Date: Oct 2011
Location: Germany
Device: Onyx Boox M92, Icarus Illumina E653
|
What I wanted to say is that I have no idea about the PDF format at all. I don't know whether there are "layers" or anything like that in the PDF format. I was just playing around with some PDF tools and looking at the results...
However: maybe zuflacht can try this PDF on his M92. It's the book from the first post in a slightly different format (only the first 30 pages, and in png) and there's a small chance it might be displayed correctly on the M92: output.pdf Last edited by tuxor; 03-05-2012 at 10:55 AM. |
03-05-2012, 12:08 PM | #11 |
Banned
Posts: 356
Karma: 60546
Join Date: Oct 2010
Device: Nook classic, PB 903, Onyx M92
|
It displays all right on my Nook classic, without the annotations and maps.
Funny formatting though, and a hodgepodge of fonts. |
03-05-2012, 03:40 PM | #12 |
Junior Member
Posts: 5
Karma: 10
Join Date: Mar 2012
Device: onyx boox m92
|
Thanks everyone, particularly tuxor and eLiNK (by private msg.), those files work fine! It seems the png version from tuxor has better contrast ...
I had the chance to check this PDF in Adobe Acrobat. It reported two images per page: one is the scan, the other has the "interpolate flag" set, so this is probably where the problem is. Would anyone know how to get rid of all those extra images (they also have a lower resolution) besides exporting and reimporting, i.e., some kind of batch process or preflight fix? Thanks again! |
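For anyone without Acrobat, the interpolate flag mentioned above can probably be checked with poppler's `pdfimages -list`, which prints one row per embedded image including an "interp" column. The following is a hedged sketch rather than a tested recipe for this particular file: `interpolated_images` is a hypothetical helper name, and the position of the column is looked up from the header row on the assumption that the header contains the word "interp".

```shell
#!/bin/bash
# Sketch: print "page image-number" for every image whose interp column is "yes".
# interpolated_images is a hypothetical helper; it reads `pdfimages -list`
# style output on stdin and finds the column by its header name.
interpolated_images() {
    awk '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "interp") col = i; next }
        col && $col == "yes" { print $1, $2 }   # page number, image number
    '
}

# Only run pdfimages when a PDF file is given as the first argument.
if [ "$#" -ge 1 ]; then
    pdfimages -list "$1" | interpolated_images
fi
```

This only reports which images carry the flag; it does not remove them.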
03-05-2012, 04:21 PM | #13 |
Addict
Posts: 320
Karma: 99999
Join Date: Oct 2011
Location: Germany
Device: Onyx Boox M92, Icarus Illumina E653
|
Okay, since the way I did it seems to work, I will also contribute the small bash script that I wrote to get the png-pdf-version:
Code:
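#!/bin/bash
# Pass the PDF file as the first argument; 416 is the page count of this book.
for i in {1..416}
do
    j=$(printf %03d $i)
    # extract all images of page $i
    pdfimages -j -f $i -l $i "$1" __tmpfile
    # drop the colour images, keep only the two-colour text image
    rm -f __tmpfile*.ppm
    convert -negate __tmpfile*.pbm __tmpimg$j.png
    rm -f __tmpfile*.pbm
    # wrap the page image in a single-page PDF
    convert __tmpimg$j.png __tmpimg$j.pdf
    rm -f __tmpimg*.png
done
# join the single-page PDFs in page order
pdftk __tmpimg*.pdf cat output output.pdf
rm -f __tmpimg*.pdf
Unfortunately, if you are on Windows, there is no way of using this script. But I uploaded the whole converted file and will send the link via PM on request. |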
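The script above hard-codes the page count of this one book. A possible variant, sketched here under assumptions, reads the count from `pdfinfo` (part of the same poppler tools as `pdfimages`) so it can be pointed at other scans with the same image structure. The helper names `parse_pages` and `pad_page` are hypothetical, and the pipeline itself is unchanged apart from quoting.

```shell
#!/bin/bash
# Sketch: same pipeline as the script above, with the page count taken
# from pdfinfo instead of being hard-coded. Helper names are hypothetical.

# Extract the number after "Pages:" from pdfinfo-style output on stdin.
parse_pages() {
    awk '/^Pages:/ { print $2 }'
}

# Zero-pad a page number so the shell glob lists pages in order.
pad_page() {
    printf '%04d' "$1"
}

# Only run the conversion when a PDF file is given as the first argument.
if [ "$#" -ge 1 ]; then
    pages=$(pdfinfo "$1" | parse_pages)
    for i in $(seq 1 "$pages"); do
        j=$(pad_page "$i")
        pdfimages -f "$i" -l "$i" "$1" __tmpfile    # images of page $i
        rm -f __tmpfile*.ppm                         # drop colour images
        convert -negate __tmpfile*.pbm "__tmpimg$j.png"
        convert "__tmpimg$j.png" "__tmpimg$j.pdf"    # single-page PDF
        rm -f __tmpfile*.pbm __tmpimg*.png
    done
    pdftk __tmpimg*.pdf cat output output.pdf        # join pages in order
    rm -f __tmpimg*.pdf
fi
```

Four-digit padding is used so the glob order still holds for documents with more than 999 pages.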
03-06-2012, 02:47 AM | #14 |
Connoisseur
Posts: 62
Karma: 1114
Join Date: Jan 2012
Device: Onyx Boox M92
|
Did anybody try the DjVu version of the file? It usually works better than PDF for scanned documents.
|
03-06-2012, 03:20 AM | #15 |
Banned
Posts: 356
Karma: 60546
Join Date: Oct 2010
Device: Nook classic, PB 903, Onyx M92
|
|
|