
MobileRead Forums > E-Book Readers > Onyx Boox

03-04-2012, 09:02 PM   #1
zuflacht
Junior Member
 
Posts: 5
Karma: 10
Join Date: Mar 2012
Device: onyx boox m92
PDF compatibility

Hi,

As I am new to the forum: thanks for all the useful information here!

I mostly use my new Onyx for reading PDFs, so I was wondering if anyone has an idea why some PDFs don't display on the Onyx, for instance this one: http://www.archive.org/details/suicidestudyinso00durk

Of course, I could send it through some virtual PDF printer like my dear Quartz (I'm a Mac user), but if I do, the document grows to some 600GB compared with its original 10MB... And no, I'd rather keep the page layout than use an e-book format; that was the reason I got an M92...

Thanks for any help!
Best,
Johan
03-04-2012, 11:43 PM   #2
pidgeon92
Wizard
 
Posts: 2,771
Karma: 6230000
Join Date: Jun 2008
Location: Chicago, IL
Device: Kindle PW2, Kindle Voyage, Kindle DXG, Boox M90, Kobo Aura HD
Try opening the PDF in Preview, and then saving it as a PDF. It shouldn't add any file size, and when you add the new file to the Boox, it should open correctly.
03-05-2012, 04:32 AM   #3
zuflacht
Junior Member
Thanks, but unfortunately, even though it shouldn't change the file size, it does: instead of 14 MB, I get 899 MB...
03-05-2012, 09:24 AM   #4
PF4Mobile
Guru
 
Posts: 621
Karma: 3526
Join Date: Jun 2011
Device: Kobo Touch, Nook Touch, EEE 800 Note, Entourage PE, finally M92
Wow.

That is a really bad PDF. I tried a couple of tricks on it, but nothing worked.
I had to give up for now (no time); I'll try again later.
03-05-2012, 10:08 AM   #5
tuxor
Addict
 
Posts: 314
Karma: 6809
Join Date: Oct 2011
Location: Germany
Device: Onyx Boox M92, Icarus Illumina E653
This PDF consists of images with multiple layers: one layer for the text (~800 KB per page if saved directly from Evince as PNG or JPG in decent quality), one for the background, and two additional layers I don't understand. Really good work has been done here in extracting the text layer after scanning. I wonder whether you could even remove the yellowish background layer without losing any reading quality or information? Then, of course, there is the plain-text information from the OCR, which doesn't account for a significant part of the file size.
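A quicker way to see how each page is built, without exporting anything, is poppler's `pdfimages -list`, which prints one row per image (masks included) instead of writing files. The sketch below summarizes that listing per page. It assumes the usual poppler column layout (two header lines, page number in column 1, image type such as `image`, `smask` or `stencil` in column 3); the function name `images_per_page` is hypothetical.

```shell
# Sketch: count the images on each page from `pdfimages -list` output.
# Assumed layout: two header lines, then one row per image with the page
# number in column 1 and the image type in column 3.
# Prints: page <TAB> image count <TAB> comma-separated types
images_per_page() {
    awk 'NR > 2 {
             count[$1]++
             if (count[$1] == 1) types[$1] = $3
             else types[$1] = types[$1] "," $3
         }
         END {
             for (p in count) printf "%d\t%d\t%s\n", p, count[p], types[p]
         }' | sort -n
}
```

For example, `pdfimages -list book.pdf | images_per_page | head` (with `book.pdf` standing in for the downloaded file) shows the first pages' image counts.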

By the way, rendering this PDF in Evince is really slow on my notebook (2.4 GHz Core 2 Duo, 4 GB RAM), so I'm not surprised it's a challenge for the M92. Printing it with cups-pdf is slow and produces a file of approx. 1 GB (I only printed the first 20 pages for testing) that doesn't really contain what you'd expect.

Extracting all images with the "pdfimages" command yields 3 PPM files and 1 PBM file (a black-and-white, two-color image format) per page. All images together amount to more than 8 GB (by my estimate). If you keep only the PBM files, which contain the text information at approx. 2000x3000 pixels, it's about 320 MB. Convert those PBM files to PNG and you get approx. 30 KB per page, so all in all 30 KB × 400 = 12,000 KB = 12 MB for the text layer of the whole PDF extracted as PNG.
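The estimate above is plain multiplication: average size per page times the number of pages, with 1 MB taken as 1000 KB as in the post. A small helper (the name is hypothetical) makes it easy to redo with other per-page figures; it uses bash integer arithmetic, so results are rounded down.

```shell
# Estimate the total size in MB from an average per-page size in KB.
# Uses 1 MB = 1000 KB, as in the post; integer division rounds down.
estimate_total_mb() {
    local kb_per_page=$1 pages=$2
    echo $(( kb_per_page * pages / 1000 ))
}
```

`estimate_total_mb 30 400` gives the 12 MB above, and `estimate_total_mb 150 400` gives 60 MB, which matches the figure for the annotated pages later in the thread.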

Last edited by tuxor; 03-05-2012 at 10:10 AM.
03-05-2012, 10:43 AM   #6
PF4Mobile
Guru
Adobe Acrobat doesn't see any layers there. Are you sure those are layers?
I do see overlaid objects in the Content pane.

Edit: I tried to delete the image object underlying the text, and the text disappeared.
The text object was there, but something is wrong with the font (not embedded?) or with the text's visibility... it beats me what it is.
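Whether the fonts are embedded can also be checked without Acrobat: `pdffonts` (poppler/xpdf) prints a table with an `emb` column. The sketch below filters that table for fonts reported as not embedded. It assumes the usual column order (`name type encoding emb sub uni object ID`) and counts from the right, because the type column can contain spaces (for example `Type 1` or `CID TrueType`). The function name is hypothetical.

```shell
# List fonts that pdffonts reports as NOT embedded.
# Assumed columns: name type encoding emb sub uni object ID.
# "type" may contain spaces, so "emb" is read as the 5th field from the end.
non_embedded_fonts() {
    awk 'NR > 2 && $(NF - 4) == "no" { print $1 }'
}
```

Usage: `pdffonts book.pdf | non_embedded_fonts`, where `book.pdf` stands in for the file in question.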

If you don't plan to copy text from this document, just find the version without OCR.
Actually, the M92 seems to have a problem with the image layers, since all the pages appeared blank. Now I realize that the text must have been there, but I couldn't see it.

The other way to solve the problem (if you insist on reading the file as a PDF) is to get the EPUB file from the same page and convert it to PDF with calibre or another tool.
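With calibre installed, the EPUB-to-PDF step can also be scripted with its `ebook-convert` command, which picks the output format from the file extension. A minimal sketch: the A5 paper size is an arbitrary assumption (pick whatever suits the M92's screen), and the `EBOOK_CONVERT` variable is a hypothetical hook for pointing at a non-default calibre install.

```shell
# Convert an EPUB to PDF with calibre's ebook-convert and print the output path.
# --paper-size is one of calibre's PDF output options; a5 is an arbitrary choice.
# EBOOK_CONVERT (hypothetical) overrides the command, e.g. for another install path.
epub_to_pdf() {
    local src=$1
    local out="${src%.epub}.pdf"
    "${EBOOK_CONVERT:-ebook-convert}" "$src" "$out" --paper-size a5 || return 1
    printf '%s\n' "$out"
}
```

Usage: `epub_to_pdf book.epub` writes `book.pdf` next to the input.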

Last edited by PF4Mobile; 03-05-2012 at 10:51 AM.
03-05-2012, 10:59 AM   #7
tuxor
Addict
Well, I don't even have Adobe Acrobat; I don't need it, it's too expensive, and it doesn't run on Linux... ;-) I was just looking at what the "pdfimages" command returned and what I got when exporting images from inside the document with Evince.

Unfortunately, there are many pages with annotations in that document. They account for more than 150 KB each when exported as PNG, so the end result is more than 60 MB when exported as PNG :-/
03-05-2012, 11:08 AM   #8
PF4Mobile
Guru
Those commands seem to be misleading, since the layers you mentioned don't seem to be there, unless Adobe Acrobat is wrong.
The other PDF viewers I tried don't seem to see them either.
03-05-2012, 11:42 AM   #9
Booxtor
Booxtor
 
Posts: 897
Karma: 69628
Join Date: Jun 2011
Location: Germany
Device: a lot of..
I have tried to open that PDF on all my PDF-capable e-readers (PocketBook 903, Sony PRS-650); they don't display this file properly either. There must be something special about the PDFs from those archive.org pages.
03-05-2012, 11:46 AM   #10
tuxor
Addict
What I wanted to say is that I know next to nothing about the PDF format. I don't know whether there are "layers" or anything like that in PDF at all. I was just playing around with some PDF tools and looking at the results...

However, maybe zuflacht can try this PDF on his M92. It's the book from the first post in a slightly different format (only the first 30 pages, and as PNG), and there's a small chance it will display correctly on the M92: output.pdf

Last edited by tuxor; 03-05-2012 at 11:55 AM.
03-05-2012, 01:08 PM   #11
Beryll Snyder
Banned
 
Posts: 356
Karma: 60546
Join Date: Oct 2010
Device: Nook classic, PB 903, Onyx M92
It displays all right on my Nook Classic, without the annotations and maps.
Funny formatting, though, and a hodgepodge of fonts.
03-05-2012, 04:40 PM   #12
zuflacht
Junior Member
Thanks everyone, particularly tuxor and eLiNK (by private message); those files work fine! The PNG version from tuxor seems to have better contrast...
I had the chance to check this PDF in Adobe Acrobat: it reported two images per page. One is the scan; the other has the "interpolate" flag set, so this is probably where the problem lies. Does anyone know how to get rid of all those extra images (they also have a lower resolution) other than by exporting and re-importing, i.e., some kind of batch process or preflight fix?
Thanks again!
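Without Acrobat, the interpolate flag can be spotted with poppler's `pdfimages -list`, whose `interp` column shows it per image. The sketch below only finds those images; it does not remove them. It assumes the listing's usual column layout (page in column 1, image number in column 2, width and height in columns 4 and 5, `interp` in column 10), and the function name is hypothetical.

```shell
# Print the images whose interpolate flag is set, from `pdfimages -list` output.
# Assumed columns: page num type width height color comp bpc enc interp ...
interpolated_images() {
    awk 'NR > 2 && $10 == "yes" { printf "page %s, image %s (%sx%s)\n", $1, $2, $4, $5 }'
}
```

Usage: `pdfimages -list book.pdf | interpolated_images`, with `book.pdf` standing in for the file.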
03-05-2012, 05:21 PM   #13
tuxor
Addict
Okay, since the way I did it seems to work, I will also contribute the small bash script that I wrote to get the png-pdf-version:
Code:
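#!/bin/bash
# Usage: pass the input PDF as the only argument; writes output.pdf to the working directory
input="${1:?usage: $0 input.pdf}"
# Read the page count instead of hard-coding 416 (pdfinfo ships with the same tools as pdfimages)
pages=$(pdfinfo "$input" | awk '/^Pages:/ {print $2}')
for ((i = 1; i <= pages; i++))
do
   j=$(printf %04d "$i")
   # Extract this page's images; only the 1-bit .pbm (the text layer) is kept
   pdfimages -j -f "$i" -l "$i" "$input" __tmpfile
   rm -f __tmpfile*.ppm __tmpfile*.jpg
   # Invert the extracted 1-bit image (as in the original) and save it as PNG
   convert -negate __tmpfile*.pbm "__tmpimg$j.png"
   rm -f __tmpfile*.pbm
   convert "__tmpimg$j.png" "__tmpimg$j.pdf"
   rm -f "__tmpimg$j.png"
done
# Zero-padded names make the glob expand in page order
pdftk __tmpimg*.pdf cat output output.pdf
rm -f __tmpimg*.pdf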
This script takes the path to the input PDF as its argument and writes "output.pdf" to the working directory. The final PDF will be approx. 54 MB, and the procedure takes a really long time and a lot of CPU power. The same script probably won't work with most other PDFs, but there's a good chance it will work with some of the PDFs on archive.org that come from the same OCR software.
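The script assumes that `pdfimages` (poppler-utils), `convert` (ImageMagick) and `pdftk` are installed. A small check like the one below, with a hypothetical function name, can go at the top of such a script so it stops early with a clear message rather than erroring repeatedly inside the page loop.

```shell
# Report any missing command-line tools; returns non-zero if one is absent.
require_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Usage: `require_tools pdfimages convert pdftk || exit 1`.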

Unfortunately, if you are on Windows, there is no easy way to use this script. But I uploaded the whole converted file and will send the link via PM on request.
03-06-2012, 03:47 AM   #14
FDD
Connoisseur
 
Posts: 62
Karma: 1114
Join Date: Jan 2012
Device: Onyx Boox M92
Did anybody try the DjVu version of the file? It usually works better than PDF for scanned documents.
03-06-2012, 04:20 AM   #15
Beryll Snyder
Banned
Quote:
Originally Posted by FDD
Did anybody try the DjVu version of the file? It usually works better than PDF for scanned documents.
In a scientific context, you need page numbers for quoting, etc.
All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.