Old 04-26-2010, 06:17 PM   #1
rakista
Edge User
 
Can't read any PDF downloaded from Archive.org

I was going through some books on early American humanist movements and found I could not read them on the reader. With PDF To Go I just get endless red X's, and on the ebook side I get endless blank pages.

What is going on?
 
Old 04-26-2010, 08:47 PM   #2
aidren
Edge User
 
Rakista

I have a few of these. They are not readable on the eDGe unless they have a hidden text layer. Any that I have with the hidden text layer are fine, with this warning: if you export or transfer them out of the library/internal hard drive, the reader software strips the hidden text. So make sure you keep your original copy elsewhere, and do not overwrite it.
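For anyone who wants to check a file before copying it over, here is a rough sketch of a text-layer test. It assumes that a pdf with a (hidden) text layer references at least one font, and that an image-only scan references none. That is a heuristic, not a guarantee, and the function name `has_text_layer` is made up for this example. It scans the raw bytes and also peeks inside zlib-compressed streams, since newer pdfs can pack their font objects there.

```python
import re
import zlib

# Captures the raw data between "stream" and "endstream" (heuristic, not a real parser).
STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)


def has_text_layer(pdf_bytes: bytes) -> bool:
    """Guess whether a PDF carries text by looking for any font reference.

    Assumption: a file with a text layer mentions /Font somewhere,
    either in plain object dictionaries or inside a Flate-compressed stream.
    """
    if b"/Font" in pdf_bytes:
        return True
    for match in STREAM_RE.finditer(pdf_bytes):
        try:
            # decompressobj tolerates trailing bytes after the compressed data
            data = zlib.decompressobj().decompress(match.group(1))
        except zlib.error:
            continue  # not zlib data (e.g. a JPEG image stream), skip it
        if b"/Font" in data:
            return True
    return False
```

To try it on a real file, read it in binary mode, e.g. `has_text_layer(open("book.pdf", "rb").read())`, where `book.pdf` is a placeholder name. A `False` result only suggests the file is image-only; encrypted files or unusual encodings can fool it.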

As Boris said, many of the older books are image-only. Sometimes it is because they have more than one language represented in them, or inline symbols. I have several like this.

You have to run an ocr on them to create the hidden text layer. You can do this in Acrobat Standard and up, although with the book that I did myself, I found that Acrobat didn't do an adequate job. It seemed to just run everything automatically. I used scanning software that allowed me to train and check the ocr, as well as export it as a layered pdf. It worked much better than Acrobat. It was a lot of work, but I was interested in keeping the book as resource material.

I'm a Mac user, so the scanning software I used was Readiris Pro. You'll have to ask the Windows users what scanning software is best. I think it might be ABBYY FineReader? Also, the ocr quality is somewhat dependent on the image quality. The higher the resolution, the sharper the edges, and the better chance the ocr has of interpreting them.

Also, you will not be able to "reflow" the text.
 
Old 04-26-2010, 08:58 PM   #3
dcubed2
Edge User
 
I can read most of the image pdfs from archive.org, but some are blank. From what I can tell, the ones with additional info (like author) embedded don't work. Also, like Boris said, the ones from Google don't show any images that might be there.

When I open one of the blank pdfs, I usually get the full title (not file name) and author in the top margin of the eInk side. I haven't found any way to tell in advance which files will be blank.

You can try Project Gutenberg if you don't mind the ocr look. They do epub instead of pdf. Many Google books are also in epub format, and all the epub files I've downloaded from both sites have worked. For my books, there are a lot of ocr errors and reading through them is a pain.
 
Old 04-26-2010, 09:44 PM   #4
aidren
Edge User
 
Quote:
...there are a lot of ocr errors and reading through them is a pain.
That is because whoever ocr'd them did it with automatic settings, or within Acrobat or some other such thing.

I have an image pdf from Google, but I ran the ocr myself, set it as a hidden layer, and it is fine... but, like I said, it required some work. Just to be clear, though, it is the image layer you are reading in these.
 
Old 04-26-2010, 09:47 PM   #5
aidren
Edge User
 
Quote:
Originally Posted by borisb
No word from enTourage what if anything they can do via a software update to the eReader.
This was one of the questions I asked tech support. This was the answer:
Quote:
Currently, exporting a pdf with the text layer intact is in the planning phases, but it is not targeted towards an upcoming feature release. I do expect this capability to appear at some point in the future.
 
Old 05-04-2010, 07:01 PM   #6
aidren
Edge User
 
Just to add a little more to the blank pdf problem: I was investigating one of these today. I think what may be happening relates to the image formats being used. I believe the eDGe only reads jpeg and png? I tried to find out what type of image files were in the pdf I was looking at, but wasn't able to. I did find that there was some type of compression on it that kept it from copying into InDesign (the message said I needed QuickTime to view it because of the compression). So I exported a couple of images out of the pdf to jpeg and rebuilt a page. That page viewed correctly on the eDGe.

I believe most of the automatic ocr setups use tiff because it's not lossy, or they give you a selection but the default is set to tiff. So I'm thinking that's the problem, along with whatever compression type has been used. The pdf I was dealing with was output from Ghostscript.
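On finding out what kind of image data is inside a pdf: a pdf does not store "tiff" or "png" files as such. Each image object carries a `/Filter` entry naming its compression, for example `/DCTDecode` for jpeg or `/JPXDecode` for jpeg 2000. Below is a rough sketch that counts those filters by scanning the raw file. The function name `image_filters` and the labels in the table are made up for this example, and it is a regex heuristic rather than a real pdf parser, so it can miss inline images or odd layouts.

```python
import re
from collections import Counter

# Standard PDF filter names mapped to a plain description (labels are my own).
FILTER_NAMES = {
    b"DCTDecode": "JPEG",
    b"JPXDecode": "JPEG 2000",
    b"CCITTFaxDecode": "CCITT fax",
    b"JBIG2Decode": "JBIG2",
    b"FlateDecode": "Flate/zlib",
    b"LZWDecode": "LZW",
    b"RunLengthDecode": "run-length",
}

# Object header: everything from "N N obj" up to the stream data or the object end.
OBJ_RE = re.compile(rb"\d+\s+\d+\s+obj(.*?)(?:stream|endobj)", re.DOTALL)
# /Filter followed by either a single name or an array of names.
FILTER_RE = re.compile(rb"/Filter\s*(\[[^\]]*\]|/\w+)")
NAME_RE = re.compile(rb"/(\w+)")


def image_filters(pdf_bytes: bytes) -> Counter:
    """Count the compression filters used by image objects in a PDF."""
    counts = Counter()
    for match in OBJ_RE.finditer(pdf_bytes):
        header = match.group(1)
        if not re.search(rb"/Subtype\s*/Image", header):
            continue  # not an image object
        found = FILTER_RE.search(header)
        if found is None:
            counts["uncompressed"] += 1
            continue
        for name in NAME_RE.findall(found.group(1)):
            label = FILTER_NAMES.get(name, name.decode("ascii", "replace"))
            counts[label] += 1
    return counts
```

Run it as `image_filters(open("book.pdf", "rb").read())` with your own file name. If the blank files all report a filter the working ones don't, that would support the idea that the eDGe can't decode that image compression, though it wouldn't prove it.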

Hope it sheds a bit more light.
 
 

