04-26-2010, 06:17 PM | #1 |
Edge User
|
Archive.org: can't read any downloaded PDF
I was going through some books on early American humanist movements and found I could not read them on the reader. With PDF to go I just get endless red X's, and on the ebook side I get endless blank pages.
What is going on? |
04-26-2010, 08:47 PM | #2 |
Edge User
|
Rakista
I have a few of these. They are not readable on the eDGe unless they have a hidden text layer. The ones I have with a hidden text layer are fine, with one warning: if you export or transport them out of the library/internal hard drive, the reader software strips the hidden text. So make sure you keep your original copy elsewhere and do not overwrite it.
As Boris said, many of the older books are image only. Sometimes that is because they contain more than one language, or inline symbols. I have several like this. You have to run OCR on them to create the hidden text layer. You can do this in Acrobat Standard and up, although with the book I did myself I found that Acrobat didn't do an adequate job; it seemed to just run everything automatically. I used scanning software that let me train and check the OCR, and export the result as a layered PDF. It worked much better than Acrobat. It was a lot of work, but I wanted to keep the book as resource material.
I'm a Mac user, so the scanning software I used was Readiris Pro. You'll have to ask the Windows users which scanning software is best; I think it might be ABBYY FineReader?
Also, OCR quality depends partly on image quality: the higher the resolution, the sharper the edges, and the better the chance the OCR interprets them correctly. Also, you will not be able to "reflow" the text. |
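For anyone who wants a quick check before copying a file to the reader, here is a rough sketch of how one might guess whether a PDF has any text layer at all. It is only a heuristic, not how the eDGe decides anything: it assumes that an image-only scan declares no font resources, while a PDF with a hidden OCR layer does. The function name `has_text_layer_hint` is made up for this example, and the check can miss fonts in newer PDFs that pack their dictionaries into compressed object streams.

```python
import re


def has_text_layer_hint(pdf_bytes: bytes) -> bool:
    """Guess whether a PDF carries text by looking for font resources.

    Assumption: a hidden OCR layer needs at least one font, so the raw
    file mentions a name starting with "/Font" (for example "/Font" or
    "/FontDescriptor"). Image-only scans usually do not.
    """
    return re.search(rb"/Font", pdf_bytes) is not None


def file_has_text_layer_hint(path: str) -> bool:
    """Same check, reading the PDF from disk."""
    with open(path, "rb") as handle:
        return has_text_layer_hint(handle.read())
```

A `False` result suggests the file is image only and would need OCR first; a `True` result only says fonts are present, not that the text layer is any good.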
04-26-2010, 08:58 PM | #3 |
Edge User
|
I can read most of the image PDFs from archive.org, but some are blank. From what I can tell, the ones with additional info (like author) embedded don't work. Also, like Boris said, the ones from Google don't show any images that might be there.
When I open one of the blank PDFs, I usually get the full title (not the file name) and author in the top margin of the eInk side. I haven't found a way to tell in advance which files will be blank. You can try Project Gutenberg if you don't mind the OCR look. They do EPUB instead of PDF. Many Google books are also in EPUB format, and all the EPUB files I've downloaded from both sites have worked. For my books, there are a lot of OCR errors and reading through them is a pain. |
04-26-2010, 09:44 PM | #4 | |
Edge User
|
I have an image PDF from Google, but I ran the OCR myself, set it as a hidden layer, and it is fine. But, like I said, it required some work. Just to be clear, though, it is the image layer you are reading in these. |
|
04-26-2010, 09:47 PM | #5 | ||
Edge User
|
05-04-2010, 07:01 PM | #6 |
Edge User
|
Just to add a little more to the blank PDF problem: I was investigating one of these today. I think what may be happening relates to the image formats being used. I believe the eDGe only reads JPEG and PNG? I tried to find out what type of image files were in the PDF I was looking at, but wasn't able to. I did find out that there was some type of compression on it that stopped it from copying to InDesign (the message said I needed qt to view it because of compression?). So I exported a couple of images out of the PDF to JPEG and rebuilt a page. That page viewed correctly on the eDGe.
I believe most of the automatic OCR setups use TIFF because it's not lossy; or they give you a selection, but the default is set to TIFF. So I'm thinking that may be the problem, along with whatever compression type has been used. The PDF I was dealing with was output from Ghostscript. Hope this sheds a bit more light. |
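On the question of finding out what type of image data is inside a PDF: one way that needs no extra tools is to look at the `/Filter` entry on each image object, since the filter name says how the image is compressed. Below is a minimal sketch of that idea. It is a heuristic scan of the raw bytes, not a real PDF parser, and the function name `count_image_filters` and the labels are made up for this example. Whether the eDGe actually fails on any particular filter is still a guess.

```python
import re
from collections import Counter

# Standard PDF stream filter names and the encodings they indicate.
FILTER_LABELS = {
    b"DCTDecode": "JPEG",
    b"JPXDecode": "JPEG 2000",
    b"CCITTFaxDecode": "CCITT fax",
    b"JBIG2Decode": "JBIG2",
    b"FlateDecode": "Flate (lossless)",
    b"LZWDecode": "LZW (lossless)",
}


def count_image_filters(pdf_bytes: bytes) -> Counter:
    """Count the compression filters declared on image objects."""
    counts = Counter()
    # Take the dictionary part of each object: everything between
    # "N G obj" and the first "stream" or "endobj" keyword.
    pattern = rb"\d+\s+\d+\s+obj(.*?)(?:stream|endobj)"
    for match in re.finditer(pattern, pdf_bytes, re.DOTALL):
        header = match.group(1)
        if not re.search(rb"/Subtype\s*/Image", header):
            continue  # not an image object
        # A filter may be a single name or an array of names.
        for name in re.findall(rb"/(\w+Decode)\b", header):
            counts[FILTER_LABELS.get(name, name.decode("ascii"))] += 1
    return counts
```

To try it on a file, something like `count_image_filters(open("book.pdf", "rb").read())` should work, where `book.pdf` stands in for your own download. If the count shows something other than JPEG, that would fit the theory above, though it does not prove it.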
|