10-10-2008, 02:27 PM | #1 |
Connoisseur
Posts: 86
Karma: 10
Join Date: Nov 2007
Device: Irex Illiad, Sony 505
|
OK, I have scanned PDF books, but each page is a scanned image, so the text is very small. When I try to increase the font size, nothing happens. Changing pages takes forever and I don't know why.
Is there a way to change this book from PDF scanned images to a normal LRF file? Would that make it look like a normal LRF book? Thanks |
10-10-2008, 03:50 PM | #2 |
Zealot
Posts: 103
Karma: 148
Join Date: Aug 2008
Location: Huntington, IN US
Device: Sony PRS-505
|
You will need some kind of OCR (Optical Character Recognition) software. There are a few freeware versions if you Google them.
|
10-10-2008, 04:48 PM | #3 |
Connoisseur
Posts: 86
Karma: 10
Join Date: Nov 2007
Device: Irex Illiad, Sony 505
|
|
10-10-2008, 05:12 PM | #4 |
Reader
Posts: 11,504
Karma: 8720163
Join Date: May 2007
Location: South Wales, UK
Device: Sony PRS-500, PRS-505, Asus EEEpc 4G
|
No, OCR screws up the text in perfectly normal books too. I expect to spend about 10-15 hours unscrewing a straightforward novel, depending on its length and on whether there are a lot of italics or em dashes to reinstate.
|
10-10-2008, 05:21 PM | #5 |
Connoisseur
Posts: 86
Karma: 10
Join Date: Nov 2007
Device: Irex Illiad, Sony 505
|
Having to go through the whole 400-page book looking for mistakes almost makes it not worth it.
|
10-10-2008, 05:27 PM | #6 | |
New York Editor
Posts: 6,384
Karma: 16540415
Join Date: Aug 2007
Device: PalmTX, Pocket eDGe, Alcatel Fierce 4, RCA Viking Pro 10, Nexus 7
|
Quote:
Once you've done that, you can see about converting the result to a supported ebook format.
______
Dennis |
|
10-10-2008, 07:10 PM | #7 |
Zealot
Posts: 103
Karma: 148
Join Date: Aug 2008
Location: Huntington, IN US
Device: Sony PRS-505
|
What software are you using? I have had pretty decent luck with Adobe Acrobat's OCR system. The biggest issue with OCR is getting a clean, straight scan. You want the resolution set to about 250-300 dpi, and you want the final output to be a bitmap (1-bit, black-and-white) TIFF, not grayscale or RGB/CMYK. You definitely don't want it as a JPG.
|
10-10-2008, 07:51 PM | #8 |
Tech Junkie
Posts: 1,027
Karma: 10080
Join Date: Aug 2007
Location: Earth
Device: iPad, MotoXStyle, OnePlusOne
|
I'll say that Adobe's OCR is basic, and while it does a decent job, it is nowhere near that accurate. It does OK with normal office documents and the like, where the language is plain, but it's not that good when you throw a lot of strange or complex words or layouts into the mix.
I've had much better results from ABBYY FineReader 9 Pro when I got the chance to use it. It was able to identify words, diagrams, etc. with relative ease and dealt with foreign-language words and accents much better. The drawback is that it's not cheap, it takes a while to process, and it is quite system heavy. |
10-10-2008, 07:58 PM | #9 |
Guru
Posts: 860
Karma: 4380
Join Date: Feb 2008
Location: Almada, Portugal
Device: Cybook Gen3, Sony PRS 505, Kindle DXG and Samsung Galaxy Note
|
The leading OCR applications are:
FineReader Pro 9.0
OmniPage Pro 16
And we are talking about more than 99% accuracy... |
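To put that accuracy figure in perspective, here is a rough back-of-the-envelope estimate. It assumes the 99% is measured per character and that a page holds about 1,800 characters; both are assumptions for illustration, not measurements from either program, and `expected_ocr_errors` is a made-up helper name.

```python
def expected_ocr_errors(pages: int, chars_per_page: int, accuracy: float) -> int:
    """Estimate how many characters the OCR gets wrong.

    `accuracy` is treated as a per-character rate (an assumption).
    """
    total_chars = pages * chars_per_page
    return round(total_chars * (1.0 - accuracy))


pages = 400            # book length mentioned in this thread
chars_per_page = 1800  # assumption: rough figure for a paperback page

errors = expected_ocr_errors(pages, chars_per_page, 0.99)
print(f"about {errors} wrong characters, roughly {errors / pages:.0f} per page")
```

Under those assumptions, 99% accuracy on a 400-page book still leaves about 7,200 wrong characters, roughly 18 per page, which is why the clean-up afterwards takes so long.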
10-10-2008, 08:09 PM | #10 |
Connoisseur
Posts: 86
Karma: 10
Join Date: Nov 2007
Device: Irex Illiad, Sony 505
|
Thanks for the info. I am going to try FineReader. We have it at work, so getting it is not an issue.
|
10-10-2008, 08:14 PM | #11 |
Resident Curmudgeon
Posts: 75,851
Karma: 134368292
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
It probably isn't worth the effort. Would be better to just get the book as a legal eBook (if it exists and while Fictionwise and BooksOnBoard are both having 50% off sales) or a pBook.
|
10-10-2008, 08:35 PM | #12 | |
Tech Junkie
Posts: 1,027
Karma: 10080
Join Date: Aug 2007
Location: Earth
Device: iPad, MotoXStyle, OnePlusOne
|
Quote:
Still, if it's a Star Wars book, it should probably be available, unless it's a really old one. |
|
10-10-2008, 08:56 PM | #13 |
Connoisseur
Posts: 86
Karma: 10
Join Date: Nov 2007
Device: Irex Illiad, Sony 505
|
Well, I do legally own the book; I scanned it in myself. There is no third party involved.
It is actually a new Star Wars book, Order 66 by Karen Traviss. Surprisingly, it is not on Mobipocket, Fictionwise, or the Sony store. |
10-10-2008, 09:12 PM | #14 |
Connoisseur
Posts: 86
Karma: 10
Join Date: Nov 2007
Device: Irex Illiad, Sony 505
|
OK, with FineReader I have the option to save it in MANY formats. What would be the best format to use?
Is there a format that can be imported directly as an .lrf file? What program would I use to do that?
FYI, it does do a much better job at the OCR!
Last edited by DeathtoToasters; 10-10-2008 at 09:17 PM. |
10-11-2008, 10:50 AM | #15 |
Guru
Posts: 860
Karma: 4380
Join Date: Feb 2008
Location: Almada, Portugal
Device: Cybook Gen3, Sony PRS 505, Kindle DXG and Samsung Galaxy Note
|
That depends on what final result you want.
If you just want a quick and dirty way of reading the book and can accept the errors that are still in the file, choose text, Word, or HTML.
If you want a tidy, well-formatted eBook with errors corrected, bad line breaks fixed, missing letters and words identified and restored, and so on, you need to enter the “purgatory” part of the process, called “proofreading”. Here you have to correct all the errors and format the book so that it looks good when read: inserting page breaks before the start of each new chapter (if you use the one-book-one-file method), formatting the chapter names properly so that a table of contents is generated when you convert, etc.
You can do some of the corrections in the OCR program itself, or some or all of them outside it, for example in Word. If you choose the latter, I advise you to save the output as plain text so you can format everything from the beginning. If you save as Word, parts of the text are often OCRed as different fonts, different sizes, bold, you name it, and you can spend hours just trying to un-format the text.
To convert to the final format, you have Calibre for the Sony, Mobipocket Creator for the Mobipocket format, etc.
One more thing: proofreading is always the most time-consuming, irritating, and difficult part of any workflow that moves a book from one medium to another. A normal book of, as you say, 400 pages can take you anywhere from hours to days! That’s why, for example, Project Gutenberg uses volunteers to collectively proofread a book: let’s say 10 people for a 400-page book!
Good luck, |
|
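The line-break and chapter-title cleanup described in the post above can be partly scripted before proofreading starts. The following is a minimal sketch in Python (standard library only), assuming the OCR output was saved as plain text with blank lines between paragraphs. The function names, the chapter-title pattern, and the sample text are all hypothetical, and it does not replace proofreading.

```python
import html
import re

# Assumption: chapter titles sit on their own line and look like
# "Chapter One", "Chapter 12", "Prologue", or "Epilogue".
CHAPTER_RE = re.compile(r"^(chapter\s+\w+|prologue|epilogue)\b.*$", re.IGNORECASE)


def unwrap_paragraphs(raw: str) -> list[str]:
    """Split text on blank lines and rejoin lines broken inside a paragraph."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", raw.strip()):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if not lines:
            continue
        text = lines[0]
        for line in lines[1:]:
            if text.endswith("-") and line[:1].islower():
                # Word split across a line break: "scan-" + "ning" -> "scanning"
                text = text[:-1] + line
            else:
                text = text + " " + line
        paragraphs.append(text)
    return paragraphs


def to_html(paragraphs: list[str], title: str) -> str:
    """Wrap chapter titles in <h2> and everything else in <p>."""
    body = []
    for para in paragraphs:
        tag = "h2" if CHAPTER_RE.match(para) else "p"
        body.append(f"<{tag}>{html.escape(para)}</{tag}>")
    return (
        "<html><head><title>" + html.escape(title) + "</title></head>\n<body>\n"
        + "\n".join(body)
        + "\n</body></html>\n"
    )


if __name__ == "__main__":
    sample = "Chapter One\n\nThe scan-\nning took\nall day.\n\nIt was worth it."
    print(to_html(unwrap_paragraphs(sample), "Sample book"))
```

The rejoining rule is deliberately simple: it will also merge genuine hyphenated compounds that happen to break at the end of a line, so the result still needs a read-through. Marking chapter titles as headings is meant to give the converter something to build a table of contents from; whether a given converter picks them up depends on its settings.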