Old 10-10-2008, 02:27 PM   #1
DeathtoToasters
Connoisseur
DeathtoToasters began at the beginning.
 
Posts: 86
Karma: 10
Join Date: Nov 2007
Device: Irex Illiad, Sony 505
Ok I have scanned pdf books....but

each page is a scanned image, so the text is very small. When I try to increase the font size, nothing happens. Changing pages takes forever and I don't know why.

Is there a way to change this book from scanned PDF images to a normal LRF file? Would that make it look like a normal LRF book?

Thanks
Old 10-10-2008, 03:50 PM   #2
jedavis1
Zealot
jedavis1 doesn't litter
Posts: 103
Karma: 148
Join Date: Aug 2008
Location: Huntington, IN US
Device: Sony PRS-505
You will need some kind of OCR (Optical Character Recognition) software. There are a few freeware versions if you Google them.
Old 10-10-2008, 04:48 PM   #3
DeathtoToasters
Connoisseur
DeathtoToasters began at the beginning.
 
Posts: 86
Karma: 10
Join Date: Nov 2007
Device: Irex Illiad, Sony 505
Quote:
Originally Posted by jedavis1 View Post
You will need some kind of OCR (Optical Character Recognition) software. There are a few freeware versions if you Google them.
The issue I am really running into is that the OCR is screwing up a lot of text because of the crazy words in a Star Wars book.
Old 10-10-2008, 05:12 PM   #4
Patricia
Reader
Patricia ought to be getting tired of karma fortunes by now.
Posts: 11,520
Karma: 2199070
Join Date: May 2007
Location: South Wales, UK
Device: Sony PRS-500, PRS-505, Asus EEEpc 4G
Quote:
Originally Posted by DeathtoToasters View Post
The issue I am really running into is that the OCR is screwing up a lot of text because of the crazy words in a Star Wars book.
No, OCR screws up the text in perfectly normal books too. I expect to spend about 10-15 hours unscrewing a straightforward novel, depending on its length and on whether there are a lot of italics or em dashes to reinstate.
Old 10-10-2008, 05:21 PM   #5
DeathtoToasters
Connoisseur
DeathtoToasters began at the beginning.
 
Posts: 86
Karma: 10
Join Date: Nov 2007
Device: Irex Illiad, Sony 505
Quote:
Originally Posted by Patricia View Post
No, OCR screws up the text in perfectly normal books too. I expect to spend about 10-15 hours unscrewing a straightforward novel, depending on its length and on whether there are a lot of italics or em dashes to reinstate.
Having to go through the whole 400-page book looking for mistakes almost makes it not worth it.
Old 10-10-2008, 05:27 PM   #6
DMcCunney
New York Editor
DMcCunney ought to be getting tired of karma fortunes by now.
Posts: 5,120
Karma: 6210987
Join Date: Aug 2007
Device: Palm TX, Azpen A727 tablet, Fujitsu Lifebook p2110 w/ FBReader
Quote:
Originally Posted by DeathtoToasters View Post
each page is a scanned image, so the text is very small. When I try to increase the font size, nothing happens. Changing pages takes forever and I don't know why.

Is there a way to change this book from scanned PDF images to a normal LRF file? Would that make it look like a normal LRF book?

Thanks
What you have after scanning is an image of each page. You need to run those images through optical character recognition (OCR) software, which will attempt to determine what is text and output it as a file. No OCR software is perfect, so depending on the scan and the software, you'll have to do a fair bit of editing and cleanup to correct errors where the software guessed wrong about what something was.

Once you've done that, you can see about converting the result to a supported ebook format.
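
One way to narrow down the cleanup is to flag only the words a dictionary does not recognise, while keeping a separate list of invented names so they are not flagged every time. The following is a minimal sketch of that idea, not a feature of any particular OCR package; the function name, the tiny word lists and the sample sentence are all made up for illustration.

```python
import re

def flag_suspect_words(text, known_words, custom_words=()):
    """Return words found in neither list, in order of first appearance."""
    allowed = {w.lower() for w in known_words} | {w.lower() for w in custom_words}
    suspects, seen = [], set()
    # Keep digits inside tokens so misreads such as "c1one" stay in one piece.
    for word in re.findall(r"[A-Za-z0-9']+", text):
        key = word.lower().strip("'")
        if key.endswith("'s"):
            key = key[:-2]  # treat possessives like the base word
        if key and key not in allowed and key not in seen:
            seen.add(key)
            suspects.append(word)
    return suspects

# Tiny stand-in dictionary; a real run would load a full word list from a file.
known = ["the", "clone", "trooper", "looked", "at", "his", "helmet"]
# Hypothetical custom list of invented names that a normal dictionary would reject.
custom = ["Skirata", "Mandalorian"]

sample = "The c1one trooper looked at Skirata's he1met"
print(flag_suspect_words(sample, known, custom))
```

This only points at candidates. Real words that OCR swapped for other real words still need a human read-through.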
______
Dennis
Old 10-10-2008, 07:10 PM   #7
jedavis1
Zealot
jedavis1 doesn't litter
Posts: 103
Karma: 148
Join Date: Aug 2008
Location: Huntington, IN US
Device: Sony PRS-505
Quote:
Originally Posted by DeathtoToasters View Post
The issue I am really running into is that the OCR is screwing up a lot of text because of the crazy words in a Star Wars book.
What software are you using? I have had pretty decent luck with Adobe Acrobat's OCR system. The biggest issue with OCR is getting a clean, straight scan. You want the resolution set to about 250-300 dpi, and you want the final output to be a bitmap TIFF, not grayscale or RGB/CMYK. You definitely don't want it as a JPG.
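
To put rough numbers on those settings, here is a small sketch of what one page comes to at 300 dpi in each colour mode. The 4 x 7 inch page size is an assumption for a paperback and is not from the post, and the sizes are uncompressed, so real TIFF or JPG files will differ once compression is applied.

```python
def scan_stats(width_in, height_in, dpi):
    """Pixel dimensions and uncompressed size in bytes for three colour modes."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    pixels = width_px * height_px
    row_bytes_1bit = (width_px + 7) // 8  # 1-bit rows are padded to whole bytes
    return {
        "pixels": (width_px, height_px),
        "bitmap_1bit": row_bytes_1bit * height_px,  # 1 bit per pixel
        "grayscale_8bit": pixels,                   # 1 byte per pixel
        "rgb_24bit": pixels * 3,                    # 3 bytes per pixel
    }

# Assumed paperback page of 4 x 7 inches scanned at 300 dpi.
for name, value in scan_stats(4, 7, 300).items():
    print(name, value)
```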
Old 10-10-2008, 07:51 PM   #8
bbusybookworm
Tech Junkie
bbusybookworm knows the difference between 'who' and 'whom'
Posts: 1,027
Karma: 10080
Join Date: Aug 2007
Location: UK / Egypt / India
Device: PSR500,Gen3, iPhone3G, PPro, iPad , Samsung Galaxy S2
I'd say that Adobe's OCR is basic; while it does a decent job, it is nowhere near that accurate. It does OK with normal office documents where the language is plain, but it's not that good when you throw a lot of strange or complex words or layouts into the mix.

I've had much better results from ABBYY FineReader 9 Pro when I got the chance to use it.
It was able to identify words, diagrams, etc. with relative ease and dealt with foreign-language words and accents much better.

The drawback is that it's not cheap, it takes a while to process, and it is quite heavy on system resources.
Old 10-10-2008, 07:58 PM   #9
DDHarriman
Guru
DDHarriman can extract oil from cheese
 
Posts: 851
Karma: 1200
Join Date: Feb 2008
Location: Almada, Portugal
Device: Cybook Gen3, Sony PRS 505, Kindle DXG and Samsung Galaxy Note
The leading OCR applications are:

FineReader Pro 9.0
OmniPage Pro 16

And we are talking about more than 99% accuracy...
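
For a rough sense of what 99% means over a whole book, the arithmetic can be sketched as below. The 400 pages is the figure mentioned earlier in the thread; the 1,800 characters per page is an assumed value for illustration only, and "accuracy" is treated here as per-character accuracy, which is also an assumption.

```python
def expected_errors(pages, chars_per_page, accuracy):
    """Approximate number of wrong characters at a given per-character accuracy."""
    total_chars = pages * chars_per_page
    return round(total_chars * (1 - accuracy))

PAGES = 400            # book length mentioned earlier in the thread
CHARS_PER_PAGE = 1800  # assumed value, varies a lot between editions

for accuracy in (0.99, 0.999):
    errors = expected_errors(PAGES, CHARS_PER_PAGE, accuracy)
    print(f"{accuracy:.1%} accurate: about {errors} errors, "
          f"{errors / PAGES:.1f} per page")
```

Under those assumptions even a very good result leaves hundreds or thousands of characters to find and fix.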
Old 10-10-2008, 08:09 PM   #10
DeathtoToasters
Connoisseur
DeathtoToasters began at the beginning.
 
Posts: 86
Karma: 10
Join Date: Nov 2007
Device: Irex Illiad, Sony 505
Thanks for the info. I am going to try FineReader. We have it at work, so getting it is not an issue.
Old 10-10-2008, 08:14 PM   #11
JSWolf
Resident Curmudgeon
JSWolf ought to be getting tired of karma fortunes by now.
Posts: 36,466
Karma: 17660776
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Sony Reader PRS-650, iPad, nook STR
It probably isn't worth the effort. Would be better to just get the book as a legal eBook (if it exists and while Fictionwise and BooksOnBoard are both having 50% off sales) or a pBook.
Old 10-10-2008, 08:35 PM   #12
bbusybookworm
Tech Junkie
bbusybookworm knows the difference between 'who' and 'whom'
Posts: 1,027
Karma: 10080
Join Date: Aug 2007
Location: UK / Egypt / India
Device: PSR500,Gen3, iPhone3G, PPro, iPad , Samsung Galaxy S2
Quote:
Originally Posted by JSWolf View Post
It probably isn't worth the effort. Would be better to just get the book as a legal eBook (if it exists and while Fictionwise and BooksOnBoard are both having 50% off sales) or a pBook.
Probably true in many cases, but until eBook availability matches pBook availability, it's still going to be a viable alternative, and the only one in many cases.

Still, if it's a Star Wars book, it should probably be available, unless it's a really old one.
Old 10-10-2008, 08:56 PM   #13
DeathtoToasters
Connoisseur
DeathtoToasters began at the beginning.
 
Posts: 86
Karma: 10
Join Date: Nov 2007
Device: Irex Illiad, Sony 505
Well, I do legally own the book, and I scanned it in myself. There is no third party involved.

It is actually a new Star Wars book, Order 66 by Karen Traviss. Surprisingly, it is not on Mobipocket, Fictionwise, or the Sony store.
Old 10-10-2008, 09:12 PM   #14
DeathtoToasters
Connoisseur
DeathtoToasters began at the beginning.
 
Posts: 86
Karma: 10
Join Date: Nov 2007
Device: Irex Illiad, Sony 505
OK, with FineReader I have the option to save it in MANY formats. What would be the best format to use?

Is there a format that can be converted directly to an .lrf file?

What program would I use to do that?

FYI it does do a much better job at the OCR!

Last edited by DeathtoToasters; 10-10-2008 at 09:17 PM.
Old 10-11-2008, 10:50 AM   #15
DDHarriman
Guru
DDHarriman can extract oil from cheese
 
Posts: 851
Karma: 1200
Join Date: Feb 2008
Location: Almada, Portugal
Device: Cybook Gen3, Sony PRS 505, Kindle DXG and Samsung Galaxy Note
That depends on what final result you want.

If you just want a quick and dirty way of reading the book and can accept the errors that are still in the file, choose text, Word, or HTML.

If you want a tidy, well-formatted eBook with errors corrected, bad line breaks fixed, missing letters and words identified and restored, and so on, you need to enter the “purgatory” part of the process, called proofreading.

Here you have to correct all the errors and format the book so that it looks good when read: inserting page breaks before the start of each new chapter (if you use the one-book-one-file method), formatting the chapter names properly so that the conversion generates a table of contents, and so on.
You can do part of the corrections in the OCR program itself, or part or all of them outside it, for example in Word.
If you choose the latter, I advise you to save the output as plain text so you can format everything from the beginning. If you save as Word, parts of the text are often OCRed as different fonts, different sizes, bold, you name it, and you can spend hours just trying to un-format the text.
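
Starting from plain text, some of that formatting can be scripted before the manual pass. The sketch below is one possible approach, not the method of any specific tool: it rejoins hard-wrapped lines into paragraphs, glues words that were hyphenated across a line break, and writes HTML in which lines that look like chapter titles become headings with a page break in front. The heading pattern and the sample text are assumptions for illustration.

```python
import html
import re

# Assumed pattern for chapter titles; adjust to match the actual book.
CHAPTER_RE = re.compile(r"^(chapter|prologue|epilogue)\b", re.IGNORECASE)

def unwrap_paragraphs(text):
    """Join hard-wrapped lines into paragraphs; blank lines separate paragraphs."""
    paragraphs = []
    for block in re.split(r"\n\s*\n", text.strip()):
        joined = ""
        for line in (ln.strip() for ln in block.splitlines()):
            if not line:
                continue
            if joined.endswith("-"):
                # Word split across a line break: drop the hyphen and join.
                # This also merges genuine hyphenated words, so proofread after.
                joined = joined[:-1] + line
            elif joined:
                joined += " " + line
            else:
                joined = line
        if joined:
            paragraphs.append(joined)
    return paragraphs

def to_html(text, title):
    """Wrap paragraphs in <p>, and chapter titles in <h2> with a page break."""
    parts = ["<html><head><title>%s</title></head><body>" % html.escape(title)]
    for para in unwrap_paragraphs(text):
        if CHAPTER_RE.match(para):
            parts.append('<h2 style="page-break-before: always">%s</h2>'
                         % html.escape(para))
        else:
            parts.append("<p>%s</p>" % html.escape(para))
    parts.append("</body></html>")
    return "\n".join(parts)

sample = """Chapter 1

The troop-
er looked at
his helmet.

Chapter 2

Nothing happened."""
print(to_html(sample, "Sample"))
```

Consistent heading markup is what lets a converter pick out chapters for a table of contents, though how that is detected depends on the converter and its settings.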

To convert to the final format, you have Calibre for the Sony, Mobipocket Creator for the Mobipocket format, etc.
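
For the Calibre route, current Calibre releases include a command-line tool called ebook-convert that takes an input file and an output file and picks the formats from the extensions. The sketch below wraps it from Python; the file names are placeholders, and the helper names are made up for this example. It assumes Calibre is installed and its tools are on the PATH.

```python
import shutil
import subprocess

def build_convert_command(source, target):
    """Command line for Calibre's ebook-convert: input file, then output file."""
    return ["ebook-convert", source, target]

def convert(source, target):
    """Run the conversion if ebook-convert is available; return True on success."""
    cmd = build_convert_command(source, target)
    if shutil.which(cmd[0]) is None:
        return False  # Calibre's command-line tools are not installed or not on PATH
    return subprocess.run(cmd).returncode == 0

# Placeholder file names: proofread HTML in, LRF for the Sony Reader out.
print(" ".join(build_convert_command("book.html", "book.lrf")))
```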

One more thing: proofreading is always the most time-consuming, irritating and difficult part of any format-conversion workflow.
A normal book of, as you say, 400 pages can take you anywhere from hours to days!
That's why, for example, Project Gutenberg uses volunteers to proofread a book collectively: let's say 10 people for a 400-page book.

Good luck,
All times are GMT -4.