Old 12-27-2009, 11:11 AM   #1
kazbates
Wizard
Posts: 2,623
Karma: 400000
Join Date: Dec 2008
Location: Northern Virginia
Device: EBW-1150, Sony PRS-700BC, Sony PRS-600BC, Sony PRS650BC
New Scanner, Need Advice on What Comes Next

My darling husband bought me the Canon DR2510 scanner for Christmas so that I can convert my vast collection of paper books to ebooks for my Sony 600. I think it is his attempt to keep me out of the malls! I love the idea of converting to digital but I have no idea where to begin. I have calibrated the scanner and have successfully created pdf files. I would like to take those pdf files and ultimately convert them to either epub or lrf (which is actually my format of choice for my reader).

The scanner came with OmniPage 4 SE OCR software but I have no idea what to do with it or whether I need to invest in better OCR software like ABBYY.

So the BIG question is: What steps do I need to take to convert my scanned pdf files for use on a Sony reader?

Any advice?
Old 12-27-2009, 02:38 PM   #2
chainring
Addict
Posts: 207
Karma: 659
Join Date: Jan 2009
Location: Sunnyvale, CA
Device: PRS-650 (black), Kindle 3G
I don't have much to add, but want to follow this thread since I have a Canon DR-2050C. Great scanner and I love the duplex scanning feature.

What I can tell you is, if at all possible, scan to html instead of pdf. At least that's what I gather from skimming the various sub forums here, especially the Calibre, ePub, and Workshop forums. ABBYY FineReader is fantastic software.
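For anyone following along, here is a rough sketch of the last step once the book is in HTML: handing the file to Calibre's ebook-convert command-line tool, which takes an input file and an output file and picks the format from the extension. The function names and the file name are made up for illustration, and it assumes Calibre is installed with ebook-convert on the PATH.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source: Path, target_ext: str = "epub") -> list[str]:
    """Build the command line: ebook-convert INPUT OUTPUT.

    The output format is chosen by the output file's extension.
    """
    target = source.with_suffix("." + target_ext.lstrip("."))
    return ["ebook-convert", str(source), str(target)]


def convert(source: Path, target_ext: str = "epub") -> Path:
    """Run the conversion. Assumes Calibre is installed and on the PATH."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("ebook-convert not found; install Calibre first")
    command = build_convert_command(source, target_ext)
    subprocess.run(command, check=True)
    return Path(command[2])


if __name__ == "__main__":
    # "book.html" is a placeholder name for the cleaned-up OCR output.
    print(build_convert_command(Path("book.html"), "lrf"))
```

Passing "epub" or "lrf" as the second argument covers both formats mentioned in the first post. The conversion will only be as clean as the HTML going in.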
Old 12-27-2009, 03:05 PM   #3
kazbates
Wizard
Posts: 2,623
Karma: 400000
Join Date: Dec 2008
Location: Northern Virginia
Device: EBW-1150, Sony PRS-700BC, Sony PRS-600BC, Sony PRS650BC
An update: I've managed to get the OCR software that came with the scanner to work fairly well. I plan to save the result to MS Word as both a doc and an html file. I've been looking at the Book Creator template that's here at MR and the Book Designer software that's also available here. I have not been able to successfully install the Book Designer app but really haven't fiddled with it since I am concentrating on getting the scanned file OCRed.
Old 12-27-2009, 03:58 PM   #4
delphidb96
Wizard
Posts: 3,000
Karma: 300001
Join Date: Jan 2007
Location: Citrus Heights, California
Device: TWO Kindle 2s, one each Bookeen Cybook Gen3, Sony PRS-500, Axim X51V
Ouch! Is there a particular reason why he didn't buy you an OpticBook for scanning in the books?

Anyway, have fun!

Derek
Old 12-27-2009, 04:23 PM   #5
kazbates
Wizard
Posts: 2,623
Karma: 400000
Join Date: Dec 2008
Location: Northern Virginia
Device: EBW-1150, Sony PRS-700BC, Sony PRS-600BC, Sony PRS650BC
Derek, I've learned not to look a gift horse in the mouth!

The scanner works very well actually. It's getting the scanned text into a format that I can convert for use on my Sony that's giving me the biggest headache.
Old 12-27-2009, 04:57 PM   #6
hidari
MR Drone
Posts: 1,600
Karma: 15260410
Join Date: Oct 2007
Location: DRONEZONE
Device: OPUS/PB360,Nexus 7,GzONE, Kobo Mini
If you are looking for OCR software at a normal price range that can handle an Apple, Windows, or Linux OS, I recommend VueScan. I have used it for several years now....

http://www.hamrick.com/
Old 12-27-2009, 06:55 PM   #7
kjk
.
Posts: 3,408
Karma: 5647231
Join Date: Oct 2008
Device: never enough
Sounds like an adventure!
Glad the scanning/OCR phase went well. How much hand-editing do you see having to do now? The thought of that part always scares me.
Old 12-27-2009, 07:52 PM   #8
kazbates
Wizard
Posts: 2,623
Karma: 400000
Join Date: Dec 2008
Location: Northern Virginia
Device: EBW-1150, Sony PRS-700BC, Sony PRS-600BC, Sony PRS650BC
@hidari ~ Thanks for the suggestion. I'll look into it. The software that came with the scanner is version 4 and the company is on version 17. I think I can upgrade but it won't be cheap.

Initially, there were very few errors detected by the OCR software (only about 1 every 5 or so pages). The problem now is what format to save the file in. Saving to .doc places each page of text into text boxes. Saving to .rtf or .txt removes much of the formatting and creates a messy file. I can't save to .html because that option is only available in the updated version of the OCR software. Consequently, I will need to do some major editing to get any of the MS-type files ready for ebook conversion. Some of the editing will include deleting lines left by the scanned edge of the book page and removing the author's name, book title, and page numbers from each page. My next step is to see if I can get the Book Designer software working and perhaps upgrade my OCR software.

I don't mind all the work, I love a challenge. I just wish I knew the correct steps to take in advance. I tend to be a bit obsessive compulsive about these kinds of things.
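The header and page-number cleanup described above can be partly automated on a plain-text export. What follows is only a sketch under some loud assumptions: that the OCR text separates pages with a form-feed character, that the running heads are the book title and author name exactly as you pass them in, and that they only ever appear as the first or last line of a page. The function names and the sample text are invented for illustration.

```python
import re

# A line that holds nothing but a page number, e.g. "  12  ".
PAGE_NUMBER = re.compile(r"^\s*\d{1,4}\s*$")


def clean_page(page: str, running_heads: set[str]) -> str:
    """Drop a running head or bare page number from the top and bottom of one page."""
    heads = {head.casefold() for head in running_heads}
    lines = page.strip("\n").split("\n")

    def is_noise(line: str) -> bool:
        text = line.strip()
        if PAGE_NUMBER.match(text):
            return True
        # Running heads often carry the page number too, e.g. "Jane Author  13".
        without_digits = re.sub(r"\d+", "", text).strip()
        return without_digits.casefold() in heads

    # Only the first and last line are inspected, so body text is left alone.
    if lines and is_noise(lines[0]):
        lines = lines[1:]
    if lines and is_noise(lines[-1]):
        lines = lines[:-1]
    return "\n".join(lines).strip("\n")


def clean_book(text: str, running_heads: set[str]) -> str:
    """Split on form feeds (assumed page separator), clean each page, rejoin."""
    pages = text.split("\f")
    return "\n".join(clean_page(page, running_heads) for page in pages)


if __name__ == "__main__":
    sample = "THE BOOK TITLE\nIt was a dark night.\n12\fJane Author  13\nThe rain fell.\n"
    print(clean_book(sample, {"The Book Title", "Jane Author"}))
```

A page whose first real line happens to be a bare number would also be dropped, so the result still needs a read-through. Stray marks from the scanned page edge are not handled here at all.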
Old 12-27-2009, 11:20 PM   #9
frabjous
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
If you're at all considering using the scans without OCRing them, then I highly recommend PDFLRF for cutting the text into reader-sized chunks, and removing margins/splitting columns, etc. This is usually the way I read scanned material on my Sony.
Old 12-28-2009, 07:43 AM   #10
kazbates
Wizard
Posts: 2,623
Karma: 400000
Join Date: Dec 2008
Location: Northern Virginia
Device: EBW-1150, Sony PRS-700BC, Sony PRS-600BC, Sony PRS650BC
I don't mind OCRing the pages. Using the OmniPage software I was able to find and repair several errors. My biggest problem is in saving the file. With the version of OmniPage that I have (very limited) I cannot save it as an html file. If I save it as a doc or rtf file and then open it in Word, the software has placed each page of text into individual text boxes. I don't know what to do with it at that point. If I try to save the file as html, it removes all the formatting and it ends up looking like a txt file. Any suggestions?
Old 12-28-2009, 05:26 PM   #11
rogue_ronin
Banned
Posts: 475
Karma: 796
Join Date: Sep 2008
Location: Honolulu
Device: Nokia 770 (fbreader)
You do really need to get the content into HTML.

From the RTF document, can you remove the styles that apply to the text (which are probably creating the text boxes)?

Try selecting a section (are headers/chapters also in the text boxes?) and applying the "Normal" format. See what happens.

If applying styles changes the text look, you should be able to simplify everything (except, hopefully, italics, bold and headings) and then export it to HTML.

m a r
Old 12-28-2009, 08:17 PM   #12
kazbates
Wizard
Posts: 2,623
Karma: 400000
Join Date: Dec 2008
Location: Northern Virginia
Device: EBW-1150, Sony PRS-700BC, Sony PRS-600BC, Sony PRS650BC
Thanks for the suggestion! I did try it (and I honestly thought it would work) but it was already in the "Normal" style. We even tried to "Select All" to see if we could make some changes that way and Word wouldn't let us do that. I'm wondering if it has something to do with the Compatibility Mode I'm forced to use by the OCR software.
Old 12-28-2009, 09:55 PM   #13
rogue_ronin
Banned
Posts: 475
Karma: 796
Join Date: Sep 2008
Location: Honolulu
Device: Nokia 770 (fbreader)
Hmmm...

Try OpenOffice? It's free.

m a r
Old 12-31-2009, 08:51 PM   #14
Kolenka
<Insert Wit Here>
Posts: 973
Karma: 1254645
Join Date: Jan 2008
Location: Puget Sound
Device: Sony T2, Kindle Paperwhite
Word's conversion to HTML usually produces rather messy HTML, but it tends to work okay. It's also worth giving a shot. The borders you see are the page margins which usually get ignored when converting to HTML.
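If the messy HTML turns out to be a problem, one way to tidy it is to throw away every tag except paragraphs, headings, italics and bold. Below is a minimal sketch using Python's built-in html.parser module. It is not any particular Word plugin, the list of tags to keep is my own guess at what an ebook needs, and the sample markup is invented.

```python
from html import escape
from html.parser import HTMLParser

# Tags worth keeping for an ebook (an assumption, adjust to taste).
KEEP = {"p", "h1", "h2", "h3", "i", "em", "b", "strong"}
# Elements whose text content should be dropped entirely.
SKIP_CONTENT = {"style", "script", "title"}


class Simplifier(HTMLParser):
    """Re-emit only whitelisted tags, without attributes, plus the text."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self.skip_depth += 1
        elif tag in KEEP and not self.skip_depth:
            self.parts.append(f"<{tag}>")  # attributes such as class/style are dropped

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in KEEP and not self.skip_depth:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Re-escape &, < and > so the output stays valid HTML.
            self.parts.append(escape(data, quote=False))


def simplify_html(markup: str) -> str:
    parser = Simplifier()
    parser.feed(markup)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    messy = (
        '<html><head><style>p.MsoNormal{margin:0}</style></head><body>'
        '<p class="MsoNormal"><span style="font-family:Arial">'
        'It was <i>very</i> dark &amp; cold.</span></p></body></html>'
    )
    print(simplify_html(messy))
```

The output is a bare fragment with no html or body wrapper, so it would need to be wrapped again before conversion. Formatting that Word expresses only through style attributes is lost with this approach.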
Old 12-31-2009, 09:13 PM   #15
rogue_ronin
Banned
Posts: 475
Karma: 796
Join Date: Sep 2008
Location: Honolulu
Device: Nokia 770 (fbreader)
There's also a rumor floating here on the boards that there exists a plugin for Word that strips almost everything but the basic italics and bold tags.

m a r