#1 |
Wizard
Posts: 2,627
Karma: 406616
Join Date: Dec 2008
Location: Northern Virginia
Device: SurfacePro, SurfaceBook 2
|
New Scanner, Need Advice on What Comes Next
My darling husband bought me the Canon DR2510 scanner for Christmas so that I can convert my vast collection of paper books to ebooks for my Sony 600. I think it is his attempt to keep me out of the malls!
The scanner came with OmniPage 4 SE OCR software, but I have no idea what to do with it or whether I need to invest in better OCR software like ABBYY. So the BIG question is: What steps do I need to take to convert my scanned PDF files for use on a Sony reader? Any advice? |
#2 |
Addict
Posts: 210
Karma: 1000659
Join Date: Jan 2009
Location: Sunnyvale, CA
Device: Kindle Voyage, Kobo Aura H2O, PRS-650 (black), Kindle 3G
|
I don't have much to add, but want to follow this thread since I have a Canon DR-2050C. Great scanner and I love the duplex scanning feature.
What I can tell you is that, if at all possible, you should scan to HTML instead of PDF. At least that's what I gather from skimming the various subforums here, especially the Calibre, ePub, and Workshop forums. ABBYY FineReader is fantastic software. |
#3 |
Wizard
Posts: 2,627
Karma: 406616
Join Date: Dec 2008
Location: Northern Virginia
Device: SurfacePro, SurfaceBook 2
|
An update: I've managed to get the OCR software that came with the scanner to work fairly well. I plan to save the output to MS Word as both a .doc and an HTML file. I've been looking at the Book Creator template that's here at MR and the Book Designer software that's also available here. I have not been able to successfully install the Book Designer app, but I really haven't fiddled with it since I am concentrating on getting the scanned file OCRed.
|
#4 | |
Wizard
Posts: 2,999
Karma: 300001
Join Date: Jan 2007
Location: Citrus Heights, California
Device: TWO Kindle 2s, one each Bookeen Cybook Gen3, Sony PRS-500, Axim X51V
|
Quote:
Anyway, have fun! Derek |
|
#5 |
Wizard
Posts: 2,627
Karma: 406616
Join Date: Dec 2008
Location: Northern Virginia
Device: SurfacePro, SurfaceBook 2
|
Derek, I've learned not to look a gift horse in the mouth!
The scanner works very well, actually. It's getting the scanned text into a format that I can convert for use on my Sony that's giving me the biggest headache. |
#6 |
MR Drone
Posts: 1,613
Karma: 15612282
Join Date: Oct 2007
Location: DRONEZONE
Device: PB360+, Huawei MP5, Libra H20
|
If you are looking for OCR software at a normal price range that runs on Apple, Windows, or Linux, I recommend VueScan. I have used it for several years now.
http://www.hamrick.com/ |
#7 |
.
Posts: 3,408
Karma: 5647231
Join Date: Oct 2008
Device: never enough
|
Sounds like an adventure!
Glad the scanning/OCR phase went well. How much hand-editing do you see having to do now? The thought of that part always scares me. |
#8 |
Wizard
Posts: 2,627
Karma: 406616
Join Date: Dec 2008
Location: Northern Virginia
Device: SurfacePro, SurfaceBook 2
|
@hidari ~ Thanks for the suggestion. I'll look into it. The software that came with the scanner is version 4 and the company is on version 17. I think I can upgrade but it won't be cheap.
Initially, the OCR software detected very few errors (only about one every five or so pages). The problem now is which format to save the file in. Saving to .doc places each page of text into text boxes. Saving to .rtf or .txt removes much of the formatting and creates a messy file. I can't save to .html because that option is only available in the updated version of the OCR software. Consequently, I will need to do some major editing to get any of the MS-type files ready for ebook conversion. Some of the editing will include deleting lines from the scanned edge of the book page and removing the author's name, book title, and page numbers from each page.
My next step is to see if I can get the Book Designer software working and perhaps upgrade my OCR software. I don't mind all the work; I love a challenge. I just wish I knew the correct steps to take in advance. I tend to be a bit obsessive-compulsive about these kinds of things. |
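For anyone wanting to automate part of that cleanup, here is a rough Python sketch of one way to drop the author's name, book title, and page numbers from each page. It assumes the OCR text has been exported as plain text and already split into one string per page; the function name `strip_running_lines` and the `min_share` threshold are made up for illustration and have nothing to do with OmniPage itself.

```python
import re
from collections import Counter


def strip_running_lines(pages, min_share=0.4):
    """Drop bare page numbers and lines that repeat at the top or bottom of many pages.

    `pages` is a list of strings, one per scanned page (an assumption about
    how the OCR output was exported).
    """
    def key(line):
        # Ignore digits and spacing so "Some Title 12" and "Some Title 14" match.
        return re.sub(r"\s+", " ", re.sub(r"\d+", "", line)).strip().lower()

    # Keep only non-blank lines for each page.
    split_pages = [[ln for ln in page.splitlines() if ln.strip()] for page in pages]

    # Count on how many pages each normalised first/last line occurs.
    counts = Counter()
    for lines in split_pages:
        if lines:
            counts.update({key(lines[0]), key(lines[-1])})

    # A line counts as a running header/footer if it shows up on enough pages.
    threshold = max(2, int(len(pages) * min_share))
    repeated = {k for k, n in counts.items() if k and n >= threshold}

    def is_noise(line):
        return line.strip().isdigit() or key(line) in repeated

    cleaned = []
    for lines in split_pages:
        while lines and is_noise(lines[0]):
            lines = lines[1:]
        while lines and is_noise(lines[-1]):
            lines = lines[:-1]
        cleaned.append("\n".join(lines))
    return cleaned
```

This only looks at the first and last lines of each page, so decorated page numbers or stray marks from the scanned page edge would still need hand-editing.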
#9 |
Wizard
Posts: 1,213
Karma: 12890
Join Date: Feb 2009
Location: Amherst, Massachusetts, USA
Device: Sony PRS-505
|
If you're at all considering using the scans without OCRing them, then I highly recommend PDFLRF for cutting the text into reader-sized chunks, and removing margins/splitting columns, etc. This is usually the way I read scanned material on my Sony.
|
#10 |
Wizard
Posts: 2,627
Karma: 406616
Join Date: Dec 2008
Location: Northern Virginia
Device: SurfacePro, SurfaceBook 2
|
I don't mind OCRing the pages. Using the OmniPage software I was able to find and repair several errors. My biggest problem is saving the file. With the (very limited) version of OmniPage that I have, I cannot save it as an HTML file. If I save it as a .doc or .rtf file and then open it in Word, the software has placed each page of text into individual text boxes. I don't know what to do with it at that point. If I try to save the file as HTML, it removes all the formatting and it ends up looking like a .txt file. Any suggestions?
|
#11 |
Banned
Posts: 475
Karma: 796
Join Date: Sep 2008
Location: Honolulu
Device: Nokia 770 (fbreader)
|
You do really need to get the content into HTML.
From the RTF document, can you remove the styles that apply to the text (which are probably creating the text boxes)? Try selecting a section (are headers/chapters also in the text boxes?) and applying the "Normal" format. See what happens. If applying styles changes the look of the text, you should be able to simplify everything (except, hopefully, italics, bold and headings) and then export it to HTML. m a r |
#12 |
Wizard
Posts: 2,627
Karma: 406616
Join Date: Dec 2008
Location: Northern Virginia
Device: SurfacePro, SurfaceBook 2
|
Thanks for the suggestion! I did try it (and I honestly thought it would work) but it was already in the "Normal" style. We even tried to "Select All" to see if we could make some changes that way and Word wouldn't let us do that. I'm wondering if it has something to do with the Compatibility Mode I'm forced to use by the OCR software.
#13 |
Banned
Posts: 475
Karma: 796
Join Date: Sep 2008
Location: Honolulu
Device: Nokia 770 (fbreader)
|
Hmmm...
Try OpenOffice? It's free. m a r |
#14 |
<Insert Wit Here>
Posts: 1,017
Karma: 1275899
Join Date: Jan 2008
Location: Puget Sound
Device: Kindle Oasis, Kobo Forma
|
Word's conversion to HTML usually produces rather messy HTML, but it tends to work okay, so it's worth giving a shot. The borders you see are the page margins, which usually get ignored when converting to HTML.
|
#15 |
Banned
Posts: 475
Karma: 796
Join Date: Sep 2008
Location: Honolulu
Device: Nokia 770 (fbreader)
|
There's also a rumor floating around here on the boards that there is a plugin for Word that strips almost everything but the basic italics and bold tags.
m a r |
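If no such plugin turns up, the same idea can be roughed out in a few lines of Python using only the standard library. This is just a sketch and not the plugin mentioned above: the `KEEP` and `DROP_WITH_CONTENT` tag lists and the names `BasicTagFilter` and `simplify_html` are my own choices for illustration. It takes HTML saved from Word, keeps paragraphs, headings, italics and bold, and throws away every other tag and all attributes.

```python
from html import escape
from html.parser import HTMLParser

# Tags to keep (without attributes); this list is an assumption, adjust to taste.
KEEP = {"p", "br", "i", "em", "b", "strong", "h1", "h2", "h3"}
# Tags whose text content should be discarded along with the tag itself.
DROP_WITH_CONTENT = {"style", "script", "title"}


class BasicTagFilter(HTMLParser):
    """Collects text plus a small whitelist of tags, dropping everything else."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a DROP_WITH_CONTENT element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
        elif tag in KEEP and not self.skip_depth:
            self.out.append(f"<{tag}>")  # attributes are deliberately dropped

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in KEEP and tag != "br" and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Re-escape &, < and > because the parser hands us decoded text.
            self.out.append(escape(data, quote=False))


def simplify_html(markup):
    """Return `markup` reduced to text and the whitelisted tags."""
    parser = BasicTagFilter()
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

Calling `simplify_html(open("book.htm", encoding="utf-8").read())` on a file saved from Word (the file name is hypothetical) would give a much plainer HTML string to hand to a converter. Formatting that Word expresses through inline styles rather than real italic or bold tags would be lost.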