12-29-2009, 06:15 PM   #7220
DMcCunney
New York Editor
Posts: 6,384
Karma: 16540415
Join Date: Aug 2007
Device: PalmTX, Pocket eDGe, Alcatel Fierce 4, RCA Viking Pro 10, Nexus 7
Quote:
Originally Posted by kazbates
@Dennis ~ I scanned the book on a Canon DR-2510 using the OmniPage OCR software (very limited in scope) bundled with the scanner. It placed the scanned document into a .opd file, but I could have saved it as a pdf or tiff (I think there was one other format) using just the scanner software. After the OCR editing process, it gave me the option of saving as a doc, rtf or txt. When saving as a doc or rtf and then opening in Word for further editing, it placed each scanned page in a textbox. Saving as txt completely strips any formatting. My thought was to edit in Word and then resave as an html file per HarryT's guide. When I tried to save the textboxed version of the doc file as html, it stripped all the formatting, just as if it were a txt file. I'm thinking that the problem lies with the limited version of the OmniPage app, but it could very easily be "operator error".
I don't think the problem was OmniPage, though I agree there are better things out there.

One of the issues you face is that each page is a scanned image that gets OCRed separately, with the results saved to different files. Those files must be combined to make a complete book, and that's where the trouble starts. You're running into the divisions between files.

Which version of Word are you running, and how were you bringing the files into Word to edit? The last time I had to OCR stuff, I wasn't concerned with keeping formatting, and saved to text. I combined the text files from the command line (copy file1.txt+file2.txt+file3.txt... newfile.txt), then brought the result into my preferred text editor for cleanup. (I generally use Notepad++, but have a dozen or so others installed. Among other things, I maintain a wiki devoted to text editors, and am always looking at new ones. See http://TextEditors.org)
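
If there are too many page files to type out on one copy line, the same combining step can be scripted. What follows is only a rough sketch in Python, not what I actually ran. The function names (natural_key, combine_pages), the *.txt pattern, and the UTF-8 encoding are all assumptions for illustration, so adjust them to whatever your OCR software really writes out. It sorts the files so that page 2 comes before page 10, then appends each one to a single output file.

```python
from pathlib import Path
import re


def natural_key(path):
    # Split "file10.txt" into ["file", 10, ".txt"] so that
    # file2.txt sorts before file10.txt (plain sorting would not).
    return [int(tok) if tok.isdigit() else tok.lower()
            for tok in re.split(r"(\d+)", path.name)]


def combine_pages(src_dir, out_file, pattern="*.txt"):
    """Concatenate per-page text files into one file, in page order.

    Hypothetical helper: assumes one text file per scanned page and
    UTF-8 text. Undecodable bytes are replaced rather than raising.
    """
    out_path = Path(out_file)
    # Skip the output file itself in case it lives in the same folder.
    pages = sorted(
        (p for p in Path(src_dir).glob(pattern)
         if p.resolve() != out_path.resolve()),
        key=natural_key,
    )
    with out_path.open("w", encoding="utf-8") as out:
        for page in pages:
            text = page.read_text(encoding="utf-8", errors="replace")
            # Make sure each page ends with exactly one newline so the
            # last line of one page does not run into the next.
            out.write(text.rstrip("\n") + "\n")
    return [p.name for p in pages]
```

Calling combine_pages("scans", "newfile.txt") on a folder of page files (the folder name is made up here) returns the file names in the order they were joined, which is worth eyeballing before you start cleanup in an editor.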
______
Dennis