MobileRead Forums - View Single Post

DSpider · 07-26-2011, 05:21 AM

I'm not very fond of automatic processing, so that "remove headers and footers" option in FineReader doesn't sound very wise to me. For instance, if the bottom line of a page is part of the text but the author chose a smaller font for it, the program will probably see it as a footer and remove it, which I really, really DON'T want. Maybe I'm just paranoid, but I usually select the text area manually. It doesn't matter if the page has two or more text areas, as long as the text you're trying to extract is selected.
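Here is a deliberately naive, hypothetical footer heuristic in Python that shows how this can go wrong. It is not how FineReader actually works, since its detection logic isn't described here. The `Line` type, the `naive_strip_footer` name and the 0.85 font-size ratio are all assumptions made for the example. The function drops a page's last line whenever that line is set noticeably smaller than the body text, which is exactly what would swallow a legitimate small-font closing line.

```python
from dataclasses import dataclass


@dataclass
class Line:
    text: str
    font_size: float  # font size in points, as an OCR engine might report it


def naive_strip_footer(lines: list[Line], ratio: float = 0.85) -> list[Line]:
    """Drop the last line if its font is noticeably smaller than the body text.

    Hypothetical heuristic for illustration only, not FineReader's algorithm.
    """
    if len(lines) < 2:
        return list(lines)
    # Rough median font size of every line except the last one.
    body_sizes = sorted(line.font_size for line in lines[:-1])
    median = body_sizes[len(body_sizes) // 2]
    if lines[-1].font_size < median * ratio:
        return list(lines[:-1])  # treated as a footer and discarded
    return list(lines)


# The last line is real content, just set in a smaller type.
page = [
    Line("It was late when we finally reached the village.", 11.0),
    Line("Nobody was waiting for us.", 11.0),
    Line("(The author's closing note, set in a smaller type.)", 9.0),
]
print(len(naive_strip_footer(page)))  # 2: the closing note was dropped
```

A genuine footer (say, a page number in small type) would be stripped in exactly the same way. Font size alone can't tell the two apart, which is why selecting the text area by hand is the safer option.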

Batch replacing isn't a very good idea unless you know what you're doing: for instance, replacing every hyphen surrounded by spaces (" - ") with a spaced en dash (" – "). You can also use the "Replace" button (not "Replace All"), which takes you to each instance in turn so you can decide whether it should be replaced or left alone. It's a pretty neat feature. Use it with caution, though...
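Outside FineReader, the same two approaches can be sketched in Python, assuming the text has been exported as plain text. The function names `replace_all` and `replace_with_review` are hypothetical, as is the `confirm` callback that stands in for clicking "Replace" or skipping. The pattern `(?<= )-(?= )` only matches a hyphen with a space on both sides, so hyphenated words like "well-known" are left untouched.

```python
import re
from typing import Callable

SPACED_HYPHEN = re.compile(r"(?<= )-(?= )")  # a hyphen with a space on each side
EN_DASH = "\u2013"


def replace_all(text: str) -> str:
    """Rough analogue of "Replace All": every spaced hyphen becomes an en dash."""
    return SPACED_HYPHEN.sub(EN_DASH, text)


def replace_with_review(text: str, confirm: Callable[[str], bool],
                        context: int = 15) -> str:
    """Rough analogue of stepping through matches with "Replace".

    `confirm` gets a snippet around each match (in order) and returns True
    to replace that hyphen or False to leave it alone.
    """
    def decide(match: re.Match) -> str:
        start = max(0, match.start() - context)
        snippet = text[start:match.end() + context]
        return EN_DASH if confirm(snippet) else match.group(0)

    return SPACED_HYPHEN.sub(decide, text)


sample = "He paused - then left. Note that 5 - 3 = 2."
print(replace_all(sample))
# He paused – then left. Note that 5 – 3 = 2.   (the minus sign was changed too)
print(replace_with_review(sample, lambda snippet: "=" not in snippet))
# He paused – then left. Note that 5 - 3 = 2.
```

In practice `confirm` could prompt you, e.g. `lambda s: input(f"Replace in '{s}'? [y/n] ") == "y"`, which is the manual review the "Replace" button gives you.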

Regarding better OCR software... there is none. And there probably never will be, because a lot of books have printing imperfections of one kind or another. It's true that newer publications use better printing methods, but then there's the occasional typo, mistranslation, and so on. So you'll need at least one round of proofreading once you're done.

I usually proofread in FineReader first (so the original scan is visible right under the extracted text) and a second time on my device or computer screen, depending on the book or output format. During that second pass I highlight any typos I missed the first time (with a yellow background) and then correct them in the source document. I do this for the entire book, which means I (casually) read it a second time. It's a chore, I know, but the end result is a very high-quality e-book.

Last edited by DSpider; 07-26-2011 at 05:27 AM.