04-03-2010, 04:50 AM #5
Jellby
frumious Bandersnatch
Posts: 7,556
Karma: 19500001
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
Quote:
Originally Posted by pepak
Personally, I think the human way (you read the text and decide whether a paragraph break should or shouldn't be there) is the easiest, most of the time. You can't avoid proofreading the OCRed text anyway, so you might as well handle the paragraph breaks at the same time.
I agree, you have to read the book anyway. But detecting paragraph breaks at page breaks on its own is fairly quick: check the beginning of every page, and for the pages that start with an uppercase letter, decide whether they begin a new paragraph or not. The OCR software will probably treat all of them the same way, so you only have to look for the ones it got wrong.
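A minimal sketch of that check, assuming the OCR output is available as a list of plain-text strings, one per page, and that no word is hyphenated across a page break. The function names `pages_to_review` and `join_pages` are hypothetical. A page starting with a lowercase letter is treated as a continuation; only uppercase starts are flagged for a human decision.

```python
from typing import Dict, List


def pages_to_review(pages: List[str]) -> List[int]:
    """Return indices of pages (after the first) whose text starts with an
    uppercase letter; these may or may not begin a new paragraph."""
    flagged = []
    for i, text in enumerate(pages):
        if i == 0:
            continue  # the first page always starts a paragraph
        stripped = text.lstrip()
        if stripped and stripped[0].isupper():
            flagged.append(i)
    return flagged


def join_pages(pages: List[str], new_paragraph: Dict[int, bool]) -> str:
    """Join pages into one text. Pages marked True in new_paragraph start a
    new paragraph; every other page (including all lowercase starts) is
    joined to the previous one with a single space."""
    parts = [pages[0].strip()] if pages else []
    for i in range(1, len(pages)):
        text = pages[i].strip()
        if new_paragraph.get(i, False):
            parts.append("\n\n" + text)  # human-confirmed paragraph break
        else:
            parts.append(" " + text)  # continuation of the same paragraph
    return "".join(parts)


pages = ["It was a dark", "and stormy night.", "The rain fell.", "Paris was wet."]
print(pages_to_review(pages))          # pages 2 and 3 need a look
print(join_pages(pages, {2: True}))    # only page 2 really starts a paragraph
```

Here the reader only has to rule on pages 2 and 3. Page 3 starts with "Paris", a proper noun, so it stays in the same paragraph, which is exactly the kind of case the OCR software cannot decide on its own.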