2 questions concerning the scanning of novels
hi,
i recently thought of buying a scanner and using destructive scanning (i.e. cutting off the binding) to convert novels into epub. to get an idea of how tedious this process would be, i took some books that i have in pdf format on my pc, converted them to tiff and then ran some linux ocr software on them.
i ran into two issues that would slow down the conversion process tremendously if i can't solve them:
1) detection of italic fonts.
2) detection of paragraphs: the ocr software i was using detected paragraphs within a page fine. but since it operated on one page at a time, it couldn't tell whether a last sentence on a page that ended with a period was also the end of a paragraph or not.
is there any ocr software (windows or linux) that can reliably handle those two problems?
1) is "only" an ocr problem, but for 2) i would need something like this: if the last sentence on a page ends with a period, check whether the first sentence on the next page is indented. if it is, a new paragraph starts there; if not, the paragraph continues across the page break.
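to make 2) more concrete, here is a rough python sketch of the rule i have in mind. it assumes the ocr output has already been parsed into one list of paragraphs per page, and that each paragraph carries a flag saying whether its first line was indented. the `Paragraph` class, the `indented` flag and the `join_pages` function are made up for illustration and are not taken from any real ocr tool.

```python
from dataclasses import dataclass


@dataclass
class Paragraph:
    text: str
    indented: bool  # hypothetical flag: True if the ocr saw a first-line indent


# endings that are treated as "the sentence is finished" (an assumption,
# extend as needed for other quote styles)
SENTENCE_END = (".", "!", "?", '."', '!"', '?"')


def join_pages(pages):
    """Merge per-page paragraph lists into one list of paragraph strings.

    The first paragraph on a page starts a new paragraph only if the text
    before the page break ends a sentence AND the paragraph is indented.
    In every other case it is glued onto the previous paragraph.
    """
    merged = []
    for page in pages:
        for i, para in enumerate(page):
            if i == 0 and merged:
                prev = merged[-1]
                ends_sentence = prev.endswith(SENTENCE_END)
                if not (ends_sentence and para.indented):
                    # paragraph continues across the page break
                    merged[-1] = prev + " " + para.text.strip()
                    continue
            merged.append(para.text.strip())
    return merged


pages = [
    [Paragraph("first paragraph.", True), Paragraph("second one ends here.", True)],
    [Paragraph("but it goes on here.", False), Paragraph("third paragraph.", True)],
    [Paragraph("fourth paragraph.", True)],
]
for p in join_pages(pages):
    print(p)
```

this does not deal with words that are hyphenated across a page break, and it would wrongly merge a new paragraph whenever the ocr misses the indent, so the result would still need a quick proofread.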
cheers 71117c