04-02-2010, 09:31 AM   #1
71117c
Junior Member
Posts: 5
Karma: 10
Join Date: Mar 2010
Device: PB 301
2 questions concerning scanning novels

hi,

i've been thinking about buying a scanner and using destructive scanning (i.e. cutting off the binding) to convert novels into epub. to get an idea of how tedious this process would be, i converted some books that i have in pdf format on my pc to tiff and then ran some linux ocr software on them.
i ran into two issues that would slow down the conversion process tremendously if i can't solve them:

1) detection of italic fonts.
2) detection of paragraphs: the ocr software i was using detected paragraphs within a page fine. but since it works on one page at a time, it couldn't tell whether the last sentence on a page, when it ended with a period, was also the end of a paragraph.

is there any ocr software (windows or linux) that can reliably handle these two problems?
1) is "only" an ocr problem, but for 2) i would need something like: the last sentence on a page ends with a period -> check whether the first sentence on the next page is indented. if so -> new paragraph; if not, the paragraph simply continues.
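
to make the rule for 2) concrete, here is a minimal python sketch. it is not taken from any existing ocr tool. it assumes (hypothetically) that the ocr output is already available as one list of paragraph strings per page, and that an indented paragraph still carries its leading whitespace. the names `starts_indented` and `join_pages` are made up for this illustration.

```python
def starts_indented(paragraph: str) -> bool:
    """Assumption: the OCR output keeps leading whitespace on indented paragraphs."""
    return paragraph[:1] in (" ", "\t")


def join_pages(pages):
    """Merge per-page paragraph lists into one list of paragraphs.

    pages: list of pages, each page a list of paragraph strings.
    The first paragraph of a page starts a new paragraph only if the
    previous page ended with a period AND this paragraph is indented;
    otherwise it is glued onto the last paragraph of the previous page.
    """
    book = []
    for page in pages:
        for i, para in enumerate(page):
            if i == 0 and book:
                prev = book[-1]
                ends_with_period = prev.rstrip().endswith(".")
                if not (ends_with_period and starts_indented(para)):
                    # the paragraph continues across the page break
                    book[-1] = prev.rstrip() + " " + para.strip()
                    continue
            book.append(para.strip())
    return book


pages = [
    ["  First para.", "  Second starts here and"],
    ["continues on the next page.", "  Third para."],
    ["  Fourth para."],
]
for p in join_pages(pages):
    print(p)
```

this only checks for a period, as in the rule above. sentences ending in a question mark or a closing quote, and words hyphenated across a page break, would need extra handling.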

cheers 71117c