Quote:
Originally Posted by sanjon
That Word add-in sounds a bit like what I have tried to do in the past using RegEx and the like, but much more advanced.
Definitely need to check that out, cheers! Does that handle things like italic and bold tags as well? But I guess you wouldn't have that in an OCR-generated document anyway.
|
Of course it handles italic and bold. You would be surprised how often that can be an issue in OCR. I usually strip bold while exporting (yes, that is an option), since bold in OCR output is usually a recognition mistake. There is also an option to use either the <i> and <b> tags or the <em> and <strong> tags.
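If you ever want to do the same kind of cleanup by hand on exported HTML, here is a rough Python sketch of the idea. To be clear, this is not the add-in's code, and the function name and the strip_bold / semantic parameters are made up for illustration. It only assumes simple, well-formed tags.

```python
import re

def normalize_emphasis(html, strip_bold=True, semantic=False):
    """Hypothetical helper: tidy italic/bold tags in a chunk of HTML."""
    if strip_bold:
        # Drop bold tags but keep the text inside them
        # (bold in OCR output is often a recognition mistake).
        html = re.sub(r"</?(?:b|strong)\b[^>]*>", "", html, flags=re.IGNORECASE)

    # Pick one tag family: <i>/<b> or <em>/<strong>.
    italic = "em" if semantic else "i"
    bold = "strong" if semantic else "b"

    # Rewrite every italic and bold tag to the chosen family.
    # The \b keeps tags like <img>, <br> and <body> from matching.
    html = re.sub(r"<(/?)(?:i|em)\b[^>]*>", rf"<\1{italic}>", html, flags=re.IGNORECASE)
    html = re.sub(r"<(/?)(?:b|strong)\b[^>]*>", rf"<\1{bold}>", html, flags=re.IGNORECASE)
    return html

print(normalize_emphasis("<b>Chapter</b> <em>one</em>"))
```

Calling it with strip_bold=False and semantic=True keeps the bold and switches everything to <em> and <strong> instead. Regular expressions are fine for flat tags like these, but for nested or messy markup a real HTML parser would be the safer choice.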
Quote:
Originally Posted by rube
Oh, and of course, when I say I start with a .txt file it's because I've been given a Word or OpenOffice file. I just select all and dump it into something like Notepad to strip all the unwanted formatting and save it as a .txt. I get it out of the Word file very quickly, and seeing as I'm only making very simple epubs, none of the Word formatting matters except for the paragraphs.
|
That would also lose italic, and I find that is often quite important in a book.