04-03-2016, 01:57 AM   #64
Toxaris
Wizard
Posts: 4,520
Karma: 121692313
Join Date: Oct 2009
Location: Heemskerk, NL
Device: PRS-T1, Kobo Touch, Kobo Aura
Quote:
Originally Posted by sanjon
That Word add-in sounds a bit like what I have tried to do in the past using RegEx and the like, but much more advanced.

Definitely need to check that out, cheers! Does it handle things like italic and bold tags as well? But I guess you wouldn't have those in an OCR-generated document anyway.
Of course it handles italic and bold. You would be surprised how often that can be an issue in OCR. I usually strip bold while exporting (yes, that is an option), since bold is usually an OCR mistake. There is also an option to use either the <i> and <b> tags or the <em> and <strong> tags.
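To make those two export choices concrete, here is a minimal Python sketch of an equivalent clean-up pass on already-exported HTML. It is not the add-in's own code; the function name `normalize_emphasis` and its parameters are hypothetical, and the regex approach assumes simple, machine-generated tags.

```python
import re

def normalize_emphasis(html: str, strip_bold: bool = True, semantic: bool = False) -> str:
    """Hypothetical post-processing pass (not the Word add-in's implementation)."""
    flags = re.IGNORECASE
    if strip_bold:
        # Drop <b>/<strong> open and close tags but keep the text inside them.
        # The \b stops this from touching <br>, <body>, <blockquote>, etc.
        html = re.sub(r"</?(?:b|strong)\b[^>]*>", "", html, flags=flags)
    if semantic:
        # Presentational -> semantic: <i> becomes <em>, <b> becomes <strong>.
        html = re.sub(r"<(/?)i\b([^>]*)>", r"<\1em\2>", html, flags=flags)
        html = re.sub(r"<(/?)b\b([^>]*)>", r"<\1strong\2>", html, flags=flags)
    else:
        # Semantic -> presentational: <em> becomes <i>, <strong> becomes <b>.
        html = re.sub(r"<(/?)em\b([^>]*)>", r"<\1i\2>", html, flags=flags)
        html = re.sub(r"<(/?)strong\b([^>]*)>", r"<\1b\2>", html, flags=flags)
    return html

sample = "<p><b>Chapter</b> <i>one</i></p>"
print(normalize_emphasis(sample))                  # <p>Chapter <i>one</i></p>
print(normalize_emphasis(sample, semantic=True))   # <p>Chapter <em>one</em></p>
```

A regex pass like this is fine for tidy converter output; for arbitrary hand-written HTML a real parser is the safer choice.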

Quote:
Originally Posted by rube
Oh, and of course, when I say I start with a .txt file, it's because I've been given a Word or OpenOffice file. I just select all, dump it into something like Notepad to strip all the unwanted formatting, and save it as a .txt. I get it out of the Word file very quickly, and seeing as I'm only making very simple EPUBs, none of the Word formatting matters except for the paragraphs.
That would also lose the italics, and I find those are often quite important in a book.
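If italics matter, one alternative to the Notepad round trip is to save the document as HTML first and then strip everything except paragraphs and italics. Below is a minimal sketch using Python's standard `html.parser`; it assumes the word processor writes italics as `<i>` or `<em>` tags (italics applied only through CSS styles would still be lost), and the names `SimpleEpubText` and `simplify` are hypothetical.

```python
from html import escape
from html.parser import HTMLParser

class SimpleEpubText(HTMLParser):
    """Keep only paragraphs and italics; drop all other markup (hypothetical helper)."""

    # Tags to keep, mapped to the tag written out; <em> is normalized to <i>.
    KEEP = {"p": "p", "i": "i", "em": "i"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        # Attributes (class, style, ...) are discarded on purpose.
        if tag in self.KEEP:
            self.out.append(f"<{self.KEEP[tag]}>")

    def handle_endtag(self, tag):
        if tag in self.KEEP:
            self.out.append(f"</{self.KEEP[tag]}>")

    def handle_data(self, data):
        # Text arrives with entities decoded, so re-escape &, < and >.
        self.out.append(escape(data, quote=False))

def simplify(html_text: str) -> str:
    parser = SimpleEpubText()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)

word_html = '<p class="MsoNormal"><span style="font-family:Arial">It was <i>very</i> late &amp; dark.</span></p>'
print(simplify(word_html))  # <p>It was <i>very</i> late &amp; dark.</p>
```

The paragraph structure survives just as with the .txt route, but the italics come along too.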