Quote:
Originally Posted by BobC
@Hitch,
We have different needs - in your case you get pretty good original stuff (i.e. all the words are right) with perhaps poor formatting; a lot of mine starts off as really messy OCR'd text from Archive.org, and I need to do a lot of manual editing and re-formatting to knock it into shape. The extensibility of LO makes it useful, with some great regex-based alternative S&R additions - I often use OOOFBTools just for its text-tidying ability (I've stopped using FB2 as my preferred ebook format). As I use LO as my normal WP, it all works seamlessly for me.
Anyway, I have noticed that in LO some italics are not true italic - they look like italic but can't be found via the normal italic search; they seem to use a variant of "character posture". It may be ones like this that are getting mis-imported.
Beware also that the standard Writer2xhtml doesn't work with current editions of LO - you need to get a patched version - have a look at this:
https://www.mobileread.com/forums/sho...&postcount=224
BobC
|
BobC,
Thanks for the info. I'll look it over again. However, one small comment:
Quote:
...in your case you get pretty good original stuff (i.e. all the words are right) with perhaps poor formatting...
|
Umhmmmm. That would be nice. I wonder who gets that? Probably the companies who have contracts with BPH's. ;-)
We do get a ton of OCR'ed material, too. Not to mention, my fave: the DIY scans. Those are real doozies. Sort of like first-gen PG stuff. And I think I've posted the odd weird thing that crosses my desk (like that one file, with the pilcrows at the beginning of the lines...)
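For that pilcrow sort of mess, a regex pass along these lines would probably do it. This is only a minimal sketch in Python, assuming the stray marks are the ¶ character (U+00B6) sitting at the very start of each line; the function name strip_leading_pilcrows and the sample text are made up for illustration, not taken from any real file or tool.

```python
import re

# Assumption: the unwanted mark is U+00B6 at the start of a line,
# optionally surrounded by spaces or tabs. Pilcrows elsewhere are left alone.
LEADING_PILCROW = re.compile(r"^[ \t]*\u00b6[ \t]*", flags=re.MULTILINE)


def strip_leading_pilcrows(text: str) -> str:
    """Remove a pilcrow (and adjacent spaces/tabs) from the start of every line."""
    return LEADING_PILCROW.sub("", text)


# Made-up sample standing in for a badly converted file.
sample = "\u00b6It was a dark night.\n\u00b6 The rain fell.\nNo mark here; \u00b6 stays mid-line."
cleaned = strip_leading_pilcrows(sample)
print(cleaned)
```

The same pattern (a start-of-line anchor, the pilcrow, then optional spaces) should carry over to any regex-capable search and replace, though the exact syntax may differ from tool to tool.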
Hitch