Quote:
Originally Posted by capidamonte
1) If you know the guy doing the scanning, why not get him to send you something a little more basic than a Word file? Surely he has an interim format that is more useful to you.
2) Any chance you could post some fragment of a file here that we could take a look at and try with various ideas?
cap
ps: I once reformatted a scanned PDF to HTML conversion that took something like 60 hours of work to make look right. I'm still not sure it's completely correct. I certainly won't do it again.
Hi, capidamonte:
First, I should have explained this better; I took a shortcut when I made the POD reference. To best serve my clients, most of whom are digitizing rights-reverted books, I have them receive a Word file that is already perfectly formatted for Print On Demand (POD). That is in their best interests: they make a few minor changes to the copyright page, output it to PDF, and the whole book is ready for POD. This makes more work for yours truly, the slob who's doing the ebooks, because I have to remove all that crap. But it gives my clients the option to provide their backlist in EITHER format without paying extra to have separate files created. Do you see what I'm saying?
Besides, anything more "basic" than a Word file doesn't really help me. I can have him send me a txt file, but that puts me right back where I am: once all the bloody formatting is gone, I have to go through the thing page by page and put back IN all the italics, blockquotes, bladdy-blah-blah. And the HTML he outputs is the same as what I get when I convert the Word file, so that doesn't help me either.
I'm in the middle of an experiment: using Word for one set of tasks (changing the page margins, eliminating the section breaks, and search-and-replacing all the soft hyphens) and then using Sigil to run the regex searches that eliminate all the bloody spans. (I would use Crimson Editor for the regex, since Sigil is a titch sloooooooow during saves, but the word-wrapping problem in CE is driving me bats.) If that doesn't work the way I think it might, I'll copy a chunk of the text into a file and upload it here.
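For anyone trying the same span cleanup outside Sigil, here's a minimal Python sketch of the idea. It assumes the converted HTML marks italics with `<i>` or `<em>` rather than with styled spans; if the italics live in something like `<span style="font-style:italic">`, a blanket strip like this would throw them away. The `clean_html` function and the sample markup are hypothetical. The pattern removes opening and closing `span` tags but keeps the text inside them, and it also drops any soft hyphens (U+00AD or `&shy;`) that survive the Word pass.

```python
import re

# Soft hyphen (U+00AD), often left behind by Word's hyphenation.
SOFT_HYPHEN = "\u00ad"

# Matches an opening <span ...> tag or a closing </span> tag, any attributes.
# Assumption: no span carries formatting you need to keep (e.g. italics).
SPAN_TAG = re.compile(r"</?span\b[^>]*>", re.IGNORECASE)


def clean_html(html: str) -> str:
    """Remove every <span> wrapper (keeping its text) and strip soft hyphens."""
    html = SPAN_TAG.sub("", html)
    return html.replace(SOFT_HYPHEN, "").replace("&shy;", "")


# Hypothetical fragment of Word-converted HTML.
sample = '<p class="p1"><span class="s1">Intro\u00adduction to the </span><i>book</i></p>'
print(clean_html(sample))
# prints: <p class="p1">Introduction to the <i>book</i></p>
```

A regex isn't a real HTML parser, so it's worth checking the result in Sigil's book view before running it over a whole book.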
Thanks,
seriously,
Hitch