12-10-2008, 01:58 PM   #3
daesdaemar
Addict
Posts: 210
Karma: 4282
Join Date: Oct 2008
Location: Florida
Device: Sony 505, Kindle 3, iPad 3
Quote:
Originally Posted by Andurian
A couple of questions about your conversion errors:

1. What error message, exactly, are you getting?
2. Do the rtf and html files load properly in other applications?
3. Are the files unusually sized or named?

My formatting method is below. I use MS Word because I know how to get at the whitespace characters in it via search; ^p is the paragraph mark in Word.

1. Replace ^p^p^p with ^p^p. Repeat until there are no hits. (Removes excess blank lines).
2. Determine what marks off paragraphs. Usually it will be either ^p^p, a tab or multiple spaces to indent. Replace that with GGGGG. (So your paragraphs are now marked).
3. Replace -^p with nothing. (This gets rid of hyphens in words divided at the ends of broken lines. The ^p is removed too, so a space isn't inserted into the word in the next step.)
4. Replace ^p with a single space. (This will remove all line divisions that aren't between paragraphs. It needs to be a space rather than nothing to keep words on successive lines from running together.)
5. Replace GGGGG with ^p^p to put a blank line between paragraphs. If you don't like blank lines between your paragraphs replace GGGGG with ^p instead.
6. Replace two spaces with one space. (This fixes the extra spaces left by step 4 wherever a line already ended in a space.)
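For anyone who would rather script this than click through Word's Replace dialog, here is a rough Python sketch of the same six steps. It is not the poster's exact method: it assumes a plain-text file, treats Word's ^p as a newline, and the function name reflow and its parameters are made up for illustration.

```python
import re

# Marker from the post; it must be a string that never occurs in the book.
MARKER = "GGGGG"


def reflow(text: str, para_pattern: str = r"\n\n", blank_line: bool = True) -> str:
    """Re-wrap hard-broken text using the six find/replace steps.

    Word's ^p is modelled as a newline character. para_pattern is a regex
    for whatever separates paragraphs in the source (step 2), for example
    two newlines, or a newline followed by a tab for indented paragraphs.
    """
    if MARKER in text:
        raise ValueError(f"{MARKER!r} already occurs in the text")
    text = text.replace("\r\n", "\n")           # normalize Windows line endings
    text = re.sub(r"\n{3,}", "\n\n", text)      # 1. collapse runs of blank lines
    text = re.sub(para_pattern, MARKER, text)   # 2. mark paragraph boundaries
    text = text.replace("-\n", "")              # 3. rejoin words hyphenated at line ends
    text = text.replace("\n", " ")              # 4. remaining line breaks become spaces
    text = text.replace(MARKER, "\n\n" if blank_line else "\n")  # 5. restore paragraphs
    text = re.sub(r" {2,}", " ", text)          # 6. squeeze runs of spaces to one
    return text


sample = "It was the best of times, it was the worst of ti-\nmes.\n\n\n\nIt was the age \nof wisdom."
print(reflow(sample))
```

Like the Word version, step 3 also eats hyphens in genuinely hyphenated words that happen to fall at a line end (a line ending in "well-" followed by "known" becomes "wellknown"), so it is worth spot-checking the result. Step 6 here collapses any run of spaces in one pass, which is slightly more aggressive than a single Word replace.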


By and large, this will give you well-formatted text for the main body of a book. There is a very good chance it will goof up title pages, tables of contents, formatted poems in the text and stuff like that. If I'm worried about such things, I'll go through and fix them by hand afterwards; usually I'm not.

This won't turn straw into gold. If the original file you are working with is just a run-on mess, there might be no salvaging it other than lots and lots of editing by hand.

BTW, if anyone sees anything wrong with this (I'm doing it from memory, though I've done it enough times that my memory should be reliable), correct me here before people are led astray.
I will experiment with your method tonight. Also, once you are done with the above, what format do you save the file in from MS Word before converting it to LRF in Calibre?