12-10-2008, 12:16 PM   #2
Andurian
You really should try it!
Posts: 57
Karma: 137
Join Date: Nov 2008
Device: PRS-500
Quote:
Originally Posted by daesdaemar
Hello. I am relatively new to all of this but am loving my PRS-505 and really, really like Calibre. Most of my conversions are from .lit format to lrf and everything works fine.

However, I am really having a lot of problems with converting txt, rtf, and html files with Calibre to lrf.

If I convert any of those file types, invariably one of two things will occur: the conversion fails and I get an error window in Calibre with information that is meaningless to me, or if the conversion does take place, the formatting is awful with multiple word wrap errors, lots of white space, etc as examples.

Can someone lead me in the right direction in terms of any techniques when dealing with those particular file types?

Thanks in advance.
A few questions about your conversion errors:

1. What error message, exactly, are you getting?
2. Do the rtf and html files load properly in other applications?
3. Are the files unusually sized or named?

My formatting method is below. I use MS Word because I know how to search for the whitespace characters in it. ^p is the paragraph mark in Word's search and replace.

1. Replace ^p^p^p with ^p^p. Repeat until there are no hits. (Removes excess blank lines).
2. Determine what marks off paragraphs. Usually it will be either ^p^p, a tab or multiple spaces to indent. Replace that with GGGGG. (So your paragraphs are now marked).
3. Replace -^p with nothing. (This gets rid of the hyphens in words that were split at the ends of broken lines. The ^p is removed too, so that the next step doesn't put a space in the middle of the word.)
4. Replace ^p with a single space. (This will remove all line divisions that aren't between paragraphs. It needs to be a space rather than nothing to keep words on successive lines from running together.)
5. Replace GGGGG with ^p^p to put a blank line between paragraphs. If you don't like blank lines between your paragraphs replace GGGGG with ^p instead.
6. Replace two spaces with one space. Repeat until there are no hits. (This will solve some extra-space problems caused by step 4, which adds a space even where a line already ended in one.)
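
For anyone without Word, the six steps above can be roughly sketched in Python. This is only an illustration under some assumptions: the names reflow, MARK and para_break are made up for this example, "\n" stands in for Word's ^p, and the GGGGG marker is assumed not to occur anywhere in the book's text. The default treats a blank line as the paragraph break; pass a different pattern (for example r"\n\t" for tab indents) if your file marks paragraphs another way.

```python
import re

MARK = "GGGGG"  # placeholder from step 2; assumed not to appear in the text


def reflow(text: str, para_break: str = r"\n\n") -> str:
    """Rejoin hard-wrapped lines into paragraphs, following steps 1-6.

    para_break is a regular expression describing what marks off
    paragraphs in this particular file (hypothetical parameter).
    """
    # Normalise Windows line endings so "\n" plays the role of ^p.
    text = text.replace("\r\n", "\n")
    # Step 1: collapse runs of three or more line breaks down to two.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Step 2: replace the paragraph boundaries with the marker.
    text = re.sub(para_break, MARK, text)
    # Step 3: drop hyphen + line break so split words are rejoined.
    text = text.replace("-\n", "")
    # Step 4: every remaining line break becomes a single space.
    text = text.replace("\n", " ")
    # Step 5: put a blank line back between paragraphs.
    text = text.replace(MARK, "\n\n")
    # Step 6: collapse runs of spaces (one pass, so no repeating needed).
    text = re.sub(r" {2,}", " ", text)
    return text


if __name__ == "__main__":
    sample = (
        "The quick brown\nfox jumps over the la-\nzy dog.\n\n\n\n"
        "Second para-\ngraph here."
    )
    print(reflow(sample))
```

Like the Word version, step 3 can't tell a word split at a line end from a genuinely hyphenated word that happened to fall there, so a compound broken after its hyphen will come out run together.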


By and large, this will give you well-formatted text for the main body of a book. There is a very good chance it will goof up title pages, tables of contents, formatted poems in the text and stuff like that. If I'm worried about such things, I'll go through and fix them by hand afterwards. Usually I'm not.

This won't turn straw into gold. If the original file you are working with is just a run-on mess, there might be no salvaging it other than lots and lots of editing by hand.

BTW, if anyone sees anything wrong with this (I'm doing it from memory, though I've done it enough times that my memory should be reliable), please correct me here before people are led astray.

Last edited by Andurian; 12-10-2008 at 01:11 PM.