12-10-2008, 10:18 AM | #1 |
Addict
Posts: 210
Karma: 4282
Join Date: Oct 2008
Location: Florida
Device: Sony 505, Kindle 3, iPad 3
|
TXT, RTF, and HTML conversion issues
Hello. I am relatively new to all of this but am loving my PRS-505 and really, really like Calibre. Most of my conversions are from .lit format to lrf and everything works fine.
However, I am having a lot of problems converting txt, rtf, and html files to lrf with Calibre. If I convert any of those file types, invariably one of two things will occur: either the conversion fails and I get an error window in Calibre with information that is meaningless to me, or the conversion does take place but the formatting is awful, with multiple word-wrap errors, lots of white space, etc. Can someone point me in the right direction in terms of techniques for dealing with those particular file types? Thanks in advance. |
12-10-2008, 12:16 PM | #2 | |
You really should try it!
Posts: 57
Karma: 137
Join Date: Nov 2008
Device: PRS-500
|
Quote:
1. What error message, exactly, are you getting?
2. Do the rtf and html files load properly in other applications?
3. Are the files unusually sized or named?

My formatting method is below - I use MS Word because I know how to access the whitespace characters in it via search. ^p is the paragraph mark in Word.

1. Replace ^p^p^p with ^p^p. Repeat until there are no hits. (Removes excess blank lines.)
2. Determine what marks off paragraphs. Usually it will be either ^p^p, a tab, or multiple spaces to indent. Replace that with GGGGG. (So your paragraphs are now marked.)
3. Replace -^p with nothing. (This is to get rid of hyphens in words divided at the ends of broken lines. The ^p is removed too so a space isn't entered into the word in the next step.)
4. Replace ^p with a single space. (This will remove all line divisions that aren't between paragraphs. It needs to be a space rather than nothing to keep words on successive lines from running together.)
5. Replace GGGGG with ^p^p to put a blank line between paragraphs. If you don't like blank lines between your paragraphs, replace GGGGG with ^p instead.
6. Replace two spaces with one space. (This will solve some extra-space problems caused by step 4.)

By and large, this will give you well-formatted text for the main body of a book. There is a very good chance it will goof up title pages, tables of contents, formatted poems in the text and stuff like that. If I'm worried about such things, I'll go through and fix them by hand afterwards - usually I'm not.

This won't turn straw into gold. If the original file you are working with is just a run-on mess, there might be no salvaging it other than lots and lots of editing by hand.

BTW, if anyone sees anything wrong with this (I'm doing it from memory, though I've done it enough times that my memory should do), correct me here before people are led astray.
Last edited by Andurian; 12-10-2008 at 01:11 PM. |
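For anyone who would rather script this than click through Word, here is a rough Python sketch of the same six replaces applied to a plain-text file. It is not part of the method above, just one way to mirror it. It assumes paragraphs are separated by blank lines (step 2 covers other cases, such as tabs or indents, which this does not handle), and the name `reflow` is made up for illustration.

```python
import re

MARK = "GGGGG"  # temporary paragraph marker from step 2; must not occur in the text


def reflow(text: str) -> str:
    """Apply the six find-and-replace steps to text that uses \\n line ends."""
    text = text.replace("\r\n", "\n")
    # Step 1: collapse runs of three or more line ends into one blank line.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Step 2: mark paragraph breaks (assumed here to be blank lines).
    text = text.replace("\n\n", MARK)
    # Step 3: rejoin words that were hyphenated across a line end.
    text = text.replace("-\n", "")
    # Step 4: every remaining line end becomes a single space.
    text = text.replace("\n", " ")
    # Step 5: put a blank line back between paragraphs.
    text = text.replace(MARK, "\n\n")
    # Step 6: squeeze runs of spaces down to one (Word needs repeated passes).
    text = re.sub(r" {2,}", " ", text)
    return text


if __name__ == "__main__":
    sample = "The quick brown fox jum-\nped over the\nlazy dog.\n\n\n\nSecond para-\ngraph here."
    print(reflow(sample))
```

As with the Word version, step 3 cannot tell a word broken across lines from a genuine hyphenated compound that happens to fall at a line end, so those still need a proofread.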
|
|
12-10-2008, 01:58 PM | #3 | |
Addict
Posts: 210
Karma: 4282
Join Date: Oct 2008
Location: Florida
Device: Sony 505, Kindle 3, iPad 3
|
Quote:
|
|
12-10-2008, 02:44 PM | #4 |
Wizard
Posts: 1,163
Karma: 32196
Join Date: Jan 2007
Location: Anchorage, AK
Device: Sony Reader PRS-505, PRS-650, PRS-T3, Pocketbook HD2
|
Andurian, you seem to know your way around formatting documents. Do you by any chance have any advice on what to do if you have a text file and every line has a paragraph marker? I opened it in Word 2007 and at the end of every line there is a paragraph marker.
|
12-10-2008, 03:49 PM | #5 | |
You really should try it!
Posts: 57
Karma: 137
Join Date: Nov 2008
Device: PRS-500
|
Quote:
And I always save in RTF for Calibre to convert. It seems to handle RTF well. |
|
|
12-10-2008, 04:00 PM | #6 |
Wizard
Posts: 1,163
Karma: 32196
Join Date: Jan 2007
Location: Anchorage, AK
Device: Sony Reader PRS-505, PRS-650, PRS-T3, Pocketbook HD2
|
|
12-10-2008, 04:02 PM | #7 | |
Wizard
Posts: 1,163
Karma: 32196
Join Date: Jan 2007
Location: Anchorage, AK
Device: Sony Reader PRS-505, PRS-650, PRS-T3, Pocketbook HD2
|
p.s.
Quote:
The problem is there are no indents in this file, so I'm not sure how to mark off where each paragraph starts. I thought maybe the last sentence, because sometimes there is blank space afterwards...but I tried that already and it didn't work. I used ^p insertblankwhitespacehere but it couldn't find it. |
|
12-10-2008, 04:31 PM | #8 | |
You really should try it!
Posts: 57
Karma: 137
Join Date: Nov 2008
Device: PRS-500
|
Quote:
As you describe it, there is a ^p after each line. If there is a blank line between paragraphs there should be a ^p^p between paragraphs - the first being after the last line, the second being the one that creates the blank line between them. |
|
12-10-2008, 04:52 PM | #9 |
Addict
Posts: 210
Karma: 4282
Join Date: Oct 2008
Location: Florida
Device: Sony 505, Kindle 3, iPad 3
|
OK, this is what I did with a txt file that I was having a lot of trouble with: I opened it in MS Word and saved it as rtf. Then with the rtf file I did the following (per Bob Russell's advice in a separate post)
To do that, I use MS Word mass replace (all occurrences) as follows:
*) Mass replace ^p with <$$> (anything that's a distinct pattern works)
*) Mass replace <$$><$$> with ^p (to remove the double paragraph marks)
*) Mass replace <$$> with a space, to allow text to flow naturally
*) Mass replace ^l^l with ^l to remove double line breaks.
Saved as rtf and converted with Calibre. Perfect. |
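The same four replaces can be sketched in Python for anyone working on the txt file directly. This is only an illustration under a few assumptions: the function name `unwrap_lines` is invented, paragraph marks (^p) are taken to be plain `\n` line ends, and Word's manual line break (^l) is assumed to be the vertical-tab character, which may not match how a given file stores it.

```python
PLACEHOLDER = "<$$>"  # any distinct pattern that never occurs in the text
LINE_BREAK = "\v"     # assumption: stand-in for Word's manual line break (^l)


def unwrap_lines(text: str) -> str:
    """Mirror the four mass replaces on text that uses \\n line ends."""
    text = text.replace("\r\n", "\n")
    # Replace every paragraph mark with the placeholder.
    text = text.replace("\n", PLACEHOLDER)
    # A doubled placeholder was a real paragraph break: restore one mark.
    text = text.replace(PLACEHOLDER * 2, "\n")
    # Any placeholder left over was a mid-paragraph line end: use a space.
    text = text.replace(PLACEHOLDER, " ")
    # Collapse doubled manual line breaks into one.
    text = text.replace(LINE_BREAK * 2, LINE_BREAK)
    return text


if __name__ == "__main__":
    print(unwrap_lines("line one\nline two\n\nnext para\nmore"))
```

Note that this only works when paragraphs are separated by a blank line; a file with no blank lines at all collapses into a single paragraph.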
12-10-2008, 05:32 PM | #10 | |
Wizard
Posts: 1,163
Karma: 32196
Join Date: Jan 2007
Location: Anchorage, AK
Device: Sony Reader PRS-505, PRS-650, PRS-T3, Pocketbook HD2
|
Quote:
Here's an example. "A mental sigh of relief reached him: Nikki's thought, Mik- hyel's, or both; or perhaps just his own. But it was a short-lived relief. At the gate, chaos reigned, delivery vehicles jammed the opening, the silk balloons that normally rose above them, taking the strain off the axles, lay limp over the cargo or deflated even as they watched; further evidence, if they needed it, that the node's power umbrella was rapidly failing. Or perhaps, Deymorin thought, as he raised his eyes to see stormciouds gathering above the city, that energy was being redirected." In the file I have, after every line there is a ^p formatting mark. |
|
12-10-2008, 06:15 PM | #11 | |
You really should try it!
Posts: 57
Karma: 137
Join Date: Nov 2008
Device: PRS-500
|
Quote:
Aside from writing a script that does what you do to determine where paragraphs end (look for short lines ending with sentence ending punctuation, basically) and add a ^p at that point...that would catch most of them, though it would likely also have a few false positives. Anyone know whether there is a script like that out there somewhere? |
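I don't know of an existing script, but the heuristic described here is short enough to sketch in Python. Treat this as a starting point, not a tested tool: the function name `guess_paragraphs` and the 0.85 "short line" ratio are arbitrary choices for illustration, and it will produce the false positives mentioned above (and miss paragraphs whose last line happens to be full width).

```python
# Endings that count as "sentence-ending punctuation", with or without
# a closing straight quote. Curly quotes would need adding for some files.
SENTENCE_END = (".", "!", "?", '."', '!"', '?"')


def guess_paragraphs(text: str, short_ratio: float = 0.85) -> str:
    """End a paragraph after each short line that finishes a sentence."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        return ""
    longest = max(len(ln) for ln in lines)
    paragraphs, current = [], []
    for ln in lines:
        current.append(ln)
        # "Short" means noticeably narrower than the widest line in the file.
        is_short = len(ln) < short_ratio * longest
        if is_short and ln.endswith(SENTENCE_END):
            paragraphs.append(" ".join(current))
            current = []
    if current:  # keep any trailing lines that never hit a paragraph end
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)


if __name__ == "__main__":
    sample = (
        "It was a dark and stormy night and the\n"
        "rain fell in torrents on the old roof.\n"
        "He slept.\n"
        "Morning came late and grey over the hills\n"
        "and nobody stirred."
    )
    print(guess_paragraphs(sample))
```

It does not rejoin words hyphenated across line ends, so that replace would still need to be run separately.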
|
12-10-2008, 07:28 PM | #12 | |
Wizard
Posts: 1,163
Karma: 32196
Join Date: Jan 2007
Location: Anchorage, AK
Device: Sony Reader PRS-505, PRS-650, PRS-T3, Pocketbook HD2
|
Quote:
I'm crossing my fingers for an official release I can buy. A script would be handy. |
|
12-10-2008, 07:32 PM | #13 | |
Grand Sorcerer
Posts: 5,185
Karma: 25133758
Join Date: Nov 2008
Location: SF Bay Area, California, USA
Device: Pocketbook Touch HD3 (Past: Kobo Mini, PEZ, PRS-505, Clié)
|
Quote:
It's not 100% accurate, and you have to rotate through the rest of the end-of-sentence punctuation (question marks, exclamation points), but it gives a good start to work from--changes it from "add a return after every single paragraph" to "proofread for missing returns for quotes after a colon or emdash." |
|
12-10-2008, 08:06 PM | #14 | |
Wizard
Posts: 1,163
Karma: 32196
Join Date: Jan 2007
Location: Anchorage, AK
Device: Sony Reader PRS-505, PRS-650, PRS-T3, Pocketbook HD2
|
Quote:
So in replace you type in ."^p and replace it with an extra ."^p^p? |
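If that reading is right, the replace amounts to doubling the line end wherever it follows sentence-ending punctuation plus a closing quote, so that a later reflow pass sees a paragraph break there. A small Python sketch of that idea follows; it is a guess at what is being described, the name `mark_dialogue_breaks` is invented, and it only handles straight double quotes. One regular expression covers the period, question mark, and exclamation point together, so there is no need to rotate through them one at a time.

```python
import re

# A single line end that directly follows ." or ?" or !" and is not
# already part of a blank line.
DIALOGUE_BREAK = re.compile(r'([.?!]")\n(?!\n)')


def mark_dialogue_breaks(text: str) -> str:
    """Turn a line end after closing-quote punctuation into a blank line."""
    return DIALOGUE_BREAK.sub(r"\1\n\n", text)


if __name__ == "__main__":
    print(mark_dialogue_breaks('He said, "Go."\nShe went\nhome.'))
```

As noted earlier in the thread, this is not 100% accurate: a quotation that ends mid-paragraph at the end of a line gets split, so the result still needs proofreading.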
|
12-10-2008, 08:46 PM | #15 |
Junior Member
Posts: 3
Karma: 10
Join Date: Dec 2008
Device: PDA
|
Text format is ok because it downloads much faster but the problem is you can't highlight it where you stopped.
|
|