Old 12-10-2008, 10:18 AM   #1
daesdaemar
Addict
Posts: 210
Karma: 4282
Join Date: Oct 2008
Location: Florida
Device: Sony 505, Kindle 3, iPad 3
TXT, RTF, and HTML conversion issues

Hello. I am relatively new to all of this but am loving my PRS-505 and really, really like Calibre. Most of my conversions are from .lit format to lrf and everything works fine.

However, I am really having a lot of problems with converting txt, rtf, and html files with Calibre to lrf.

If I convert any of those file types, invariably one of two things will occur: either the conversion fails and I get an error window in Calibre with information that is meaningless to me, or the conversion succeeds but the formatting is awful, with multiple word-wrap errors, lots of white space, and so on.

Can someone lead me in the right direction in terms of any techniques when dealing with those particular file types?

Thanks in advance.
Old 12-10-2008, 12:16 PM   #2
Andurian
You really should try it!
Posts: 57
Karma: 137
Join Date: Nov 2008
Device: PRS-500
A couple of questions about your conversion errors:

1. What error message, exactly, are you getting?
2. Do the rtf and html files load properly in other applications?
3. Are the files unusually sized or named?

My formatting method is below. I use MS Word because I know how to access the whitespace characters in it via search. ^p is the paragraph mark in Word.

1. Replace ^p^p^p with ^p^p. Repeat until there are no hits. (Removes excess blank lines).
2. Determine what marks off paragraphs. Usually it will be either ^p^p, a tab or multiple spaces to indent. Replace that with GGGGG. (So your paragraphs are now marked).
3. Replace -^p with nothing. (This is to get rid of hyphens in words divided at the ends of broken lines. The ^p is removed too so a space isn't entered into the word in the next step)
4. Replace ^p with a single space. (This will remove all line divisions that aren't between paragraphs. It needs to be a space rather than nothing to keep words on successive lines from running together.)
5. Replace GGGGG with ^p^p to put a blank line between paragraphs. If you don't like blank lines between your paragraphs replace GGGGG with ^p instead.
6. Replace two spaces with one space. (This will solve some extra-space problems caused by step 4.)


By and large, this will give you well-formatted text for the main body of a book. There is a very good chance it will goof up title pages, tables of contents, formatted poems in the text and stuff like that. If I'm worried about such things, I'll go through and fix them by hand afterwards; usually I'm not.

This won't turn straw into gold. If the original file you are working with is just a run-on mess, there might be no salvaging it other than lots and lots of editing by hand.

BTW, if anyone sees anything wrong with this (I'm doing it from memory, though I've done it enough times that my memory should be reliable), correct me here before people are led astray.
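
For anyone who would rather script this than click through Word's Replace dialog, the six steps can be sketched in Python roughly as follows. This is only an illustration working on a plain-text file: it assumes "\n" stands in for Word's ^p, that paragraphs are separated by blank lines, and the function name `reflow` is made up for the example.

```python
import re

MARK = "GGGGG"  # placeholder from step 2; any string that never occurs in the book works


def reflow(text: str) -> str:
    """Apply the six search-and-replace steps to plain text ("\n" plays the role of ^p)."""
    # Step 1: collapse runs of three or more line breaks down to two.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Step 2: mark paragraph boundaries (assumed here to be blank lines).
    text = text.replace("\n\n", MARK)
    # Step 3: rejoin words that were hyphenated across a line break.
    text = text.replace("-\n", "")
    # Step 4: any line break still left is a mid-paragraph wrap; turn it into a space.
    text = text.replace("\n", " ")
    # Step 5: restore the paragraph boundaries as a blank line.
    text = text.replace(MARK, "\n\n")
    # Step 6: squeeze runs of spaces down to a single space.
    text = re.sub(r" {2,}", " ", text)
    return text
```

As with the manual version, step 3 will also swallow a genuine hyphen that happens to sit at the end of a line, so compound words may need a proofreading pass.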

Last edited by Andurian; 12-10-2008 at 01:11 PM.
Old 12-10-2008, 01:58 PM   #3
daesdaemar
I will experiment with your method tonight. Also, when you are done with the above, in what format do you save the file in MS Word before converting in Calibre to lrf?
Old 12-10-2008, 02:44 PM   #4
Amalthia
Guru
Posts: 951
Karma: 1960
Join Date: Jan 2007
Location: Anchorage, AK
Device: Sony Reader PRS-505, PRS-650
Andurian, you seem like you know your way around formatting documents. Do you by any chance have any advice on what to do if you have a text file and every line has a paragraph marker? I opened it in Word 2007 and at the end of every line there is a paragraph marker.
Old 12-10-2008, 03:49 PM   #5
Andurian
Actually, the above is meant to deal with exactly that problem.

And I always save in RTF for Calibre to convert. It seems to handle RTF well.
Old 12-10-2008, 04:00 PM   #6
Amalthia
Okay, I'll try it out, but I'm not sure how it's going to work, since the file I have seems to think every line is a paragraph.
Old 12-10-2008, 04:02 PM   #7
Amalthia
p.s.

The problem is there are no indents in this file, so I'm not sure how to mark off where a paragraph starts. I thought maybe I could use the last sentence, because sometimes there is blank space after it, but I tried that already and it didn't work.

I searched for ^p followed by a blank space, but Word couldn't find it.
Old 12-10-2008, 04:31 PM   #8
Andurian
You can just use ^p (those two characters) in the search and replace box in Word to deal with blank lines.

As you describe it, there is a ^p after each line. If there is a blank line between paragraphs there should be a ^p^p between paragraphs - the first being after the last line, the second being the one that creates the blank line between them.
Old 12-10-2008, 04:52 PM   #9
daesdaemar
OK, this is what I did with a txt file that I was having a lot of trouble with: I opened it in MS Word and saved it as rtf. Then with the rtf file I did the following (per Bob Russell's advice in a separate post):

To do that, I use MS Word mass replace (all occurrences) as follows:
*) Mass replace ^p with <$$> (anything that's a distinct pattern works)
*) Mass replace <$$><$$> with ^p (to remove the double paragraph marks)
*) Mass replace <$$> with a space, to allow text to flow naturally
*) Mass replace ^l^l with ^l to remove double line breaks.

Saved as rtf and converted with Calibre. Perfect.
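
The same placeholder trick can be sketched in Python for a plain-text file, in case Word isn't handy. This is a rough equivalent rather than the exact procedure: "\n" stands in for ^p, the function name `reflow_with_placeholder` is invented for the example, and the last step (^l) is left out because it deals with Word's manual line breaks.

```python
def reflow_with_placeholder(text: str, mark: str = "<$$>") -> str:
    """Unwrap hard line breaks while keeping blank-line paragraph breaks."""
    # Every line break becomes the placeholder.
    text = text.replace("\n", mark)
    # A doubled placeholder was a blank line, i.e. a real paragraph break.
    text = text.replace(mark + mark, "\n")
    # Any placeholder still standing alone was a mid-paragraph wrap.
    text = text.replace(mark, " ")
    return text
```

Like the Word version, this only works when the file has blank lines between paragraphs, and the placeholder must be a pattern that never appears in the book itself.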
Old 12-10-2008, 05:32 PM   #10
Amalthia
Quote:
Originally Posted by Andurian View Post
You can just use ^p (those two characters) in the search and replace box in Word to deal with blank lines.

As you describe it, there is a ^p after each line. If there is a blank line between paragraphs there should be a ^p^p between paragraphs - the first being after the last line, the second being the one that creates the blank line between them.
I think I forgot to mention there are no blank lines between the paragraphs either. It's really confounded me, because I've fixed text files before, but I'm not sure what to do here since there are no indents and no breaks between paragraphs.

Here's an example.

"A mental sigh of relief reached him: Nikki's thought, Mik-
hyel's, or both; or perhaps just his own.
But it was a short-lived relief. At the gate, chaos reigned,
delivery vehicles jammed the opening, the silk balloons that
normally rose above them, taking the strain off the axles,
lay limp over the cargo or deflated even as they watched;
further evidence, if they needed it, that the node's power
umbrella was rapidly failing.
Or perhaps, Deymorin thought, as he raised his eyes to
see stormciouds gathering above the city, that energy was
being redirected."

In the file I have, after every line there is a ^p formatting mark.
Old 12-10-2008, 06:15 PM   #11
Andurian
Unfortunately, so far as I know the only way to deal with that is to manually go through and press return at the appropriate places.

Aside from writing a script that does what you do to determine where paragraphs end (look for short lines ending with sentence ending punctuation, basically) and add a ^p at that point...that would catch most of them, though it would likely also have a few false positives.

Anyone know whether there is a script like that out there somewhere?
Old 12-10-2008, 07:28 PM   #12
Amalthia
Well, I figure that when I retire and have no life, and assuming no official version of these novels comes out, I may actually sit down and manually enter the paragraph breaks myself, and then run the ^p^p thing. It's 180k words, so not a short novel at all, and there is a lot of dialogue.

I'm crossing my fingers for an official release I can buy.

A script would be handy.
Old 12-10-2008, 07:32 PM   #13
Elfwreck
Grand Sorcerer
Posts: 5,142
Karma: 24387938
Join Date: Nov 2008
Location: SF Bay Area, California, USA
Device: Clié; PRS-505; EZR Pocket Pro, PRS-600, Kobo Mini
Quote:
Originally Posted by Andurian View Post
Unfortunately, so far as I know the only way to deal with that is to manually go through and press return at the appropriate places.

Aside from writing a script that does what you do to determine where paragraphs end (look for short lines ending with sentence ending punctuation, basically) and add a ^p at that point...that would catch most of them, though it would likely also have a few false positives.

Anyone know whether there is a script like that out there somewhere?
No script, but I've had some luck using Word to replace period-return with period-return-return, and then add a return to the end of period-dblquote-return and period-space-return and period-doublequote-space-return. (.^p; ."^p, . ^p, ." ^p)

It's not 100% accurate, and you have to rotate through the rest of the end-of-sentence punctuation (question marks, exclamation points), but it gives a good start to work from--changes it from "add a return after every single paragraph" to "proofread for missing returns for quotes after a colon or emdash."
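
Outside of Word, the same set of replacements can be folded into one regular expression. The following is a minimal Python sketch on plain text, assuming "\n" stands in for ^p; the function name `mark_paragraph_ends` is made up for the example, and it covers periods, question marks and exclamation points in one pass.

```python
import re

# End-of-sentence punctuation, an optional closing double quote, an optional
# trailing space, then a single line break that is not already doubled.
LINE_END = re.compile(r'([.?!]"? ?)\n(?!\n)')


def mark_paragraph_ends(text: str) -> str:
    """Add a second line break after every line that ends a sentence."""
    return LINE_END.sub(r"\1\n\n", text)
```

This only marks the likely paragraph ends. The single line breaks that remain are still hard wraps, so the usual double-break cleanup has to be run afterwards, and any sentence that merely happens to end at a line end will be split as well.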
Old 12-10-2008, 08:06 PM   #14
Amalthia
I think I get what you're saying.

So in Replace you search for ."^p and replace it with ."^p^p?
Old 12-10-2008, 08:46 PM   #15
bambi211
Junior Member
Posts: 3
Karma: 10
Join Date: Dec 2008
Device: PDA
Text format is OK because it downloads much faster, but the problem is that you can't mark the place where you stopped.


All times are GMT -4.