11-11-2010, 04:20 AM | #1 |
DRM hater
Posts: 945
Karma: 2066176
Join Date: Jun 2010
Location: Michigan
Device: Nook ST glow, Kindle Voyage
|
RTF to EPUB...extra line breaks
Hey guys
I've noticed this on a lot of RTF files and I haven't been able to figure it out. When I convert to EPUB, I get extra line breaks after each paragraph that weren't in the original RTF. ¶ here stands for a paragraph break in the original RTF.
Like this in the RTF:
This is a paragraph¶
This is paragraph two.
Comes out:
This is a paragraph

This is paragraph two.
If I ask Calibre to "Remove spacing between paragraphs" under Look & Feel...it removes them. But it removes ALL breaks, then...including the ones that ARE supposed to be there. So if I have this in the RTF:
This is a paragraph¶
¶
This is paragraph two.
I get:
This is a paragraph.
This is paragraph two.
Any fix for this kind of behavior? I searched the forum...tried tweaking the Xpath detection stuff with a \ (even though RTF isn't HTML related). I did open the doc and turn on all formatting characters to make sure there weren't any extra breaks. Nope. Just (p). I seem to either get spaces between all paragraphs, or no space ever. What I don't get is why I'm getting blank lines between paragraphs at all - they aren't present in the original RTF. Last edited by GreenMonkey; 11-11-2010 at 05:30 AM. |
11-11-2010, 05:19 AM | #2 |
Wizard
Posts: 4,552
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
|
If you look at the HTML that is generated, I very much doubt if you will see any line breaks. What I expect you are seeing is paragraph breaks with a style (which is normal default for HTML) that specifies a paragraph break should add some white space.
Another point is that in HTML multiple consecutive paragraph breaks are typically treated as a single paragraph break. On that basis, the behaviour you describe is exactly what I would expect. It sounds as if in the original RTF file the author has tried to use blank lines to separate paragraphs? This is not abnormal in files where the author has mixed up the use of paragraph breaks, using them both to simply indicate end-of-line and to indicate genuine paragraph breaks. What I am not sure of from your message is under what circumstances you want there to be space between paragraphs, and when you do not want this behavior. |
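To make the point about paragraph styling concrete, here is a minimal Python sketch of the output the original poster seems to be after: paragraphs with no default gap, but with deliberate blank paragraphs kept. This is not how calibre's RTF input works; the function name `paragraphs_to_html` and the zero-margin style are assumptions made purely for illustration.

```python
from html import escape


def paragraphs_to_html(paragraphs):
    """Render a list of paragraph strings as HTML.

    An empty string stands for a deliberate blank paragraph (a lone ¶)
    in the source document.
    """
    # Zero margins remove the default white space between <p> elements.
    css = "p { margin: 0 }"
    body = []
    for text in paragraphs:
        if text.strip():
            body.append("<p>" + escape(text) + "</p>")
        else:
            # An empty <p></p> has no height, so a non-breaking space
            # is used to keep the blank paragraph one line tall.
            body.append("<p>&#160;</p>")
    return "<style>" + css + "</style>\n" + "\n".join(body)


print(paragraphs_to_html(["This is a paragraph", "", "This is paragraph two."]))
```

With this kind of markup the only vertical gaps are the ones that were blank paragraphs in the source, which is the behavior asked for in the first post.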
11-11-2010, 05:27 AM | #3 |
DRM hater
Posts: 945
Karma: 2066176
Join Date: Jun 2010
Location: Michigan
Device: Nook ST glow, Kindle Voyage
|
If there is a blank line, it should stay there.
If there is no blank line, there shouldn't be one. I moved to paragraph symbols for this post and updated the original post (ASCII/ANSI shortcuts FTW). So the behavior is basically...wherever a ¶ is...I get a blank line from Calibre in the epub output. If I turn on "Remove spacing between paragraphs", it removes the 'extra' ones...but of course, also the ones that were supposed to be there. I don't really get why I'm getting a blank line at every ¶, even with that option to add it unchecked. Last edited by GreenMonkey; 11-11-2010 at 05:31 AM. |
11-15-2010, 12:02 AM | #4 |
DRM hater
Posts: 945
Karma: 2066176
Join Date: Jun 2010
Location: Michigan
Device: Nook ST glow, Kindle Voyage
|
Anybody else have any thoughts on this?
|
11-15-2010, 02:30 AM | #5 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
I just tested this out - rtf seems to do something similar to txt input - extra blank lines appear to be deleted as part of the standard processing. As itimpi noted, this is normal. The use case where this is actually problematic would be for soft breaks (the only case where it occasionally bothers me). If the source document had soft breaks then Calibre deletes them for both rtf and text by default. For text you can tune this (preserve spaces), but not much can be done for rtf.
If your concern is actually because of softbreaks you have two options:
|
11-17-2010, 01:18 AM | #6 |
DRM hater
Posts: 945
Karma: 2066176
Join Date: Jun 2010
Location: Michigan
Device: Nook ST glow, Kindle Voyage
|
OK, thanks. Maybe I'll submit something. The RTF conversions are very good - so close - just this paragraph / line break issue remains.
|
11-17-2010, 08:21 AM | #7 |
Zealot
Posts: 122
Karma: 164
Join Date: Aug 2010
Location: Old Ynysybwl
Device: Sony PRS-300
|
I can replicate this with a simple file which I created from scratch. The original RTF is:
Title¶
By¶
Author¶
It is always converted in the epub to:
Title

By

Author
I have not entered a bug report, as somewhere here I thought I had seen that the RTF to XXXX module was not being maintained at present (I hope I got that right, as I can't find it right now!). Shame really, because in my own workflow I like to convert from the initial document to RTF, run a series of macros which do all my correcting (i.e. indents, ABC LIT Transformer yadayada removals etc.) and then convert to epub. Some conversions from TXT files to RTF lose all ¶'s on some occasions and not others, without any reason I can discern.
PS: if anyone wants to easily extract text from a PDF into a Word or RTF file, install Foxit Reader. Select any word in the PDF with the text selection icon, and then 'CTRL+A' followed by 'CTRL+C' will select all and copy it, ready for pasting into whatever you want to edit it with. It makes the ABC Transformer stuff much easier to extract too. |
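For anyone who wants to reproduce the simple test file described in this post, it could be generated with a sketch like the one below. It writes a bare-bones RTF document in which each paragraph ends with the RTF `\par` control word. The helper name `make_rtf` is hypothetical, the header is the smallest one I would expect readers to accept, and only plain ASCII text is handled.

```python
def make_rtf(paragraphs):
    """Build a minimal RTF document, one \\par per paragraph.

    An empty string produces an empty paragraph (a lone ¶).
    Only ASCII text is supported in this sketch.
    """
    def esc(text):
        # Backslash and braces are special in RTF and must be escaped.
        # The backslash is replaced first so the escapes are not doubled.
        return text.replace("\\", "\\\\").replace("{", "\\{").replace("}", "\\}")

    body = "".join(esc(p) + "\\par\n" for p in paragraphs)
    return "{\\rtf1\\ansi\n" + body + "}"


if __name__ == "__main__":
    # Assumed output file name, for illustration only.
    with open("test.rtf", "w", encoding="ascii") as f:
        f.write(make_rtf(["Title", "By", "Author"]))
```

Attaching a small generated file like this to a bug report would make the conversion behavior easy for others to check.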
11-17-2010, 08:26 AM | #8 |
Addict
Posts: 219
Karma: 404
Join Date: Nov 2010
Device: Kindle 3G, Samsung SIII
|
You can use conversion to odt to avoid this and similar problems. If your macros require rtf then you can take source->rtf->macros->odt->destination path. I use doc->odt->destination and rtf->odt->destination regularly and it works well.
Last edited by janvanmaar; 11-17-2010 at 08:29 AM. |
11-17-2010, 09:56 AM | #9 |
Zealot
Posts: 122
Karma: 164
Join Date: Aug 2010
Location: Old Ynysybwl
Device: Sony PRS-300
|
I will try some odt - the macros run in word and my v2010 will load odt
|
11-17-2010, 09:59 AM | #10 |
Zealot
Posts: 122
Karma: 164
Join Date: Aug 2010
Location: Old Ynysybwl
Device: Sony PRS-300
|
ODT does not appear as an option in the conversion in my 0.7.28 Calibre, is there a downloadable plugin?
I can save the RTF as ODT but to get that converted back to epub there is no module? |
11-17-2010, 10:18 AM | #11 |
Addict
Posts: 219
Karma: 404
Join Date: Nov 2010
Device: Kindle 3G, Samsung SIII
|
That is weird. I did not do anything special, ODT input plugin just came with my Calibre (also 0.7.28) installation. I can simply add any ODT file to Calibre via the Add button and then convert to whatever.
Perhaps there is difference in plugins installed/enabled by default for different OS (I am on Linux)? You can check under Preferences->Conversion Input Plugins, whether ODT is there and green... Last edited by janvanmaar; 11-17-2010 at 10:21 AM. |
11-17-2010, 10:38 AM | #12 |
Zealot
Posts: 122
Karma: 164
Join Date: Aug 2010
Location: Old Ynysybwl
Device: Sony PRS-300
|
I can Add an ODT - not convert to. There is no ODT plugin in my setup, and I am using Win7
|
11-17-2010, 10:44 AM | #13 |
creator of calibre
Posts: 43,771
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
You need to save your RTF as ODT in OpenOffice and then convert from the ODT in calibre.
|
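For a batch of books, the two steps above (RTF to ODT, then ODT to EPUB) could be scripted. The sketch below is an assumption-heavy illustration rather than a recommended workflow: it assumes a LibreOffice-style `soffice` binary that accepts `--headless --convert-to`, and calibre's `ebook-convert` command-line tool on the PATH. The function names are hypothetical, and older OpenOffice builds may need different options or the GUI "Save As" route described in the post.

```python
import subprocess
from pathlib import Path


def build_commands(rtf_path, soffice="soffice", ebook_convert="ebook-convert"):
    """Return the two command lines for RTF -> ODT -> EPUB."""
    rtf = Path(rtf_path)
    odt = rtf.with_suffix(".odt")
    epub = rtf.with_suffix(".epub")
    # Step 1: have the office suite write an .odt next to the .rtf.
    to_odt = [soffice, "--headless", "--convert-to", "odt",
              "--outdir", str(rtf.parent), str(rtf)]
    # Step 2: let calibre convert the .odt to .epub.
    to_epub = [ebook_convert, str(odt), str(epub)]
    return to_odt, to_epub


def convert(rtf_path):
    """Run both steps, stopping at the first failure."""
    for cmd in build_commands(rtf_path):
        subprocess.run(cmd, check=True)
```

Calling `convert("book.rtf")` would leave `book.odt` and `book.epub` beside the source file; metadata would still need to be handled separately.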
11-17-2010, 11:05 AM | #14 |
Addict
Posts: 219
Karma: 404
Join Date: Nov 2010
Device: Kindle 3G, Samsung SIII
|
Ah sorry, did not get your question. Exactly as Kovid says (of course)
|
11-17-2010, 11:48 AM | #15 |
Zealot
Posts: 122
Karma: 164
Join Date: Aug 2010
Location: Old Ynysybwl
Device: Sony PRS-300
|
OK, so, I can save from RTF in Word 2010 to ODT, add that as a file - merge with existing book to maintain Metadata and then convert. Tried it and it worked.
I think I will stick with the gaps though and save on the number of steps, as I am converting many books at the moment. Good thread though, learnt a lot as usual. Thanks everyone. |