11-17-2012, 11:20 AM | #1 |
Connoisseur
Posts: 84
Karma: 335288
Join Date: Nov 2012
Device: Kindle
|
formatting problem when converting RTF to mobi
I frequently convert RTF files into mobi files. It's impossible for me to get them properly formatted. It appears the problem may be a Calibre design flaw, rather than a bug or user error:
1.) Calibre's default conversion for RTF to mobi files automatically removes all tabs and inserts spaces between the paragraphs. In other words, Calibre assumes you want your document formatted to look like a blog rather than a traditional book.
2.) Under "Look and Feel" you can check a box to remove line spacing between paragraphs. That will delete the spaces and re-insert the tabs. However...
3.) When Calibre re-inserts the tabs, it simply looks for any and all paragraph breaks and places a tab at those points.
4.) That messes up the formatting significantly because not all paragraph breaks should be followed by a tab. For example, any text you want centered on the page, such as a chapter title, ends up indented to the right of center. Another example: A paragraph starting a new section does not need to be indented.
BOTTOM LINE: When you convert from RTF to mobi and want your document to look like a traditional book, rather than a blog, Calibre makes it impossible for you to center items or have a non-indented paragraph after a section break. Calibre forces unwanted tabs throughout your document.
DESIGN FLAW: When you check the box to remove line spacing you are not preventing Calibre from automatically re-formatting your document. In the background, Calibre still converts your document to blog format, but then turns around and tries to convert into the format specified by the user. In other words, checking that box simply causes Calibre to re-format your document twice -- first to its own preference, then subsequently to the user's preference.
Calibre's default should be to keep the formatting of the existing document and not re-format unless the user specifies. That would solve this problem.
Anyone have any idea how I can work around this? Thanks.
Last edited by A Lurker; 11-17-2012 at 11:51 AM. |
11-17-2012, 11:29 AM | #2 |
Connoisseur
Posts: 84
Karma: 335288
Join Date: Nov 2012
Device: Kindle
|
Example
The original and proper formatting:
Chapter 1
It was a cold night. John was coming home from work when his car broke down. "This stinks." "Let's call Triple A," said Sarah.
By default Calibre would make it look like this:
Chapter 1
It was a cold night. John was coming home from work when his car broke down. "This stinks." "Let's call Triple A," said Sarah.
If I check the box "remove space between paragraphs" this is what I get (notice "Chapter 1" is slightly to the right, not centered):
Chapter 1
It was a cold night. John was coming home from work when his car broke down. "This stinks." "Let's call Triple A," said Sarah.
How do I work around this? Thanks.
Last edited by A Lurker; 11-17-2012 at 11:51 AM. |
11-18-2012, 03:10 AM | #3 |
creator of calibre
Posts: 43,871
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Open your rtf file in word, save it as a webpage, filtered and convert the resulting HTML file in calibre.
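For anyone who prefers to script that last step, here is a minimal sketch that only assembles a command line for calibre's `ebook-convert` tool. The file names and the CSS rule are placeholders, and the `p.title` class is purely hypothetical (Word's filtered HTML may use different selectors). `--extra-css` is, as far as I know, a standard conversion option, but check `ebook-convert --help` for your version before relying on it.

```python
import shlex


def build_convert_command(html_path: str, mobi_path: str, extra_css: str = "") -> list[str]:
    """Assemble an ebook-convert command line; nothing is executed here."""
    cmd = ["ebook-convert", html_path, mobi_path]
    if extra_css:
        # Extra CSS is applied on top of the styles found in the input file.
        cmd += ["--extra-css", extra_css]
    return cmd


# Placeholder paths and a hypothetical rule for centered chapter titles.
cmd = build_convert_command(
    "book.html",
    "book.mobi",
    "p.title { text-indent: 0; text-align: center; }",
)
print(shlex.join(cmd))
```

To actually run the conversion, the list can be passed to `subprocess.run(cmd, check=True)` on a machine where calibre is installed.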
|
11-24-2012, 02:30 PM | #4 |
Connoisseur
Posts: 84
Karma: 335288
Join Date: Nov 2012
Device: Kindle
|
Thank you so much for that suggestion, but it does NOT work.
I have tried converting the .rtf file to a Word .doc, then to a webpage-filtered file, and then importing it into Calibre, but I experience the exact same problem: Calibre first inserts spaces between paragraphs and removes the tabs; then it reverses itself, deleting the spaces and re-inserting tabs. However, Calibre is "dumb" and sticks tabs after every paragraph mark. As a result, all centered text is pushed slightly to the right, and it's impossible to have any non-indented sections.
In MS Word I tried replacing the paragraph marks with manual line breaks before the stuff I wanted Calibre to ignore when inserting tabs, but that did not work. Calibre also forced tabs after manual line breaks.
Again, this problem happens regardless of whether I directly import the .rtf file or first convert it to webpage-filtered. I've experienced this problem for years, so it has happened with multiple versions of Calibre. |
11-24-2012, 04:25 PM | #5 | |||||||
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
|
Quote:
Second, checking the box selects everything on that line. This means that not only does it remove line spacing between paragraphs, it also adds an indent to every paragraph, depending on what you place in that section. See attached. Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Asked and answered, see attached. Last edited by DoctorOhh; 11-24-2012 at 04:36 PM. |
|||||||
11-25-2012, 08:31 AM | #6 |
Connoisseur
Posts: 84
Karma: 335288
Join Date: Nov 2012
Device: Kindle
|
DoctorOhh,
I appreciate your lengthy reply, but unfortunately most of what you wrote is incorrect. You seem to have misunderstood my problem and mischaracterized some of Calibre's functions.
1.) I am not trying to fine tune the editing in Calibre. Quite the opposite. I fine tune the editing in the .rtf file. However, once I import it into Calibre, Calibre overrides my formatting and tries to impose its own. (I described that format as "blog formatting". I don't claim that is an official term. It's my way of explaining that Calibre is deleting my tabs and inserting line spaces, so that the resulting text looks less like a conventional book and more like modern Internet formatting. You refer to it as XHTML.)
2.) You wrote Calibre "doesn't insert tabs". That is 100% incorrect. Calibre automatically deletes the original tabs in my document. When I check the box to "remove line spaces" it then tries to re-insert its own tabs. Unfortunately, Calibre is "dumb" and sticks a tab after every paragraph mark, even if there was no tab in the original. For example, Calibre inserts its own tabs in chapter titles, making them off-center.
3.) You wrote: "If you don't want your paragraphs to be indented or remain unchanged fix your indent value according to the tooltip on hover." As you noted earlier, this setting is for wholesale changes to the document, not fine tuning. Thus, if I set the indent value to zero, ALL of the tabs in the document disappear... But I don't want ALL of them to disappear. I want to keep the original tabs in the document, and delete the new tabs that Calibre inserted on its own. This cannot be done as you described.
4.) You wrote: "The blog format you refer to is XHTML and every conversion passes through this format en-route to its final format... As previously explained every conversion passes through the XHTML format regardless of what options you choose... Essentially that is calibre's default." Yes, I understand this. This makes it impossible to keep the original formatting of the original document.
5.) Perhaps I'm not articulate enough to describe this with words, so please, please, please go back and see my second post where I use an actual example: https://www.mobileread.com/forums/sho...29&postcount=2 Here is what you will see:
a.) If I simply convert the .rtf or Webpage-filtered file to .mobi without modifying any "Look and Feel" preferences, then Calibre will delete all of my tabs and insert lines between my paragraphs (i.e. XHTML). But I do not want that -- I want my text to look like a traditional book, not a blog.
b.) So let's say I check the box to "remove line spaces". As you point out, Calibre will still begin by converting it to XHTML, inserting lines between paragraphs and deleting my tabs, but then it will subsequently reverse the process, deleting the line spaces and inserting its own tabs (which is what I meant when I said it's converting the document twice). Unfortunately, Calibre inserts its very own tabs after every paragraph mark, so that a chapter title, which should be centered, is now off-center, pushed slightly to the right. Also, certain paragraphs (such as those that begin a chapter) are then indented, which was not the case in the original format.
c.) If I check the box to remove line spaces and reduce the indent value to zero, then I have one continuous document with no tabs and no spaces, where all the text runs together. It's unreadable.
If this still isn't clear, I can e-mail you short examples.
Last edited by A Lurker; 11-25-2012 at 08:42 AM. |
11-25-2012, 09:32 AM | #7 |
Well trained by Cats
Posts: 29,818
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
A Lurker
You are confusing 'Tab' (character #0009) with line indent. Many word processors USE a Tab to accomplish a line indent. Some conversions use a series of nbsp's to fake/force an indent. The normal way is to set the style property text-indent: <value>, with 0 being no indent. |
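To make the tab-versus-indent distinction concrete, here is a rough Python sketch (not calibre's actual code) that strips leading tabs and non-breaking spaces from a paragraph and expresses the indent through a CSS class instead. The class names `first` and `center` and the 1.5em value are illustrative assumptions, not anything calibre or Word defines.

```python
import html

# Hypothetical stylesheet: the class names and the 1.5em value are illustrative.
CSS = """\
p { margin: 0; text-indent: 1.5em; }
p.first { text-indent: 0; }
p.center { text-indent: 0; text-align: center; }
"""


def to_paragraph(text: str, kind: str = "body") -> str:
    """Render one paragraph; kind is 'body', 'first' or 'center'."""
    # Drop leading tabs, spaces and non-breaking spaces used as fake indents.
    cleaned = text.lstrip(" \t\u00a0")
    attr = "" if kind == "body" else f' class="{kind}"'
    return f"<p{attr}>{html.escape(cleaned)}</p>"


print(to_paragraph("Story", "center"))
print(to_paragraph("The man ran.", "first"))
print(to_paragraph("\tThe cat ran."))
```

With a stylesheet like this, a centered title and a section-opening paragraph each get `text-indent: 0`, while ordinary paragraphs keep their indent, so one global indent value no longer has to fit every paragraph.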
11-25-2012, 10:34 AM | #8 | |
Connoisseur
Posts: 84
Karma: 335288
Join Date: Nov 2012
Device: Kindle
|
Quote:
DoctorOhh also suggested setting the indent value to 0, but that removes ALL indents in the text, so that no paragraphs are indented. Again, I don't want to remove ALL indents -- I want to keep the indents that appear in the original text and get rid of the indents that Calibre is forcing. |
|
11-25-2012, 10:38 AM | #9 |
Connoisseur
Posts: 84
Karma: 335288
Join Date: Nov 2012
Device: Kindle
|
For example:
Original text (note that the title is centered, the first paragraph is flush left, and the next two paragraphs are indented):

                Story
The man ran.
    The cat ran.
    The mouse ran.

Calibre's default formatting (note that it deletes my tabs and inserts spaces):

                Story

The man ran.

The cat ran.

The mouse ran.

If I select "remove spaces" this is the result (note that the title is off-center and the first paragraph is incorrectly indented):

                    Story
    The man ran.
    The cat ran.
    The mouse ran.

If I select "remove spaces" and set the indent to 0 this is the result (note that none of the paragraphs are indented, which is incorrect):

                Story
The man ran.
The cat ran.
The mouse ran.

As you can see, it's impossible to maintain the original formatting.
Last edited by A Lurker; 11-25-2012 at 10:50 AM. |
11-25-2012, 11:00 AM | #10 |
Well trained by Cats
Posts: 29,818
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
You may have (partially) answered your own question.
You may have a mix of (indent) styles that confuses Calibre. There are multiple settings that can be used to deal with some of these during conversion. (From your description, you have been using OUTPUT options.) There are 3 areas: input, common and output. Each affects the conversion pipeline at some point. You need to understand YOUR document's internal construction, then adjust each step as needed: spaces vs tab vs margin/indent change. |
11-25-2012, 12:07 PM | #11 |
Grand Sorcerer
Posts: 6,212
Karma: 16534894
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
|
@Lurker,
Are you using MSWord to create your RTFs? If so I think you'll get the best results if you use Word styles to create your indents - not tabs or non-breaking spaces. Personally I prefer to use MSWord doc to filtered html to epub if my source is a Word document, but MSWord to RTF to epub/mobi seems to work OK in the attached sample RTF (containing paragraph indent, non-indent, centred) when converted to epub or mobi. I also never use the calibre 'Remove spacing...' and 'Insert blank line...' options as I find them too much of a blunt instrument, causing more problems than they fix. I've also attached the converted mobi. [Added:] These were the conversion options used Spoiler:
Last edited by jackie_w; 11-25-2012 at 12:11 PM. Reason: added conversion options |
11-25-2012, 12:08 PM | #12 |
null operator (he/him)
Posts: 20,590
Karma: 26954694
Join Date: Mar 2012
Location: Sydney Australia
Device: none
|
A Lurker - format your document in your word processor using its paragraph format features, get rid of the tabs, extra paragraph marks etc.
I have 1,000s of documents that I've converted direct from RTF to EPUB and some MOBI with perfect results and I don't save as Formatted HTML - I never found any good reason to do that.
I never think about a document's internal construction, or its conversion pipelines, how straight its building blocks are etc. I never think about the HTML. Why would I do that? I want to read the content, not the formatting gobbledegook. Essentially what I see in Word's Print View is what I see in the EPUB & MOBI viewers I use - that's all I ask for.
I do every conversion (PDF->PRC->RTF->EPUB & MOBI) with the same Calibre settings; it's when I don't that things start to go wrong. My originals, mainly PDFs, a few ODT, DOCX & PPT, are sourced from orgs, govs, coms, edus, qangos, think tanks, consultants, EU, Congress, UN, WTO, Journals etc etc. The layouts vary widely, from simple A4 typed pages, through to variable width multi-column, with sidebars, embedded graphs and tables on A3 landscape wallpaper.
The trick is to let your wp software do what it's best at - formatting. If you're using Word 2007/10 you should learn how to use Templates and Styles; they can save you heaps of effort and avoid a lot of grief. OOo Writer and Wordperfect will have something similar.
Which is a long-winded way of saying what jackie_w just said :lol:
BR
Last edited by BetterRed; 11-25-2012 at 12:13 PM. Reason: ack to jackie-w post |
11-25-2012, 02:01 PM | #13 | |||||||||
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
|
Quote:
One more time: checking that box activates the Remove spacing between paragraphs / Indent size feature. This is one feature. Quote:
Quote:
The calibre manual which I linked you to describes it as XHTML. Did you bother reading the conversion section of the manual? Quote:
Quote:
Quote:
Quote:
Quote:
Quote:
Good Luck. Last edited by DoctorOhh; 11-25-2012 at 02:07 PM. |
|||||||||
11-25-2012, 04:37 PM | #14 |
Wizard
Posts: 4,552
Karma: 950151
Join Date: Nov 2008
Device: Sony PRS-950, iphone/ipad (Marvin/iBooks/QuickReader)
|
A point that has not been made anywhere is that in XHTML (which is what epub and mobi are based on) tab characters have no effect on the layout - they are simply whitespace and are treated like spaces. Also multiple whitespace characters (e.g. tab, space, newline) are condensed to a single space for display purposes. This is why you have to use styles or something equivalent to control layout.
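As a small illustration of that collapsing rule, here is a Python sketch that approximates what a renderer does to ordinary text. It is a simplification: it ignores `<pre>` and the CSS `white-space` property, and the function name is made up for this example.

```python
import re

# Characters HTML treats as collapsible whitespace:
# space, tab, newline, carriage return, form feed.
_HTML_WS = re.compile(r"[ \t\n\r\f]+")


def collapse_whitespace(text: str) -> str:
    """Approximate how normal text is displayed: runs of whitespace become one space."""
    # strip(" ") removes only plain spaces, so non-breaking spaces are kept.
    return _HTML_WS.sub(" ", text).strip(" ")


original = "\tThe man ran.\n\tThe cat ran."
print(collapse_whitespace(original))  # The man ran. The cat ran.
```

Note that the non-breaking space (U+00A0) is deliberately not in the pattern. It survives collapsing, which is why some converters use runs of nbsp's to fake an indent.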
|
11-26-2012, 11:01 PM | #15 |
Connoisseur
Posts: 84
Karma: 335288
Join Date: Nov 2012
Device: Kindle
|
@ theducks, jackie_w, BetterRed and itimpi:
Thank you all so much for your helpful tips! I've been busy with the holidays and haven't had a chance to test them, but they sound promising. You each seem to understand the problem I'm experiencing, so I'm confident your suggestions are pointing me in the right direction.
@DoctorOhh: You seem to be stuck on my misuse of some terminology and intent on arguing semantics. I wholeheartedly concede that I got some of the jargon wrong, but that does not alter the fact that the problem I describe is real. Bottom line: Calibre will not allow me to produce mobi files with the original rtf/traditional formatting. And nothing you've suggested fixes that problem. Your pedantic and condescending focus on my layman's vocabulary is unhelpful. There were a few comments in your reply that suggest you might be starting to grasp my problem... but if I'm not mistaken, your response is basically, "Yeah, that's how Calibre works." In other words, you're merely acknowledging the limitations of Calibre without offering any possible work-arounds. That's also unhelpful. I'm thankful that other people in this thread were able to comprehend my problem and offer suggestions, rather than lecture me. |
|