10-20-2009, 10:51 AM | #1 |
Enthusiast
Posts: 47
Karma: 30
Join Date: May 2009
Location: Austin, TX
Device: Kindle Paperwhite 2
|
TXT conversion to ePub or LRF - paragraph formatting
I used search to look for "indent", "paragraph", and "txt conversion" here, but didn't find exactly what I was looking for.
I'm seeing differences in the output when I convert txt files to ePub or LRF, and as a naive user of the conversion utilities in calibre, I don't see an easy way to control what's happening. I don't really want to venture into pre-formatting the text as html if I'm missing some simple way to control the conversion in calibre directly.
I have a simple test file: the first three chapters of Pride & Prejudice as a text file. There is no indent at the beginning of each paragraph, but there is a blank line between paragraphs. There are extra blank lines before each new chapter heading. Here's an example of the last three paragraphs in Chapter 1 leading to the first three paragraphs of Chapter 2:
Code:
"It will be no use to us, if twenty such should come, since you will not visit them."

"Depend upon it, my dear, that when there are twenty, I will visit them all."

Mr. Bennet was so odd a mixture of quick parts, sarcastic humour, reserve, and caprice, that the experience of three-and-twenty years had been insufficient to make his wife understand his character. _Her_ mind was less difficult to develop. She was a woman of mean understanding, little information, and uncertain temper. When she was discontented, she fancied herself nervous. The business of her life was to get her daughters married; its solace was visiting and news.



Chapter 2

Mr. Bennet was among the earliest of those who waited on Mr. Bingley. He had always intended to visit him, though to the last always assuring his wife that he should not go; and till the evening after the visit was paid she had no knowledge of it. It was then disclosed in the following manner. Observing his second daughter employed in trimming a hat, he suddenly addressed her with:

"I hope Mr. Bingley will like it, Lizzy."

"We are not <etc.>

In a default conversion to LRF, paragraphs and chapters no longer have any blank lines between them, but an indent is added. The indent makes it somewhat readable, but the missing blank lines make it feel claustrophobic.
I've got several questions about this simple example.
(1) How do I get chapter detection to work, so that a page break is added before the "Chapter <N>" line?
(2) How do I control the addition of indentation in the ePub output, if that's what I want?
(3) How do I prevent the disappearance of the blank lines between paragraphs in LRF conversion, if that's what I want?
Thanks in advance for your patience with a newbie to conversion. |
10-20-2009, 05:18 PM | #2 |
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
The default is to detect sections of text separated by a blank line as paragraphs. The only way to detect chapter headings is to mark them using markdown.
The easiest thing to do is to put a # before every chapter and set the --level1-toc to "//*[name()='h1']" to have a table of contents generated. |
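For a long book, adding the # by hand gets tedious, so here is a minimal sketch of how the marking step might be scripted. It assumes every chapter heading sits alone on its own line and looks like "Chapter 2"; the function name `mark_chapters` and the regular expression are illustrative only and are not part of calibre.

```python
import re

# Assumption: a heading is a line containing only "Chapter <number>".
CHAPTER_RE = re.compile(r"^(chapter\s+\d+)\s*$", re.IGNORECASE)


def mark_chapters(text: str) -> str:
    """Prefix each chapter-heading line with '# ' (a Markdown level-1 heading)."""
    out = []
    for line in text.splitlines():
        match = CHAPTER_RE.match(line.strip())
        # Leave ordinary lines untouched; rewrite only the heading lines.
        out.append("# " + match.group(1) if match else line)
    return "\n".join(out)


sample = "its solace was visiting and news.\n\n\nChapter 2\n\nMr. Bennet was among the earliest"
print(mark_chapters(sample))
```

The marked-up text can then be saved back to the txt file and converted as before, with the --level1-toc setting given above. Headings that do not fit the assumed "Chapter <number>" pattern would need a different regular expression.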
10-21-2009, 10:20 AM | #3 |
Enthusiast
Posts: 47
Karma: 30
Join Date: May 2009
Location: Austin, TX
Device: Kindle Paperwhite 2
|
Thanks, user_none. I want to make sure I clearly understand your answer, so I'll paraphrase it: Despite the fact that calibre provides a "Structure Detection" section in its conversion menu, and within that Structure Detection submenu there is a long and cryptic string entitled "Detect chapters at (Xpath expression)", there is in fact no way to detect chapters in the simple text-file example I have given. I just want to make sure other folks agree with this statement, because it sure seems to me like that "Detect chapters..." string was intended to do something.
Anyone have any ideas about questions (2) & (3)? Adding paragraph indents to converted ePub and preventing blank-line-swallowing in LRF conversion? |
10-21-2009, 04:23 PM | #4 |
Grand Sorcerer
Posts: 6,212
Karma: 16534894
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
|
Hi Zapped,
Re question (3) :- Did you try checking the "Insert Blank Line" box in Convert/Look-and-Feel? |
10-21-2009, 05:36 PM | #5 | ||
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
Quote:
The only other way to detect chapters is to write a custom XPath expression for the book in question. Whatever the XPath matches will be treated as a chapter. The default checks for h1 and h2 tags that contain a word like "chapter". Hence the suggestion to mark the headings with markdown: the chapters in your example would then be detected by the default XPath expression. Quote:
|
||
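To make the idea of "h1 and h2 tags that contain a word like chapter" concrete, here is a rough plain-Python imitation of that check. It is not calibre's code and does not use calibre's actual XPath expression; the function name `find_chapter_headings` and the sample markup are made up for illustration, and the markup is assumed to be well-formed.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup standing in for a converted book.
html = """<html><body>
<h1>Chapter 1</h1><p>It is a truth universally acknowledged...</p>
<h2>A note on the text</h2>
<h1>Chapter 2</h1><p>Mr. Bennet was among the earliest...</p>
</body></html>"""


def find_chapter_headings(markup):
    """Return the text of every h1/h2 element that mentions 'chapter'."""
    root = ET.fromstring(markup)
    headings = []
    for elem in root.iter():
        if elem.tag in ("h1", "h2"):
            # Join nested text so headings with inline tags still match.
            text = "".join(elem.itertext()).strip()
            if "chapter" in text.lower():
                headings.append(text)
    return headings


print(find_chapter_headings(html))
```

The h2 that lacks the word "chapter" is skipped, which is why a plain "Chapter 2" line in a txt file is only picked up once it has been turned into a heading.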
10-23-2009, 11:24 AM | #6 |
Enthusiast
Posts: 47
Karma: 30
Join Date: May 2009
Location: Austin, TX
Device: Kindle Paperwhite 2
|
user_none, I'll investigate that CSS thread and/or learn more about those cryptic Xpath expressions.
Jackie_W - thanks! That simple checkbox put the blank lines back into my txt-->lrf conversion. I missed that menu item previously. |
10-23-2009, 05:06 PM | #7 |
Member
Posts: 11
Karma: 10
Join Date: Dec 2007
Device: Sony 505
|
I find that it works best to convert the txt to rtf rather than working with the txt. If you don't need a toc, I actually think the rtf is just about the best looking file on the 505, as is.
Last edited by goose61282; 10-23-2009 at 05:08 PM. Reason: spelling |
|