Old 10-20-2009, 10:51 AM   #1
Zapped
Enthusiast
Posts: 45
Karma: 30
Join Date: May 2009
Location: Austin, TX
Device: Kindle Fire, B&N Nook Touch, Sony PRS-505
TXT conversion to ePub or LRF - paragraph formatting

I used search to look for "indent", "paragraph", and "txt conversion" here, but didn't find exactly what I was looking for.

I'm seeing differences in the output when I convert txt files to ePub or LRF, and as a naive user of the conversion utilities in calibre, I don't see an easy way to control what's happening. I don't really want to venture into pre-formatting the text as html if I'm missing some simple way to control the conversion in calibre directly.

I have a simple test file: the first three chapters of Pride & Prejudice as a plain text file. There is no indent at the beginning of each paragraph, but there is a blank line between paragraphs. There are extra blank lines before each new chapter heading.

Here's an example of the last three paragraphs in Chapter 1 leading to the first three paragraphs of Chapter 2:
Code:
"It will be no use to us, if twenty such should come, since you will not
visit them."

"Depend upon it, my dear, that when there are twenty, I will visit them
all."

Mr. Bennet was so odd a mixture of quick parts, sarcastic humour,
reserve, and caprice, that the experience of three-and-twenty years had
been insufficient to make his wife understand his character. _Her_ mind
was less difficult to develop. She was a woman of mean understanding,
little information, and uncertain temper. When she was discontented,
she fancied herself nervous. The business of her life was to get her
daughters married; its solace was visiting and news.



Chapter 2


Mr. Bennet was among the earliest of those who waited on Mr. Bingley. He
had always intended to visit him, though to the last always assuring
his wife that he should not go; and till the evening after the visit was
paid she had no knowledge of it. It was then disclosed in the following
manner. Observing his second daughter employed in trimming a hat, he
suddenly addressed her with:

"I hope Mr. Bingley will like it, Lizzy."

"We are not <etc.>
In a default conversion to ePub, paragraphs remain separated by a blank line. No indent is added. Extra blank lines are collapsed to a single blank line. It looks similar to the raw txt example above.

In a default conversion to LRF, paragraphs and chapters no longer have any blank lines between them, but an indent is added. The indent makes it somewhat readable, but the missing blank lines make it feel claustrophobic.

I've got several questions about this simple example.

(1) How do I get Chapter detection to work (so that a page break is added before the "Chapter <N>" line)?

(2) How do I control the addition of indentation to the ePub format if that's what I want?

(3) How do I prevent the disappearance of the blank lines between paragraphs in LRF conversion if that's what I want?

Thanks in advance for your patience with a newbie to conversion.
Old 10-20-2009, 05:18 PM   #2
user_none
Sigil & calibre developer
Posts: 2,428
Karma: 950001
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
The default is to detect sections of text separated by a blank line as paragraphs. The only way to detect chapter headings is to mark them using markdown.

The easiest thing to do is to put a # before every chapter and set the --level1-toc to "//*[name()='h1']" to have a table of contents generated.
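
A minimal sketch of that pre-processing step, assuming headings look like "Chapter 2" alone on a line, as in the sample above. The function name and the pattern are illustrative and not part of calibre; a book with a different heading style would need a different pattern.

```python
import re

# Assumption: a heading is the word "Chapter" plus a number, alone on its line.
CHAPTER_RE = re.compile(r"^Chapter\s+\d+\s*$")


def mark_chapters(text):
    """Prefix each 'Chapter <N>' line with '# ' so markdown reads it as an h1."""
    out = []
    for line in text.splitlines():
        if CHAPTER_RE.match(line):
            out.append("# " + line.strip())
        else:
            out.append(line)  # ordinary text and blank lines pass through unchanged
    return "\n".join(out) + "\n"


sample = (
    "daughters married; its solace was visiting and news.\n"
    "\n\n\n"
    "Chapter 2\n"
    "\n\n"
    "Mr. Bennet was among the earliest of those who waited on Mr. Bingley.\n"
)
marked = mark_chapters(sample)
print(marked)
```

The marked text can then be saved and converted as usual, with the `--level1-toc` setting mentioned above if a table of contents is wanted.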
Old 10-21-2009, 10:20 AM   #3
Zapped
Thanks, user_none. I want to make sure I clearly understand your answer, so I'll paraphrase it: Despite the fact that calibre provides a "Structure Detection" section in its conversion menu, and within that Structure Detection submenu there is a long and cryptic string entitled "Detect chapters at (Xpath expression)", there is in fact no way to detect chapters in the simple text-file example I have given. I just want to make sure other folks agree with this statement, because it sure seems to me like that "Detect chapters..." string was intended to do something.

Anyone have any ideas about questions (2) & (3)? Adding paragraph indents to converted ePub and preventing blank-line-swallowing in LRF conversion?
Old 10-21-2009, 04:23 PM   #4
jackie_w
Wizard
Posts: 2,667
Karma: 3818025
Join Date: Sep 2009
Location: UK
Device: Sony PRS-350, PB360, Kobo Glo/AuraHD/Aura6"
Hi Zapped,
Re question (3) :-

Did you try checking the "Insert Blank Line" box in Convert/Look-and-Feel?
Old 10-21-2009, 05:36 PM   #5
user_none
Quote:
Originally Posted by Zapped
Thanks, user_none. I want to make sure I clearly understand your answer, so I'll paraphrase it: Despite the fact that calibre provides a "Structure Detection" section in its conversion menu, and within that Structure Detection submenu there is a long and cryptic string entitled "Detect chapters at (Xpath expression)", there is in fact no way to detect chapters in the simple text-file example I have given. I just want to make sure other folks agree with this statement, because it sure seems to me like that "Detect chapters..." string was intended to do something.
TXT files can have any arbitrary "layout"; there is no way to properly detect each and every one. TXT input only detects paragraphs. Everything is a paragraph. Where one starts and stops can be tuned using a number of different options. The easiest way to differentiate things like chapter headings is to explicitly mark them using markdown.

The only other way to detect chapters is for you to write a custom XPath expression for the book in question. Whatever the XPath matches will be considered a chapter. The default is to check for h1 and h2 tags that contain a word like "chapter". Hence the suggestion of marking the headings with markdown, in which case the chapters in your example would be detected by the default XPath expression.
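
To make the idea concrete, here is a rough sketch of what "h1 and h2 tags that contain a word like chapter" selects. It uses only the Python standard library rather than calibre's own XPath machinery, and the keyword list is an assumption for illustration, not calibre's actual default expression.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical keyword list; the words calibre really checks for may differ.
CHAPTER_WORDS = re.compile(r"\b(chapter|book|section|part)\b", re.IGNORECASE)


def find_chapter_headings(xhtml):
    """Return the text of every h1/h2 element that contains a chapter-like word."""
    root = ET.fromstring(xhtml)
    hits = []
    for elem in root.iter():
        if elem.tag in ("h1", "h2"):
            text = "".join(elem.itertext()).strip()
            if CHAPTER_WORDS.search(text):
                hits.append(text)
    return hits


doc = """<html><body>
<p>its solace was visiting and news.</p>
<h1>Chapter 2</h1>
<p>Mr. Bennet was among the earliest of those who waited on Mr. Bingley.</p>
<p>Chapter 3 as a plain paragraph is not matched.</p>
</body></html>"""

headings = find_chapter_headings(doc)
print(headings)
```

This is why an unmarked text file yields no chapters: every "Chapter N" line ends up in an ordinary paragraph, so nothing matches until the line is turned into a heading.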

Quote:
Originally Posted by Zapped
Anyone have any ideas about questions (2)... Adding paragraph indents to converted ePub...
You will have to add extra css. Have a look at the Custom CSS thread here for some ideas.
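
As one possible starting point, the sketch below builds an `ebook-convert` command line that passes a single indent rule as extra CSS. It assumes the `--extra-css` option accepts raw CSS, as recent calibre documentation describes; the calibre version discussed in this thread may behave differently. The file names and the 1.5em value are placeholders.

```python
import shlex

# Assumption: one rule indenting the first line of every paragraph is enough.
EXTRA_CSS = "p { text-indent: 1.5em }"


def build_convert_command(src, dst, extra_css=EXTRA_CSS):
    """Return the argument list for an ebook-convert run with extra CSS applied."""
    return ["ebook-convert", src, dst, "--extra-css", extra_css]


# Hypothetical file names; substitute your own input and output paths.
cmd = build_convert_command("pride.txt", "pride.epub")
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal
```

The same rule can presumably be pasted into the extra CSS box of the conversion dialog instead of being passed on the command line.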
Old 10-23-2009, 11:24 AM   #6
Zapped
user_none, I'll investigate that CSS thread and/or learn more about those cryptic Xpath expressions.

Jackie_W - thanks! That simple checkbox put the blank lines back into my txt-->lrf conversion. I missed that menu item previously.
Old 10-23-2009, 05:06 PM   #7
goose61282
Member
Posts: 11
Karma: 10
Join Date: Dec 2007
Device: Sony 505
I find that it works best to convert the txt to rtf rather than working with the txt. If you don't need a toc, I actually think the rtf is just about the best looking file on the 505, as is.

Last edited by goose61282; 10-23-2009 at 05:08 PM. Reason: spelling
All times are GMT -4.