MobileRead Forums

Calibre forum thread: https://www.mobileread.com/forums/showthread.php?t=52737

jmurphy 08-04-2009 07:22 PM

Convert TXT to anything - simply wraps with < html > < body > ?
 
Am I doing something wrong, or is converting from TXT extremely limited?

In trying to convert from TXT to ePub, for instance, the txt simply gets wrapped with html and body tags.

As a result, the ePub has absolutely no paragraphs.

Is this the expected behaviour?

My (unreasonable?) expectation is that during the conversion to HTML, Calibre would convert hard returns (or double hard returns) to HTML paragraphs.

What I'd really like to see is rough round-tripping: take an ePub, save the content as text, run that text through Calibre, and get an ePub that at least resembles the original. In this scenario, Calibre would process the text looking for keywords like "Chapter", etc., and at least add a header tag to them during the HTML conversion, before converting to ePub.

So, what should I expect from Calibre when converting a TXT file to another format?

John Murphy
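
For illustration, the kind of pass described in this post (every hard return becomes a paragraph, lines that look like chapter titles become headers) could be sketched outside Calibre in a few lines of Python. This is only a sketch under stated assumptions, not Calibre's code: `text_to_html` and `CHAPTER_RE` are made-up names, and the "Chapter" pattern is a guess at what a heading looks like.

```python
import html
import re

# Hypothetical heading pattern: a line starting with "Chapter" plus a number or word.
CHAPTER_RE = re.compile(r"^chapter\s+\S+", re.IGNORECASE)


def text_to_html(text):
    """Wrap each non-empty line in <p>, or in <h2> if it looks like a chapter title."""
    parts = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip empty lines, so double hard returns work too
        tag = "h2" if CHAPTER_RE.match(line) else "p"
        # Escape &, < and > so the text cannot break the markup.
        parts.append("<{0}>{1}</{0}>".format(tag, html.escape(line)))
    return "<html><body>\n" + "\n".join(parts) + "\n</body></html>"
```

A pattern this simple will also tag an ordinary sentence that happens to begin with "Chapter", so it would need tightening for real books.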

user_none 08-04-2009 07:30 PM

Conversion with TXT as the input format should have the text run through Markdown and it should be creating paragraphs. It should also unwrap hard line broken paragraphs. Open a bug at http://calibre.kovidgoyal.net and attach the text file so I don't forget to look into it.

jmurphy 08-04-2009 08:08 PM

Quote:

Originally Posted by user_none (Post 543126)
It should also unwrap hard line broken paragraphs.

How does Calibre unwrap hard line broken paragraphs?

Or, more to the point: How does Calibre recognize paragraphs in TXT files? Is it expecting double spacing? The TXT file I tested with only had single spacing...

John

user_none 08-04-2009 08:41 PM

Quote:

Originally Posted by jmurphy (Post 543167)
How does Calibre unwrap hard line broken paragraphs?

Or, more to the point: How does Calibre recognize paragraphs in TXT files? Is it expecting double spacing? The TXT file I tested with only had single spacing...

John

It treats consecutive lines with no empty line between them as the same paragraph. Paragraphs are separated by an empty line.

So:

Code:

line1 is right here.
line2  is part of the same paragraph as line1.

Code:

line 1 is its own.

line2 is also its own.

It almost sounds like the file is:

Code:

    para
    para
    para

In which case that would be interpreted as all one large paragraph.
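
To make the rule concrete, here is a minimal Python sketch of the blank-line convention described above. It is not Calibre's actual code and does not run Markdown; `split_paragraphs` is a made-up name used only for illustration.

```python
import re


def split_paragraphs(text):
    """Split text on empty lines and unwrap hard line breaks inside each paragraph."""
    # One or more empty (or whitespace-only) lines end a paragraph.
    blocks = re.split(r"\n\s*\n", text.strip())
    paragraphs = []
    for block in blocks:
        # Lines with no empty line between them are joined into one paragraph.
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        if lines:
            paragraphs.append(" ".join(lines))
    return paragraphs
```

Fed the three samples above, this gives one paragraph for the first, two for the second, and a single `para para para` paragraph for the third.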

jmurphy 08-06-2009 01:33 AM

Quote:

Originally Posted by user_none (Post 543245)
It treats consecutive lines with no empty line between them as the same paragraph. Paragraphs are separated by an empty line.

So:

Code:

line1 is right here.
line2  is part of the same paragraph as line1.

Code:

line 1 is its own.

line2 is also its own.

It almost sounds like the file is:

Code:

    para
    para
    para

In which case that would be interpreted as all one large paragraph.



Not "almost like the file is". As I said, that is exactly the way the file is. Single spaced. Every native Windows-based tool I've used uses a single hard return as a paragraph marker.

Is there a way to configure Calibre to recognize a single hard return as a paragraph marker?

Earlier in this thread you mentioned "markdown". What is "markdown"?

John

slantybard 08-06-2009 02:03 AM

Markdown:
http://daringfireball.net/projects/markdown/syntax

rogue_ronin 08-06-2009 03:19 AM

Quote:

Originally Posted by slantybard (Post 544955)


So, calibre is pre-processing text with markdown? That's cool. Good for folks to know, they might want to do a quick edit before adding to calibre.

m a r
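
One possible "quick edit" for a file that uses a single hard return per paragraph is to put an empty line between every line before adding it to calibre. The following is a rough sketch, not a calibre feature; `double_space` is a made-up name.

```python
def double_space(text):
    """Return text with an empty line between every non-empty line."""
    # Leading whitespace is stripped because Markdown treats lines
    # indented by four or more spaces as code blocks.
    lines = [line.strip() for line in text.splitlines()]
    # Drop lines that are already empty so paragraphs are not triple spaced.
    return "\n\n".join(line for line in lines if line) + "\n"
```

Read the TXT file, pass its contents through `double_space`, and write the result to a new file. Note that this only suits files where each line really is a whole paragraph; on a file with hard-wrapped lines it would turn every line into its own paragraph.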

user_none 08-06-2009 07:50 AM

Quote:

Originally Posted by jmurphy (Post 544940)
Not "almost like the file is". As I said, that is exactly the way the file is. Single spaced. Every native Windows-based tool I've used uses a single hard return as a paragraph marker.

The majority of plain text files people were expected to use as input are from Project Gutenberg. Almost all of them have line-broken paragraphs with empty lines separating the paragraphs, e.g. War and Peace.

Quote:

Originally Posted by jmurphy (Post 544940)
Is there a way to configure Calibre to recognize a single hard return as a paragraph marker?

Open a ticket requesting that as a feature so I don't forget about it.


All times are GMT -4.