08-18-2010, 07:26 PM   #14
user_none
Sigil & calibre developer
Quote:
Originally Posted by Jabby View Post
My original reply was to Kovid. After reading the documentation this statement made me wonder; "by default calibre only groups lines in the input document into paragraphs. The default is to assume one or more blank lines are a paragraph boundary:"
Kovid isn't weighing in much because I'm the author and maintainer of TXT input. Paragraphs are the only reliable components that a TXT file can be broken down into.

Quote:
Originally Posted by Jabby View Post
Here is my quibble - why group lines into only paragraphs. Why not into paragraphs and single lines? It is certainly possible and then markdowns would only be necessary if you wanted to add other basic formatting.
There are two parts to this. The easy part is that Markdown was chosen as the method for adding formatting to TXT files. It is easy and quick, and the markup even looks good when viewed as a plain text file. We have one all-purpose formatting method that handles pretty much every case short of using HTML. Adding other formatting methods that do the same thing is unnecessary.

What you're describing is the following, and it won't work.

Quote:
Originally Posted by Jabby
CR Line terminator
LF Line terminator
CRLF Line terminator
CRCR Paragraph terminator
LFLF Paragraph terminator
CRLFCRLF Paragraph terminator
1) Many TXT documents (look at Project Gutenberg) have this formatting:

Code:
I am all one
paragraph split
along multiple lines
with a single new
line character.
By your line-ending description above, it would turn into:

Code:
<p>
I am all one<br />
paragraph split<br />
along multiple lines<br />
with a single new<br />
line character.<br />
</p>

The whole point of TXT input is to take a fixed-placement document and turn it into a reflowable format. calibre's conversion process actually requires this: Input -> reflowable intermediate format -> Output.

By doing the above, you've removed the entire idea of a reflowable paragraph that changes layout to fit the page width. The TXT input is based on intent. Novels are the typical input, and it is designed to handle the majority of their formatting cases. There is a "Treat each line as a paragraph" option, and there is Markdown, to handle corner cases such as yours.
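To make the difference concrete, here is a minimal sketch of the paragraph grouping described above. It is not calibre's actual code: the function names `group_paragraphs` and `to_html` and the `line_per_paragraph` flag are made up for illustration, and it only assumes that one or more blank lines mark a paragraph boundary.

```python
from html import escape


def group_paragraphs(text, line_per_paragraph=False):
    """Group the lines of a plain-text document into paragraphs.

    Hypothetical helper for illustration only, not calibre's implementation.
    """
    paragraphs, current = [], []
    # splitlines() treats CR, LF and CRLF as line breaks alike.
    for line in text.splitlines():
        stripped = line.strip()
        if stripped and line_per_paragraph:
            # Corner-case mode: every non-empty line is its own paragraph.
            paragraphs.append(stripped)
        elif stripped:
            current.append(stripped)
        elif current:
            # A blank line closes the paragraph collected so far.
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


def to_html(paragraphs):
    # No <br /> inside a paragraph, so the reader is free to reflow it.
    return "\n".join("<p>%s</p>" % escape(p) for p in paragraphs)


sample = (
    "I am all one\nparagraph split\nalong multiple lines\n"
    "with a single new\nline character.\n\nI am a second paragraph."
)
print(to_html(group_paragraphs(sample)))
```

The hard-wrapped lines are joined with spaces rather than `<br />`, so each paragraph can change layout to fit the page width. Passing `line_per_paragraph=True` mimics the "one line, one paragraph" behaviour for documents that really are laid out that way.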

2) I'm not 100% clear, but if you are implying that we should allow mixed CR/LF characters (aside from the standard Windows CRLF) within a document to denote different meanings, such as LFLF for a paragraph and CR for a new line: no. CR and LF are invisible characters. They are all treated the same because that's how the majority of text editors treat them. Many users will edit a file on Windows and then on, say, OS X. Some editors convert all new lines to the system's standard, and some insert their system's new line where indicated while still displaying correctly. Users will become very confused when viewing their converted documents if different lines behave in different ways, especially when they can't see that there is a CR instead of an LF character in the source TXT file. In this case, telling a user to open their document in a hex editor is not acceptable.
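As a rough illustration of treating every line ending the same, the sketch below collapses CR, LF and CRLF into a single convention before any further processing. The helper name `normalize_newlines` is hypothetical and this is only an assumption about how one might do it, not a copy of what calibre does internally.

```python
import re


def normalize_newlines(text):
    """Convert CRLF and lone CR line endings to LF.

    Hypothetical helper for illustration only.
    """
    # CRLF is listed first so a Windows line ending is not counted as two breaks.
    return re.sub(r"\r\n|\r", "\n", text)


# The same two-paragraph document saved with three different conventions.
variants = [
    "one\ntwo\n\nthree",          # LF
    "one\r\ntwo\r\n\r\nthree",    # CRLF
    "one\rtwo\r\rthree",          # CR
]
normalized = {normalize_newlines(v) for v in variants}
print(len(normalized))
```

After normalization all three variants are the same string, so the converted output does not depend on which editor or operating system last touched the file.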