Problem with TXT Input


Balorn
05-30-2010, 02:45 AM
A lot of the things I read were originally formatted for 80-column monospace displays. I've been playing with the Extra CSS section and trying to tweak things to look right, but looking in the debug directories I discovered a major issue with the design of the TXT input plugin:

The TXT input plugin always removes whitespace from the start of the line, and has no option to leave it there.

For many things this doesn't matter, but preformatted TXT files often rely on space at the start of the line for things to look right.

I did a quick test, and starting with this source text file:
This test.
is a
a is
test. This
I got this in the html file in the debug\input folder:
<p>This test.</p>
<p>is a</p>
<p>a is</p>
<p>test. This</p>
I discovered this when I found that adding "white-space: pre;" in the Extra CSS section helped with mid-line spacing but still didn't indent lines properly.

There really needs to be an option for TXT input to not strip leading spaces, so they can be carried through to the later steps of conversion.

mrmikel
05-30-2010, 07:02 AM
You might try converting to HTML first, replacing the significant spaces with non-breaking spaces, and see if that helps. You can use Sigil for editing if you are making EPUBs.
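
For anyone who wants to try the non-breaking space idea on a plain text file, here is a minimal sketch in Python. It is not part of Calibre or Sigil; the function names `line_to_paragraph` and `text_to_html` are made up for illustration, and it only handles leading spaces (not tabs or mid-line runs of spaces).

```python
import html

def line_to_paragraph(line: str) -> str:
    """Wrap one text line in <p>, turning each leading space into &nbsp;."""
    stripped = line.lstrip(" ")
    indent = len(line) - len(stripped)      # number of leading spaces
    body = html.escape(stripped.rstrip())   # escape <, > and & in the text
    return "<p>" + "&nbsp;" * indent + body + "</p>"

def text_to_html(text: str) -> str:
    """Convert every line of a text block to its own paragraph."""
    return "\n".join(line_to_paragraph(line) for line in text.splitlines())

# The four-space indent here is an invented example, not the original test file.
print(line_to_paragraph("    is a"))
# <p>&nbsp;&nbsp;&nbsp;&nbsp;is a</p>
```

The resulting HTML can then be used as the conversion input instead of the TXT file, so the indentation no longer depends on how TXT input treats leading whitespace.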

DoctorOhh
05-30-2010, 11:07 AM
The TXT input plugin always removes whitespace from the start of the line, and has no option to leave it there.

For many things this doesn't matter, but preformatted TXT files often rely on space at the start of the line for things to look right.

...~~~...

There really needs to be an option for TXT input to not strip leading spaces, so we can have them be included in later steps of conversion.

As best I can tell, this is a limitation of converting text to a reflowable ebook format. For an in-depth discussion of the limits of converting text, read this thread (http://www.mobileread.com/forums/showthread.php?t=75428) between nerys and Kovid (Calibre's creator).

user_none
05-30-2010, 11:38 AM
I got this in the html file in the debug\input folder:

<p>This test.</p>
<p>is a</p>
<p>a is</p>
<p>test. This</p>


This is actually a bug. The proper output should be:


<p>This test.</p>
<p>is a</p>
<p>a is</p>
<p>test. This</p>


This is because the text is converted to HTML internally, and runs of extra whitespace are collapsed into a single space when the HTML is rendered.
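
As a rough illustration of that collapsing, here is a small Python sketch that approximates what a renderer does with whitespace inside a normal paragraph. It is a simplification of the default `white-space: normal` behaviour, not Calibre's code, and `collapse_whitespace` is a name invented for this example.

```python
import re

def collapse_whitespace(text: str) -> str:
    """Approximate HTML rendering: runs of spaces, tabs and newlines
    become one space, and whitespace at the edges of the block is dropped."""
    return re.sub(r"[ \t\r\n\f]+", " ", text).strip()

# The spacing in this sample is invented for illustration.
print(collapse_whitespace("   This      test."))
# This test.
```

So even if the leading spaces were kept in the intermediate HTML, they would not show up on screen unless the CSS or the markup (non-breaking spaces, for example) told the renderer to preserve them.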

Adding an option to maintain whitespace exactly is possible, but I don't see much point to it because it defeats the purpose of creating a reflowable document.

asjogren
05-30-2010, 08:09 PM
On a related issue with TXT input, the Conversion Preference "treat each line as a paragraph" does not appear to work. The result was that each chapter became a single paragraph.

Workaround: import into a word processor (StarOffice Writer) and save as RTF. The Calibre conversion from RTF to MOBI was flawless.

Calibre 0.6.49

FatDog
05-30-2010, 09:20 PM
I'm struggling with similar issues.

I took someone's advice and I am trying to put simple HTML tags into my .txt file to create .htm/.html.

To preserve spacing when I need it I can add PRE and /PRE tags around things that should NOT be flowed.

DoctorOhh
05-30-2010, 11:03 PM
To preserve spacing when I need it I can add PRE and /PRE tags around things that should NOT be flowed.

Be aware that using PRE tags in your book can cause print to run right off the reader if you zoom/change font sizes.

user_none
05-31-2010, 08:45 AM
On a related issue with TXT input, the Conversion Preference "treat each line as a paragraph" does not appear to work. The result was that each chapter became a single paragraph.

Unless you use markdown processing (which has very specific requirements for formatting), TXT input has no idea what a chapter is. It works exclusively on lines.
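
To make the difference concrete, here is a minimal Python sketch of two line-based ways of splitting text into paragraphs. This is not the actual TXT input code; both function names are hypothetical and the sample text is invented.

```python
import re

def paragraphs_per_line(text: str) -> list[str]:
    """Every non-empty line becomes its own paragraph."""
    return [line.strip() for line in text.splitlines() if line.strip()]

def paragraphs_by_blank_line(text: str) -> list[str]:
    """Lines are joined together until a blank line ends the paragraph."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [" ".join(block.split()) for block in blocks if block.strip()]

sample = "Chapter 1\nline one\nline two\n\nChapter 2\nline three\n"

print(paragraphs_per_line(sample))
# ['Chapter 1', 'line one', 'line two', 'Chapter 2', 'line three']
print(paragraphs_by_blank_line(sample))
# ['Chapter 1 line one line two', 'Chapter 2 line three']
```

If a file only has blank lines at chapter breaks, the second approach produces one paragraph per chapter, which may be what happened in the case described above.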

I'm struggling with similar issues.

I took someone's advice and I am trying to put simple HTML tags into my .txt file to create .htm/.html.

To preserve spacing when I need it I can add PRE and /PRE tags around things that should NOT be flowed.

PRE tags are almost always a bad idea. HTML supports other layout methods, such as divs and tables.

asjogren
05-31-2010, 04:14 PM
Unless you use markdown processing (which has very specific requirements for formatting), TXT input has no idea what a chapter is. It works exclusively on lines.

At each chapter break there were blank lines. And you are correct, the tool had no concept of chapters. The result was that each logical chapter in the source TXT turned into a single physical paragraph in the target.

This was just an FYI. I think in the future I will use a word processor to take TXT to RTF, and then use Calibre to convert RTF to MOBI or ePub.