05-27-2011, 09:52 AM   #1
snarkophilus
Wannabe Connoisseur
Conversion adds indents not in original input

Hi folks,

I've searched the conversion sub-forum and couldn't find this mentioned. Please point me elsewhere if it has already been covered.

I've got a simple HTML example in which the first paragraph has no indent and the following paragraphs are indented. When I enable debug output, the input HTML is essentially unchanged in the input/, parsed/ and structure/ directories, but in the processed/ directory every paragraph has been given the same calibre class, and that class is the indented one.

Here's my html input:
Code:
<html>
  <head>
      <title>Test</title>
      <style type="text/css">
          p { margin: 0em; text-indent: 0 }
          p + p { margin: 0em; text-indent: 1.5em }
      </style>
  </head>
  <body>
    <p><span>"</span><span>S</span>ally."</p>
    <p>A mutter.</p>
    <br/>
    <p>"Wake up now, Sally."</p>
    <p>A louder mutter: <em>leeme lone.</em></p>
  </body>
</html>
which comes out looking somewhat like this in a browser:

"Sally."
    A mutter.

"Wake up now, Sally."
    A louder mutter: leeme lone.
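
To check the expected layout without a browser, here is a minimal Python sketch that applies only the two CSS rules from the input above. It assumes the standard adjacent-sibling meaning of "p + p" (the paragraph's immediately preceding sibling element must also be a p); the function name expected_indents is made up for illustration and is not part of calibre.

```python
import xml.etree.ElementTree as ET

# Body of the test input from the post, kept as well-formed XML.
BODY = """<body>
  <p><span>"</span><span>S</span>ally."</p>
  <p>A mutter.</p>
  <br/>
  <p>"Wake up now, Sally."</p>
  <p>A louder mutter: <em>leeme lone.</em></p>
</body>"""


def expected_indents(body_xml):
    """Return (text, text-indent) per <p>, using only the two rules from the post."""
    body = ET.fromstring(body_xml)
    results = []
    previous_tag = None  # tag of the immediately preceding sibling element
    for child in body:
        if child.tag == "p":
            # "p + p" matches only when the previous sibling element is also a <p>;
            # otherwise the plain "p" rule (text-indent: 0) applies.
            indent = "1.5em" if previous_tag == "p" else "0"
            results.append(("".join(child.itertext()), indent))
        previous_tag = child.tag
    return results


for text, indent in expected_indents(BODY):
    print(f"{indent:>5}  {text}")
```

Under that assumption the br element breaks the run of adjacent paragraphs, so the paragraph after the gap starts flush left again, which is the layout I'm trying to preserve.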

In the processed/ directory we end up with this html:
Code:
<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
      <title>Test</title>
      <meta content="http://www.w3.org/1999/xhtml; charset=utf-8" http-equiv="Content-Type"/><link href="stylesheet.css" type="text/css" rel="stylesheet"/><style type="text/css">
                @page { margin-bottom: 5.000000pt; margin-top: 5.000000pt; }</style></head>
  <body class="calibre">
    <p class="calibre1"><span>"</span><span>S</span>ally."</p>
    <p class="calibre1">A mutter.</p>
    <br class="calibre2"/>
    <p class="calibre1">"Wake up now, Sally."</p>
    <p class="calibre1">A louder mutter: <em class="calibre3">leeme lone.</em></p>
  </body>
</html>
and .calibre1 in the style sheet has "text-indent: 1.5em". This comes out looking somewhat like this in an epub (all lines indented):

    "Sally."
    A mutter.

    "Wake up now, Sally."
    A louder mutter: leeme lone.
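
The same thing can be confirmed from the debug output rather than by eye. This is a rough Python sketch, not anything calibre provides: it lists the class attribute of each paragraph in the processed HTML shown above (XML declaration and head omitted for brevity) and reports when they are all identical. The helper name paragraph_classes is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import Counter

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Trimmed copy of the processed/ output from the post.
PROCESSED = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body class="calibre">
    <p class="calibre1"><span>"</span><span>S</span>ally."</p>
    <p class="calibre1">A mutter.</p>
    <br class="calibre2"/>
    <p class="calibre1">"Wake up now, Sally."</p>
    <p class="calibre1">A louder mutter: <em class="calibre3">leeme lone.</em></p>
  </body>
</html>"""


def paragraph_classes(xhtml):
    """Return the class attribute of every <p> in document order."""
    root = ET.fromstring(xhtml)
    return [p.get("class") for p in root.iter(XHTML_NS + "p")]


classes = paragraph_classes(PROCESSED)
print(Counter(classes))
if len(set(classes)) == 1:
    # One shared class means the class rule alone cannot give the
    # first paragraph a different indent from the others.
    print("all paragraphs share class", classes[0])
```

Running the same function over the files in input/ and processed/ for a real book would show at which stage the separate first-paragraph class disappears.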


Note that this behaviour isn't specific to using something like p + p to control the indent. When converting from a mobi, the input/ directory ends up with HTML in which the initial paragraph simply has a different class="calibre_xxx" attribute (correctly zero-indented in a browser) from the rest of the paragraphs (which are indented). I'm looking for a generic fix here; the example above is just the simplest I could create by hand.

Is there some setting I'm missing that controls this?

Should I also offer some sort of bonus if someone can identify the book from the first four lines?

Cheers,
Simon.