Old 05-27-2011, 09:52 AM   #1
snarkophilus
Conversion adds indents not in original input

Hi folks,

I've searched the conversion sub-forum and couldn't see this mentioned. Please direct me elsewhere if it has been covered already.

I've got a simple HTML example where the first paragraph has no indent and the following paragraphs are indented. When I enable debug output, the HTML is basically unchanged in the input/, parsed/ and structure/ directories. In the processed/ directory, though, where calibre adds a class to each paragraph, every paragraph ends up with the same class, and that class is the indented one.

Here's my html input:
Code:
<html>
  <head>
      <title>Test</title>
      <style type="text/css">
          p { margin: 0em; text-indent: 0 }
          p + p { margin: 0em; text-indent: 1.5em }
      </style>
  </head>
  <body>
    <p><span>"</span><span>S</span>ally."</p>
    <p>A mutter.</p>
    <br/>
    <p>"Wake up now, Sally."</p>
    <p>A louder mutter: <em>leeme lone.</em></p>
  </body>
</html>
which comes out looking somewhat like this in a browser:

"Sally."
    A mutter.

"Wake up now, Sally."
    A louder mutter: leeme lone.
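
That rendering follows from how the adjacent sibling combinator works: `p + p` only matches a paragraph whose immediately preceding sibling element is also a paragraph, so the paragraph after the `<br/>` falls back to the plain `p` rule. Here is a minimal Python sketch of that matching logic, assuming the body parses as XML. The helper name `indents` is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Body of the test file from this post.
BODY = """<body>
  <p><span>"</span><span>S</span>ally."</p>
  <p>A mutter.</p>
  <br/>
  <p>"Wake up now, Sally."</p>
  <p>A louder mutter: <em>leeme lone.</em></p>
</body>"""


def indents(body_xml):
    """Return the text-indent (in em) each <p> gets under the two rules
    p { text-indent: 0 } and p + p { text-indent: 1.5em }."""
    result = []
    previous_tag = None
    for child in ET.fromstring(body_xml):
        if child.tag == "p":
            # 'p + p' matches only when the previous sibling element is a <p>.
            result.append(1.5 if previous_tag == "p" else 0.0)
        previous_tag = child.tag
    return result


print(indents(BODY))  # [0.0, 1.5, 0.0, 1.5]
```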

In the processed/ directory we end up with this html:
Code:
<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
      <title>Test</title>
      <meta content="http://www.w3.org/1999/xhtml; charset=utf-8" http-equiv="Content-Type"/><link href="stylesheet.css" type="text/css" rel="stylesheet"/><style type="text/css">
                @page { margin-bottom: 5.000000pt; margin-top: 5.000000pt; }</style></head>
  <body class="calibre">
    <p class="calibre1"><span>"</span><span>S</span>ally."</p>
    <p class="calibre1">A mutter.</p>
    <br class="calibre2"/>
    <p class="calibre1">"Wake up now, Sally."</p>
    <p class="calibre1">A louder mutter: <em class="calibre3">leeme lone.</em></p>
  </body>
</html>
and .calibre1 in the style sheet has "text-indent: 1.5em". This comes out looking somewhat like this in an epub (all lines indented):

    "Sally."
    A mutter.

    "Wake up now, Sally."
    A louder mutter: leeme lone.
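
A quick way to confirm this from the debug output, rather than by eye, is to list the class calibre assigned to each paragraph. This is only a sketch: the markup is the processed/ file from this post with the XML declaration and head trimmed, and `paragraph_classes` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Trimmed copy of the processed/ output shown above.
PROCESSED = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body class="calibre">
    <p class="calibre1"><span>"</span><span>S</span>ally."</p>
    <p class="calibre1">A mutter.</p>
    <br class="calibre2"/>
    <p class="calibre1">"Wake up now, Sally."</p>
    <p class="calibre1">A louder mutter: <em class="calibre3">leeme lone.</em></p>
  </body>
</html>"""


def paragraph_classes(xhtml):
    """Return the class attribute of every <p> element, in document order."""
    root = ET.fromstring(xhtml)
    return [p.get("class") for p in root.iter(XHTML_NS + "p")]


classes = paragraph_classes(PROCESSED)
print(classes)
print("all paragraphs share one class:", len(set(classes)) == 1)
```

If the first paragraph had kept its own zero-indent rule, it would show up here with a different class from the others.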


Note that this behaviour isn't specific to using something like p+p to control the indent. For a book converted from mobi, the HTML in the input/ directory just gives the initial paragraph (correctly zero-indented in a browser) a different class="calibre_xxx" attribute from the remaining paragraphs (which are indented). I'm looking for a generic fix here; the example above is just the simplest I could create by hand.

Is there some setting I'm missing that controls this?

Should I also offer some sort of bonus if someone can identify the book from the first four lines?

Cheers,
Simon.
Old 05-27-2011, 11:43 AM   #2
jackie_w
Hi Simon,

It works OK for me. I've attached a screencap of the zip-to-epub conversion; I copied the HTML directly from your post.

These are the conversion settings shown in the conversion job details. You could compare them against your own.

Spoiler:
Code:
Convert book 1 of 1 (Indent)
Processing archive...
Resolved conversion options
calibre version: 0.8.1
{'asciiize': False,
 'author_sort': None,
 'authors': None,
 'base_font_size': 12.0,
 'book_producer': None,
 'breadth_first': False,
 'change_justification': u'justify',
 'chapter': u"//*[name()='h2' or name()='h3']",
 'chapter_mark': u'pagebreak',
 'comments': None,
 'cover': None,
 'debug_pipeline': None,
 'dehyphenate': True,
 'delete_blank_paragraphs': True,
 'disable_font_rescaling': False,
 'dont_package': False,
 'dont_split_on_page_breaks': False,
 'enable_heuristics': False,
 'epub_flatten': False,
 'extra_css': None,
 'extract_to': None,
 'fix_indents': True,
 'flow_size': 260,
 'font_size_mapping': u'9, 10, 11, 12, 14, 16, 20, 33',
 'format_scene_breaks': True,
 'html_unwrap_factor': 0.4,
 'input_encoding': None,
 'input_profile': <calibre.customize.profiles.InputProfile object at 0x04619BF0>,
 'insert_blank_line': False,
 'insert_metadata': False,
 'isbn': None,
 'italicize_common_cases': True,
 'keep_ligatures': False,
 'language': None,
 'level1_toc': u'//h:h2',
 'level2_toc': u'//h:h3',
 'level3_toc': None,
 'line_height': 0.0,
 'linearize_tables': False,
 'margin_bottom': 5.0,
 'margin_left': 5.0,
 'margin_right': 5.0,
 'margin_top': 15.0,
 'markup_chapter_headings': True,
 'max_levels': 5,
 'max_toc_links': 0,
 'minimum_line_height': 0.0,
 'no_chapters_in_toc': False,
 'no_default_epub_cover': False,
 'no_inline_navbars': False,
 'no_svg_cover': False,
 'output_profile': <calibre.customize.profiles.SonyReaderOutput object at 0x04627050>,
 'page_breaks_before': u'/',
 'prefer_metadata_cover': False,
 'preserve_cover_aspect_ratio': False,
 'pretty_print': True,
 'pubdate': None,
 'publisher': None,
 'rating': None,
 'read_metadata_from_opf': u'c:\\docume~1\\jackies\\locals~1\\temp\\calibre_0.8.1_tmp_uqmxez\\calibre_0.8.1_mxv6oi.opf',
 'remove_fake_margins': False,
 'remove_first_image': False,
 'remove_paragraph_spacing': False,
 'remove_paragraph_spacing_indent_size': 1.5,
 'renumber_headings': True,
 'replace_scene_breaks': u'',
 'series': None,
 'series_index': None,
 'smarten_punctuation': True,
 'sr1_replace': None,
 'sr1_search': None,
 'sr2_replace': None,
 'sr2_search': None,
 'sr3_replace': None,
 'sr3_search': None,
 'tags': None,
 'timestamp': None,
 'title': None,
 'title_sort': None,
 'toc_filter': None,
 'toc_threshold': 6,
 'unwrap_lines': True,
 'use_auto_toc': False,
 'verbose': 2}
InputFormatPlugin: HTML Input running
on c:\docume~1\jackies\locals~1\temp\calibre_0.8.1_tmp_uqmxez\calibre_0.8.1_dauecr_plumber_archive\content.opf
Parsing all content...
Manifest item 'toc.ncx' not found
Parsing Indent.html ...
Generating default TOC from spine...
Merging user specified metadata...
Detecting structure...
Auto generated TOC with 0 entries.
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Cleaning up manifest...
Trimming unused files from manifest...
Parsing stylesheet.css ...
Creating EPUB Output...
	Looking for large trees in Indent.html...
	No large trees found
Generating default cover
This EPUB file has no Table of Contents. Creating a default TOC
EPUB output written to c:\docume~1\jackies\locals~1\temp\calibre_0.8.1_tmp_uqmxez\calibre_0.8.1_f4viod.epub


Edit: If I had to guess, I would start by looking at the 'remove_paragraph_spacing' option on the Look & Feel page.
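
Comparing two option dumps like this by eye is tedious. Here is a minimal Python sketch of doing it programmatically, assuming both sets of options have already been loaded into plain dicts. The function name `diff_options` and the values under `mine` are illustrative and not taken from a real log.

```python
def diff_options(theirs, mine):
    """Return {option: (their_value, my_value)} for every option that differs."""
    missing = "<not set>"  # placeholder for an option present on one side only
    differences = {}
    for key in sorted(set(theirs) | set(mine)):
        a = theirs.get(key, missing)
        b = mine.get(key, missing)
        if a != b:
            differences[key] = (a, b)
    return differences


# Values copied from the job details above.
theirs = {
    "insert_blank_line": False,
    "remove_paragraph_spacing": False,
    "remove_paragraph_spacing_indent_size": 1.5,
}
# Hypothetical values for a second setup, for illustration only.
mine = {
    "insert_blank_line": False,
    "remove_paragraph_spacing": True,
    "remove_paragraph_spacing_indent_size": 1.5,
}

for option, (their_value, my_value) in diff_options(theirs, mine).items():
    print(f"{option}: theirs={their_value!r} mine={my_value!r}")
```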
Attached: indent.jpg

Last edited by jackie_w; 05-27-2011 at 11:45 AM.
Old 05-27-2011, 09:14 PM   #3
snarkophilus
Quote:
Originally Posted by jackie_w View Post
Edit: If I had to guess I would start by looking at the 'remove_paragraph_spacing' option on the Look&Feel page
Bingo!

For some reason that I can't recall, I had this option enabled. Unchecking it gives me exactly what I'm after.
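
For anyone landing here later: the result is consistent with 'remove_paragraph_spacing' applying one indent to every paragraph, with the size taken from 'remove_paragraph_spacing_indent_size' (1.5 in the settings dump earlier in the thread). The sketch below is a simplified model of that behaviour for illustration only. It is not calibre's actual code, and the function and its names are made up.

```python
def flatten_paragraph_style(style, remove_paragraph_spacing, indent_size=1.5):
    """Toy model: when the option is on, zero the vertical margins and
    force the same text-indent on every paragraph."""
    style = dict(style)  # work on a copy, leave the input untouched
    if remove_paragraph_spacing:
        style["margin-top"] = "0"
        style["margin-bottom"] = "0"
        style["text-indent"] = f"{indent_size}em"
    return style


# The two paragraph styles from the test file: p and p + p.
first = {"margin": "0em", "text-indent": "0"}
following = {"margin": "0em", "text-indent": "1.5em"}

for enabled in (True, False):
    a = flatten_paragraph_style(first, enabled)["text-indent"]
    b = flatten_paragraph_style(following, enabled)["text-indent"]
    print(f"remove_paragraph_spacing={enabled}: first={a} following={b}")
```

With the option on, both styles end up with the same indent, which matches the single .calibre1 class seen in processed/. With it off, the original difference survives.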

Thank you,
Simon.
All times are GMT -4.

