Old 05-27-2011, 10:52 AM   #1
snarkophilus
Wannabe Connoisseur
Posts: 242
Karma: 1009530
Join Date: Apr 2011
Location: Geelong, Australia
Device: Sony PRS-T1, Sony PRS-350, Palm TX
Conversion adds indents not in original input

Hi folks,

I've searched the conversion sub-forum and couldn't see this mentioned. Please direct me elsewhere if this has been covered already.

I've got a simple HTML example where the first paragraph has no indent and the following paragraphs are indented. When I enable debug output, I can see the input HTML basically unchanged in the input/, parsed/ and structure/ directories, but in the processed/ directory every paragraph has been given the same calibre class, and that class is the indented one.

Here's my html input:
Code:
<html>
  <head>
      <title>Test</title>
      <style type="text/css">
          p { margin: 0em; text-indent: 0 }
          p + p { margin: 0em; text-indent: 1.5em }
      </style>
  </head>
  <body>
    <p><span>"</span><span>S</span>ally."</p>
    <p>A mutter.</p>
    <br/>
    <p>"Wake up now, Sally."</p>
    <p>A louder mutter: <em>leeme lone.</em></p>
  </body>
</html>
which comes out looking somewhat like this in a browser:

Sally
    A mutter.

"Wake up now, Sally."
    A louder mutter: leeme lone.
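
To show why only the second and fourth paragraphs pick up the indent in a browser, here is a minimal Python sketch (standard library only). It models the `p + p` adjacent-sibling rule: a paragraph matches only when the element immediately before it under the same parent is also a <p>, so the <br/> resets the third paragraph. The class and function names are made up for illustration, and it assumes reasonably well-formed markup.

```python
from html.parser import HTMLParser

# Elements that never have children, so we must not descend into them.
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}


class AdjacentParagraphs(HTMLParser):
    """For each <p>, record whether the previous sibling element is also a <p>."""

    def __init__(self):
        super().__init__()
        self.prev_sibling = [None]  # last element seen at each nesting depth
        self.matches = []           # one bool per <p>: does `p + p` apply?

    def _see(self, tag):
        if tag == "p":
            self.matches.append(self.prev_sibling[-1] == "p")
        self.prev_sibling[-1] = tag

    def handle_starttag(self, tag, attrs):
        self._see(tag)
        if tag not in VOID_TAGS:
            self.prev_sibling.append(None)  # descend into the new element

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <br/>: a sibling, but nothing to descend into.
        self._see(tag)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and len(self.prev_sibling) > 1:
            self.prev_sibling.pop()


def indented_paragraphs(html):
    """Return a list of booleans, True where `p + p` would match."""
    parser = AdjacentParagraphs()
    parser.feed(html)
    return parser.matches


SAMPLE = """
<body>
  <p><span>"</span><span>S</span>ally."</p>
  <p>A mutter.</p>
  <br/>
  <p>"Wake up now, Sally."</p>
  <p>A louder mutter: <em>leeme lone.</em></p>
</body>
"""

print(indented_paragraphs(SAMPLE))  # [False, True, False, True]
```

Text between elements is ignored on purpose, since the `+` combinator only looks at element siblings.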

In the processed/ directory we end up with this html:
Code:
<?xml version='1.0' encoding='utf-8'?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
      <title>Test</title>
      <meta content="http://www.w3.org/1999/xhtml; charset=utf-8" http-equiv="Content-Type"/><link href="stylesheet.css" type="text/css" rel="stylesheet"/><style type="text/css">
                @page { margin-bottom: 5.000000pt; margin-top: 5.000000pt; }</style></head>
  <body class="calibre">
    <p class="calibre1"><span>"</span><span>S</span>ally."</p>
    <p class="calibre1">A mutter.</p>
    <br class="calibre2"/>
    <p class="calibre1">"Wake up now, Sally."</p>
    <p class="calibre1">A louder mutter: <em class="calibre3">leeme lone.</em></p>
  </body>
</html>
and .calibre1 in the style sheet has "text-indent: 1.5em". This comes out looking somewhat like this in an epub (all lines indented):

    Sally
    A mutter.

    "Wake up now, Sally."
    A louder mutter: leeme lone.
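
A quick way to confirm this from the debug output is to count the class attributes on the <p> elements. This is a small sketch using only Python's standard library; the function name is my own, and the sample is the body from the processed/ file above.

```python
from collections import Counter
from html.parser import HTMLParser


class ParagraphClasses(HTMLParser):
    """Count how often each class value appears on <p> elements."""

    def __init__(self):
        super().__init__()
        self.classes = Counter()

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            # A <p> without a class attribute is counted under None.
            self.classes[dict(attrs).get("class")] += 1


def paragraph_class_counts(html):
    """Return a Counter mapping class value -> number of <p> elements."""
    parser = ParagraphClasses()
    parser.feed(html)
    return parser.classes


PROCESSED = """
<body class="calibre">
  <p class="calibre1"><span>"</span><span>S</span>ally."</p>
  <p class="calibre1">A mutter.</p>
  <br class="calibre2"/>
  <p class="calibre1">"Wake up now, Sally."</p>
  <p class="calibre1">A louder mutter: <em class="calibre3">leeme lone.</em></p>
</body>
"""

print(paragraph_class_counts(PROCESSED))
```

If the first-paragraph styling had survived, I would expect at least two different classes here rather than one.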


Note that this behaviour isn't specific to using something like p+p to control the indent. When converting from a mobi, the input directory ends up with an HTML file that simply uses a different class="calibre_xxx" attribute for the initial paragraph (which a browser correctly shows with zero indent) than for the rest of the paragraphs (which are indented). I'm looking for a generic fix here; the example above is just the simplest I could create by hand.

Is there some setting I'm missing that controls this?

Should I also offer some sort of bonus if someone can identify the book from the first four lines?

Cheers,
Simon.
Old 05-27-2011, 12:43 PM   #2
jackie_w
Wizard
Posts: 2,877
Karma: 4200035
Join Date: Sep 2009
Location: UK
Device: Sony PRS-350, PB360, Kobo Glo/AuraHD/Aura6"/AuraH2O
Hi Simon,

It works OK for me - attached a screencap of the zip-to-epub conversion. I copied the html directly from your post.

These are the conversion settings shown in the conversion job details. You could compare them against your own.

Spoiler:
Code:
Convert book 1 of 1 (Indent)
Processing archive...
Resolved conversion options
calibre version: 0.8.1
{'asciiize': False,
 'author_sort': None,
 'authors': None,
 'base_font_size': 12.0,
 'book_producer': None,
 'breadth_first': False,
 'change_justification': u'justify',
 'chapter': u"//*[name()='h2' or name()='h3']",
 'chapter_mark': u'pagebreak',
 'comments': None,
 'cover': None,
 'debug_pipeline': None,
 'dehyphenate': True,
 'delete_blank_paragraphs': True,
 'disable_font_rescaling': False,
 'dont_package': False,
 'dont_split_on_page_breaks': False,
 'enable_heuristics': False,
 'epub_flatten': False,
 'extra_css': None,
 'extract_to': None,
 'fix_indents': True,
 'flow_size': 260,
 'font_size_mapping': u'9, 10, 11, 12, 14, 16, 20, 33',
 'format_scene_breaks': True,
 'html_unwrap_factor': 0.4,
 'input_encoding': None,
 'input_profile': <calibre.customize.profiles.InputProfile object at 0x04619BF0>,
 'insert_blank_line': False,
 'insert_metadata': False,
 'isbn': None,
 'italicize_common_cases': True,
 'keep_ligatures': False,
 'language': None,
 'level1_toc': u'//h:h2',
 'level2_toc': u'//h:h3',
 'level3_toc': None,
 'line_height': 0.0,
 'linearize_tables': False,
 'margin_bottom': 5.0,
 'margin_left': 5.0,
 'margin_right': 5.0,
 'margin_top': 15.0,
 'markup_chapter_headings': True,
 'max_levels': 5,
 'max_toc_links': 0,
 'minimum_line_height': 0.0,
 'no_chapters_in_toc': False,
 'no_default_epub_cover': False,
 'no_inline_navbars': False,
 'no_svg_cover': False,
 'output_profile': <calibre.customize.profiles.SonyReaderOutput object at 0x04627050>,
 'page_breaks_before': u'/',
 'prefer_metadata_cover': False,
 'preserve_cover_aspect_ratio': False,
 'pretty_print': True,
 'pubdate': None,
 'publisher': None,
 'rating': None,
 'read_metadata_from_opf': u'c:\\docume~1\\jackies\\locals~1\\temp\\calibre_0.8.1_tmp_uqmxez\\calibre_0.8.1_mxv6oi.opf',
 'remove_fake_margins': False,
 'remove_first_image': False,
 'remove_paragraph_spacing': False,
 'remove_paragraph_spacing_indent_size': 1.5,
 'renumber_headings': True,
 'replace_scene_breaks': u'',
 'series': None,
 'series_index': None,
 'smarten_punctuation': True,
 'sr1_replace': None,
 'sr1_search': None,
 'sr2_replace': None,
 'sr2_search': None,
 'sr3_replace': None,
 'sr3_search': None,
 'tags': None,
 'timestamp': None,
 'title': None,
 'title_sort': None,
 'toc_filter': None,
 'toc_threshold': 6,
 'unwrap_lines': True,
 'use_auto_toc': False,
 'verbose': 2}
InputFormatPlugin: HTML Input running
on c:\docume~1\jackies\locals~1\temp\calibre_0.8.1_tmp_uqmxez\calibre_0.8.1_dauecr_plumber_archive\content.opf
Parsing all content...
Manifest item 'toc.ncx' not found
Parsing Indent.html ...
Generating default TOC from spine...
Merging user specified metadata...
Detecting structure...
Auto generated TOC with 0 entries.
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Cleaning up manifest...
Trimming unused files from manifest...
Parsing stylesheet.css ...
Creating EPUB Output...
	Looking for large trees in Indent.html...
	No large trees found
Generating default cover
This EPUB file has no Table of Contents. Creating a default TOC
EPUB output written to c:\docume~1\jackies\locals~1\temp\calibre_0.8.1_tmp_uqmxez\calibre_0.8.1_f4viod.epub


Edit: If I had to guess, I would start by looking at the 'remove_paragraph_spacing' option on the Look & Feel page.
Attached thumbnail: indent.jpg (29.2 KB)

Last edited by jackie_w; 05-27-2011 at 12:45 PM.
Old 05-27-2011, 10:14 PM   #3
snarkophilus
Quote:
Originally Posted by jackie_w
Edit: If I had to guess, I would start by looking at the 'remove_paragraph_spacing' option on the Look & Feel page.
Bingo!

For some reason that I can't recall, I had this option enabled. Unchecking that gives me exactly what I'm after.
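
Here is my rough understanding of why this option caused it, sketched in Python. This is only a model of the visible effect, not calibre's actual code: I'm assuming the option zeroes the vertical margins of every paragraph and gives them all the same text-indent, using the 1.5 shown as 'remove_paragraph_spacing_indent_size' in the job details. The function name is hypothetical.

```python
def apply_remove_paragraph_spacing(paragraph_styles, indent_size_em=1.5):
    """Model of the option's effect: same margins and indent for every paragraph.

    paragraph_styles is a list of dicts of CSS properties, one per <p>.
    This is an illustrative assumption, not calibre's implementation.
    """
    flattened = []
    for style in paragraph_styles:
        updated = dict(style)                 # leave the input untouched
        updated["margin-top"] = "0"
        updated["margin-bottom"] = "0"
        updated["text-indent"] = f"{indent_size_em}em"
        flattened.append(updated)
    return flattened


# Styles the browser computes for the four paragraphs in my test file.
original = [
    {"text-indent": "0"},      # first paragraph
    {"text-indent": "1.5em"},  # matched by p + p
    {"text-indent": "0"},      # follows <br/>, so p + p does not match
    {"text-indent": "1.5em"},  # matched by p + p
]

for style in apply_remove_paragraph_spacing(original):
    print(style["text-indent"])
```

Under that assumption every paragraph ends up with the same indent, which matches the single .calibre1 class I was seeing.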

Thank you,
Simon.