01-23-2013, 12:47 PM   #3
theducks
Grand Sorcerer
theducks ought to be getting tired of karma fortunes by now.
Posts: 15,219
Karma: 5940081
Join Date: Aug 2009
Location: (The original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
Quote:
Originally Posted by ggnome403
Hey everyone. I'm having an issue with calibre: almost every file I convert has the same problem. Calibre cuts sentences and words in half in random places and adds blank lines every line or two. I'm attaching the original file and the converted file. I have tried turning on Heuristic processing and checking the "Remove spacing between paragraphs" box, as well as using the default settings. I'm not sure what other setting I could change. Again, it happens with pretty much every file I try, probably 95% or more of them.

Here is the conversion log:


Convert book 1 of 1 (The Philosophy of Humanism)
Resolved conversion options
calibre version: 0.9.15
{'asciiize': False,
'author_sort': None,
'authors': None,
'base_font_size': 0.0,
'book_producer': None,
'change_justification': u'original',
'chapter': u"//*[((name()='h1' or name()='h2') and re:test(., '\\s*((chapter|book|section|part)\\s+)|((prolog|prologue|epilogue)(\\s+|$))', 'i')) or @class = 'chapter']",
'chapter_mark': u'pagebreak',
'comments': None,
'cover': u'C:\\Users\\Shawn\\AppData\\Local\\Temp\\calibre_0.9.15_tmp_x1aibr\\mufjs6.jpeg',
'debug_pipeline': None,
'dehyphenate': True,
'delete_blank_paragraphs': True,
'disable_font_rescaling': False,
'dont_split_on_page_breaks': False,
'duplicate_links_in_toc': False,
'embed_font_family': None,
'enable_heuristics': False,
'epub_flatten': False,
'extra_css': None,
'extract_to': None,
'filter_css': u'',
'fix_indents': True,
'flow_size': 260,
'font_size_mapping': None,
'format_scene_breaks': True,
'html_unwrap_factor': 0.4,
'input_encoding': None,
'input_profile': <calibre.customize.profiles.InputProfile object at 0x0000000004AE12B0>,
'insert_blank_line': False,
'insert_blank_line_size': 0.5,
'insert_metadata': False,
'isbn': None,
'italicize_common_cases': True,
'keep_ligatures': False,
'language': None,
'level1_toc': None,
'level2_toc': None,
'level3_toc': None,
'line_height': 0.0,
'linearize_tables': False,
'margin_bottom': 5.0,
'margin_left': 5.0,
'margin_right': 5.0,
'margin_top': 5.0,
'markup_chapter_headings': True,
'max_toc_links': 50,
'minimum_line_height': 120.0,
'new_pdf_engine': False,
'no_chapters_in_toc': False,
'no_default_epub_cover': False,
'no_images': False,
'no_inline_navbars': False,
'no_svg_cover': False,
'output_profile': <calibre.customize.profiles.iPad3Output object at 0x0000000004AE1780>,
'page_breaks_before': u"//*[name()='h1' or name()='h2']",
'prefer_metadata_cover': False,
'preserve_cover_aspect_ratio': False,
'pretty_print': True,
'pubdate': None,
'publisher': None,
'rating': None,
'read_metadata_from_opf': u'C:\\Users\\Shawn\\AppData\\Local\\Temp\\calibre_0.9.15_tmp_x1aibr\\xopsb8.opf',
'remove_fake_margins': True,
'remove_first_image': False,
'remove_paragraph_spacing': False,
'remove_paragraph_spacing_indent_size': 1.5,
'renumber_headings': True,
'replace_scene_breaks': u'',
'search_replace': '[]',
'series': None,
'series_index': None,
'smarten_punctuation': False,
'sr1_replace': None,
'sr1_search': None,
'sr2_replace': None,
'sr2_search': None,
'sr3_replace': None,
'sr3_search': None,
'start_reading_at': None,
'subset_embedded_fonts': False,
'tags': None,
'timestamp': None,
'title': None,
'title_sort': None,
'toc_filter': None,
'toc_threshold': 6,
'unsmarten_punctuation': False,
'unwrap_factor': 0.45,
'unwrap_lines': True,
'use_auto_toc': False,
'verbose': 2}
InputFormatPlugin: PDF Input running
on C:\Users\Shawn\AppData\Local\Temp\calibre_0.9.15_tmp_x1aibr\kwkdjv.pdf
Converting file to html...
Flipping image index-3_1.png: y
Flipping image index-3_2.png: y
Retrieving document metadata...
Generating manifest...
Rendering manifest...
Parsing all content...
Parsing index.html ...
Initial parse failed, using more forgiving parsers
Parsing index.html as HTML
Generating default TOC from spine...
Merging user specified metadata...
Detecting structure...
Auto generated TOC with 42 entries.
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Found 2377 items of level: p_2
Found 6691 items of level: p_1
p_2 left margin stats: Counter({u'0': 2377})
p_2 right margin stats: Counter({u'0': 2377})
p_1 left margin stats: Counter({u'0': 6691})
p_1 right margin stats: Counter({u'0': 6691})
Cleaning up manifest...
Trimming unused files from manifest...
Creating EPUB Output...
Splitting markup on page breaks and flow limits, if any...
Splitting on page-break
Looking for large trees in index.html...
Found large tree #0
Splitting...
Split point: {http://www.w3.org/1999/xhtml}p /*/*[2]/*[4536]
Split tree still too large: 539 KB
Splitting...
Split point: {http://www.w3.org/1999/xhtml}p /*/*[2]/*[2269]
Committed sub-tree #1 (258 KB)
Split tree still too large: 281 KB
Splitting...
Split point: {http://www.w3.org/1999/xhtml}p /*/*[2]/*[1135]
Committed sub-tree #2 (145 KB)
Committed sub-tree #3 (136 KB)
Split tree still too large: 420 KB
Splitting...
Split point: {http://www.w3.org/1999/xhtml}p /*/*[2]/*[2158]/*[111]
Split tree still too large: 298 KB
Splitting...
Split point: {http://www.w3.org/1999/xhtml}p /*/*[2]/*[1135]
Committed sub-tree #4 (148 KB)
Committed sub-tree #5 (150 KB)
Committed sub-tree #6 (122 KB)
Split into 7 parts
EPUB output written to C:\Users\Shawn\AppData\Local\Temp\calibre_0.9.15_tmp_x1aibr\9pcufr.epub


Any help would be appreciated.

Thanks
Shawn
Word of advice: you are treading on the edge of MR's copyright policy by attaching this work, which IS licensed for distribution only when fully intact. (A Greenie may take it down anyway.)


Did you read the sticky at the top of the forum on PDF conversions?
It's iffy, but playing with the line un-wrapping factor (try a lower value) might help.
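To show why a lower value can help, here is a simplified, hypothetical model of PDF line un-wrapping. It is NOT calibre's source code; the names `length_threshold`, `unwrap` and `SENTENCE_END` are made up for illustration. The assumption is that the factor picks a "typical" line length from the document, and any line shorter than that is treated as the end of a paragraph. A lower factor gives a lower threshold, so fewer mid-sentence lines get mistaken for paragraph ends.

```python
import re

# Hypothetical, simplified model of PDF line un-wrapping.
# This is NOT calibre's code; it only illustrates what the factor controls.

# Assumption: a line ending in . ! or ? (optionally followed by a closing
# quote or parenthesis) ends a sentence, and therefore a paragraph.
SENTENCE_END = re.compile(r"[.!?][\"'\u201d)]*$")


def length_threshold(lines, factor):
    """Return the length at relative position `factor` (0..1) in the sorted
    list of distinct non-empty line lengths; 0.5 is roughly the median."""
    lengths = sorted({len(line.strip()) for line in lines if line.strip()})
    if not lengths:
        return 0
    factor = min(max(factor, 0.0), 1.0)  # clamp to the valid range
    index = max(int(len(lengths) * factor) - 1, 0)
    return lengths[index]


def unwrap(lines, factor):
    """Join hard-wrapped lines into paragraphs.

    A line closes the current paragraph if it is shorter than the threshold
    or ends a sentence; a blank line also closes it. Every other line is
    assumed to continue onto the next one.
    """
    threshold = length_threshold(lines, factor)
    paragraphs, current = [], []
    for line in lines:
        text = line.strip()
        if not text:
            if current:
                paragraphs.append(" ".join(current))
                current = []
            continue
        current.append(text)
        if len(text) < threshold or SENTENCE_END.search(text):
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


if __name__ == "__main__":
    sample = [
        "The philosophy of humanism holds",
        "that people can live good lives",
        "without",
        "religious belief.",
        "A new paragraph starts here.",
    ]
    for f in (0.45, 0.2):
        print(f, unwrap(sample, f))
```

With this sample, a factor of 0.45 breaks the first sentence after "without" (the kind of mid-sentence cut described above), while 0.2 rejoins it. Set it too low and, in this model, genuine paragraphs whose last line lacks end punctuation (headings, for example) get merged, which is why it is a matter of experimenting. In calibre the corresponding setting is the PDF Input line un-wrapping factor, shown as `'unwrap_factor': 0.45` in the log above and exposed as `--unwrap-factor` for `ebook-convert`.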