01-23-2013, 12:11 PM   #1
ggnome403
Join Date: Jan 2013
Device: iPad 1
Issues converting from PDF to EPUB

Hey everyone. I'm having an issue with calibre: almost every file I convert comes out with the same problem. Calibre cuts sentences and words in half in random places and adds a blank line every line or two. I'm attaching the original file and the converted file. I have tried the default settings, and I have also tried turning on Heuristic processing and checking the "Remove spacing between paragraphs" box. I'm not sure what other settings I could change. Again, it happens with pretty much every file I try, probably 95% or more of them.

Here is the conversion log:
Spoiler:


Convert book 1 of 1 (The Philosophy of Humanism)
Resolved conversion options
calibre version: 0.9.15
{'asciiize': False,
'author_sort': None,
'authors': None,
'base_font_size': 0.0,
'book_producer': None,
'change_justification': u'original',
'chapter': u"//*[((name()='h1' or name()='h2') and re:test(., '\\s*((chapter|book|section|part)\\s+)|((prolog|prologue|epilogue)(\\s+|$))', 'i')) or @class = 'chapter']",
'chapter_mark': u'pagebreak',
'comments': None,
'cover': u'C:\\Users\\Shawn\\AppData\\Local\\Temp\\calibre_0.9.15_tmp_x1aibr\\mufjs6.jpeg',
'debug_pipeline': None,
'dehyphenate': True,
'delete_blank_paragraphs': True,
'disable_font_rescaling': False,
'dont_split_on_page_breaks': False,
'duplicate_links_in_toc': False,
'embed_font_family': None,
'enable_heuristics': False,
'epub_flatten': False,
'extra_css': None,
'extract_to': None,
'filter_css': u'',
'fix_indents': True,
'flow_size': 260,
'font_size_mapping': None,
'format_scene_breaks': True,
'html_unwrap_factor': 0.4,
'input_encoding': None,
'input_profile': <calibre.customize.profiles.InputProfile object at 0x0000000004AE12B0>,
'insert_blank_line': False,
'insert_blank_line_size': 0.5,
'insert_metadata': False,
'isbn': None,
'italicize_common_cases': True,
'keep_ligatures': False,
'language': None,
'level1_toc': None,
'level2_toc': None,
'level3_toc': None,
'line_height': 0.0,
'linearize_tables': False,
'margin_bottom': 5.0,
'margin_left': 5.0,
'margin_right': 5.0,
'margin_top': 5.0,
'markup_chapter_headings': True,
'max_toc_links': 50,
'minimum_line_height': 120.0,
'new_pdf_engine': False,
'no_chapters_in_toc': False,
'no_default_epub_cover': False,
'no_images': False,
'no_inline_navbars': False,
'no_svg_cover': False,
'output_profile': <calibre.customize.profiles.iPad3Output object at 0x0000000004AE1780>,
'page_breaks_before': u"//*[name()='h1' or name()='h2']",
'prefer_metadata_cover': False,
'preserve_cover_aspect_ratio': False,
'pretty_print': True,
'pubdate': None,
'publisher': None,
'rating': None,
'read_metadata_from_opf': u'C:\\Users\\Shawn\\AppData\\Local\\Temp\\calibre_0.9.15_tmp_x1aibr\\xopsb8.opf',
'remove_fake_margins': True,
'remove_first_image': False,
'remove_paragraph_spacing': False,
'remove_paragraph_spacing_indent_size': 1.5,
'renumber_headings': True,
'replace_scene_breaks': u'',
'search_replace': '[]',
'series': None,
'series_index': None,
'smarten_punctuation': False,
'sr1_replace': None,
'sr1_search': None,
'sr2_replace': None,
'sr2_search': None,
'sr3_replace': None,
'sr3_search': None,
'start_reading_at': None,
'subset_embedded_fonts': False,
'tags': None,
'timestamp': None,
'title': None,
'title_sort': None,
'toc_filter': None,
'toc_threshold': 6,
'unsmarten_punctuation': False,
'unwrap_factor': 0.45,
'unwrap_lines': True,
'use_auto_toc': False,
'verbose': 2}
InputFormatPlugin: PDF Input running
on C:\Users\Shawn\AppData\Local\Temp\calibre_0.9.15_tmp_x1aibr\kwkdjv.pdf
Converting file to html...
Flipping image index-3_1.png: y
Flipping image index-3_2.png: y
Retrieving document metadata...
Generating manifest...
Rendering manifest...
Parsing all content...
Parsing index.html ...
Initial parse failed, using more forgiving parsers
Parsing index.html as HTML
Generating default TOC from spine...
Merging user specified metadata...
Detecting structure...
Auto generated TOC with 42 entries.
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Found 2377 items of level: p_2
Found 6691 items of level: p_1
p_2 left margin stats: Counter({u'0': 2377})
p_2 right margin stats: Counter({u'0': 2377})
p_1 left margin stats: Counter({u'0': 6691})
p_1 right margin stats: Counter({u'0': 6691})
Cleaning up manifest...
Trimming unused files from manifest...
Creating EPUB Output...
Splitting markup on page breaks and flow limits, if any...
Splitting on page-break
Looking for large trees in index.html...
Found large tree #0
Splitting...
Split point: {http://www.w3.org/1999/xhtml}p /*/*[2]/*[4536]
Split tree still too large: 539 KB
Splitting...
Split point: {http://www.w3.org/1999/xhtml}p /*/*[2]/*[2269]
Committed sub-tree #1 (258 KB)
Split tree still too large: 281 KB
Splitting...
Split point: {http://www.w3.org/1999/xhtml}p /*/*[2]/*[1135]
Committed sub-tree #2 (145 KB)
Committed sub-tree #3 (136 KB)
Split tree still too large: 420 KB
Splitting...
Split point: {http://www.w3.org/1999/xhtml}p /*/*[2]/*[2158]/*[111]
Split tree still too large: 298 KB
Splitting...
Split point: {http://www.w3.org/1999/xhtml}p /*/*[2]/*[1135]
Committed sub-tree #4 (148 KB)
Committed sub-tree #5 (150 KB)
Committed sub-tree #6 (122 KB)
Split into 7 parts
EPUB output written to C:\Users\Shawn\AppData\Local\Temp\calibre_0.9.15_tmp_x1aibr\9pcufr.epub
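
Since the log is long, here is a minimal Python sketch for pulling individual settings out of the "Resolved conversion options" block, so it's easier to see which options were active for this run. This is just a throwaway helper, not anything from calibre itself: `read_options` and `log_excerpt` are names made up for illustration, and the excerpt only contains a few lines copied from the log in this post.

```python
import re

# A few lines copied from the "Resolved conversion options" block of the log.
log_excerpt = """
'dehyphenate': True,
'enable_heuristics': False,
'remove_paragraph_spacing': False,
'unwrap_factor': 0.45,
'unwrap_lines': True,
"""

def read_options(text):
    """Return {option_name: raw_value_string} for each "'name': value," line."""
    pattern = re.compile(r"^'(\w+)':\s*(.+?),?\s*$", re.MULTILINE)
    return {name: value for name, value in pattern.findall(text)}

options = read_options(log_excerpt)

# Print only the settings mentioned in the post.
for name in ("enable_heuristics", "remove_paragraph_spacing", "unwrap_factor"):
    print(name, "=", options.get(name))
```

Values are kept as raw strings on purpose, because some entries in the full log (the profile objects, for example) are not plain Python literals.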


Any help would be appreciated.

Thanks
Shawn

Last edited by ggnome403; 01-23-2013 at 12:53 PM. Reason: Wrapped long paste in Spoiler