Old 01-23-2013, 11:11 AM   #1
ggnome403
Junior Member
Posts: 6
Karma: 10
Join Date: Jan 2013
Device: Ipad 1
Issues converting from PDF to EPUB

Hey everyone. I'm having an issue with calibre: almost every file I convert has the same problem. Calibre cuts sentences and words in half in random places and adds a blank line every line or two. I'm attaching the original file and the converted file. I have tried turning on heuristic processing and checking the "Remove spacing between paragraphs" box, as well as using the default settings. I'm not sure what other settings I could change. Again, it happens with pretty much every file I try, probably 95% or more of them.

Here is the conversion log
Spoiler:


Convert book 1 of 1 (The Philosophy of Humanism)
Resolved conversion options
calibre version: 0.9.15
{'asciiize': False,
'author_sort': None,
'authors': None,
'base_font_size': 0.0,
'book_producer': None,
'change_justification': u'original',
'chapter': u"//*[((name()='h1' or name()='h2') and re:test(., '\\s*((chapter|book|section|part)\\s+)|((prolog|prologue|epilogue)(\\s+|$))', 'i')) or @class = 'chapter']",
'chapter_mark': u'pagebreak',
'comments': None,
'cover': u'C:\\Users\\Shawn\\AppData\\Local\\Temp\\calibre_0.9.15_tmp_x1aibr\\mufjs6.jpeg',
'debug_pipeline': None,
'dehyphenate': True,
'delete_blank_paragraphs': True,
'disable_font_rescaling': False,
'dont_split_on_page_breaks': False,
'duplicate_links_in_toc': False,
'embed_font_family': None,
'enable_heuristics': False,
'epub_flatten': False,
'extra_css': None,
'extract_to': None,
'filter_css': u'',
'fix_indents': True,
'flow_size': 260,
'font_size_mapping': None,
'format_scene_breaks': True,
'html_unwrap_factor': 0.4,
'input_encoding': None,
'input_profile': <calibre.customize.profiles.InputProfile object at 0x0000000004AE12B0>,
'insert_blank_line': False,
'insert_blank_line_size': 0.5,
'insert_metadata': False,
'isbn': None,
'italicize_common_cases': True,
'keep_ligatures': False,
'language': None,
'level1_toc': None,
'level2_toc': None,
'level3_toc': None,
'line_height': 0.0,
'linearize_tables': False,
'margin_bottom': 5.0,
'margin_left': 5.0,
'margin_right': 5.0,
'margin_top': 5.0,
'markup_chapter_headings': True,
'max_toc_links': 50,
'minimum_line_height': 120.0,
'new_pdf_engine': False,
'no_chapters_in_toc': False,
'no_default_epub_cover': False,
'no_images': False,
'no_inline_navbars': False,
'no_svg_cover': False,
'output_profile': <calibre.customize.profiles.iPad3Output object at 0x0000000004AE1780>,
'page_breaks_before': u"//*[name()='h1' or name()='h2']",
'prefer_metadata_cover': False,
'preserve_cover_aspect_ratio': False,
'pretty_print': True,
'pubdate': None,
'publisher': None,
'rating': None,
'read_metadata_from_opf': u'C:\\Users\\Shawn\\AppData\\Local\\Temp\\calibre_0.9.15_tmp_x1aibr\\xopsb8.opf',
'remove_fake_margins': True,
'remove_first_image': False,
'remove_paragraph_spacing': False,
'remove_paragraph_spacing_indent_size': 1.5,
'renumber_headings': True,
'replace_scene_breaks': u'',
'search_replace': '[]',
'series': None,
'series_index': None,
'smarten_punctuation': False,
'sr1_replace': None,
'sr1_search': None,
'sr2_replace': None,
'sr2_search': None,
'sr3_replace': None,
'sr3_search': None,
'start_reading_at': None,
'subset_embedded_fonts': False,
'tags': None,
'timestamp': None,
'title': None,
'title_sort': None,
'toc_filter': None,
'toc_threshold': 6,
'unsmarten_punctuation': False,
'unwrap_factor': 0.45,
'unwrap_lines': True,
'use_auto_toc': False,
'verbose': 2}
InputFormatPlugin: PDF Input running
on C:\Users\Shawn\AppData\Local\Temp\calibre_0.9.15_tmp_x1aibr\kwkdjv.pdf
Converting file to html...
Flipping image index-3_1.png: y
Flipping image index-3_2.png: y
Retrieving document metadata...
Generating manifest...
Rendering manifest...
Parsing all content...
Parsing index.html ...
Initial parse failed, using more forgiving parsers
Parsing index.html as HTML
Generating default TOC from spine...
Merging user specified metadata...
Detecting structure...
Auto generated TOC with 42 entries.
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Found 2377 items of level: p_2
Found 6691 items of level: p_1
p_2 left margin stats: Counter({u'0': 2377})
p_2 right margin stats: Counter({u'0': 2377})
p_1 left margin stats: Counter({u'0': 6691})
p_1 right margin stats: Counter({u'0': 6691})
Cleaning up manifest...
Trimming unused files from manifest...
Creating EPUB Output...
Splitting markup on page breaks and flow limits, if any...
Splitting on page-break
Looking for large trees in index.html...
Found large tree #0
Splitting...
Split point: {http://www.w3.org/1999/xhtml}p /*/*[2]/*[4536]
Split tree still too large: 539 KB
Splitting...
Split point: {http://www.w3.org/1999/xhtml}p /*/*[2]/*[2269]
Committed sub-tree #1 (258 KB)
Split tree still too large: 281 KB
Splitting...
Split point: {http://www.w3.org/1999/xhtml}p /*/*[2]/*[1135]
Committed sub-tree #2 (145 KB)
Committed sub-tree #3 (136 KB)
Split tree still too large: 420 KB
Splitting...
Split point: {http://www.w3.org/1999/xhtml}p /*/*[2]/*[2158]/*[111]
Split tree still too large: 298 KB
Splitting...
Split point: {http://www.w3.org/1999/xhtml}p /*/*[2]/*[1135]
Committed sub-tree #4 (148 KB)
Committed sub-tree #5 (150 KB)
Committed sub-tree #6 (122 KB)
Split into 7 parts
EPUB output written to C:\Users\Shawn\AppData\Local\Temp\calibre_0.9.15_tmp_x1aibr\9pcufr.epub


Any help would be appreciated.

Thanks
Shawn

Last edited by ggnome403; 01-23-2013 at 11:53 AM. Reason: Wrapped long paste in Spoiler
Old 01-23-2013, 11:37 AM   #2
ggnome403
I should also add that the only reason I want EPUB is to be able to highlight text for future reference, since I cannot do that in iBooks with a PDF.
Old 01-23-2013, 11:47 AM   #3
theducks
Well trained by Cats
Posts: 29,689
Karma: 54369090
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Word of advice: You are treading on the edge of MR copyright policy with this work that IS licensed for fully intact distribution. (A Greenie may take it down, anyway).


Did you read the sticky at the top about PDF conversions?
It's iffy, but playing with the line unwrap factor (try a lower value) might help.
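
The unwrap setting can also be tried from the command line rather than the conversion dialog. What follows is a minimal sketch, assuming calibre's `ebook-convert` tool is installed and on the PATH. The file names and the 0.3 starting value are placeholders for illustration, not values recommended anywhere in this thread. The posted log shows 'unwrap_factor': 0.45 and 'enable_heuristics': False, so the sketch lowers the first and switches the second on.

```python
import subprocess


def build_convert_command(pdf_path, epub_path, unwrap_factor=0.3, heuristics=True):
    """Build an ebook-convert argument list for a PDF to EPUB conversion.

    unwrap_factor is passed to calibre's PDF input option --unwrap-factor
    (the log above shows 0.45). The default of 0.3 here is an arbitrary
    lower value to experiment with, not a known-good setting.
    """
    if not 0.0 <= unwrap_factor <= 1.0:
        raise ValueError("unwrap_factor must be between 0 and 1")
    cmd = [
        "ebook-convert",
        pdf_path,
        epub_path,
        "--unwrap-factor",
        str(unwrap_factor),
    ]
    if heuristics:
        # The log above shows 'enable_heuristics': False for that run.
        cmd.append("--enable-heuristics")
    return cmd


def convert(pdf_path, epub_path, unwrap_factor=0.3, heuristics=True):
    """Run the conversion; raises CalledProcessError if ebook-convert fails."""
    cmd = build_convert_command(pdf_path, epub_path, unwrap_factor, heuristics)
    subprocess.run(cmd, check=True)
```

Calling `convert("book.pdf", "book.epub", unwrap_factor=0.3)` with a few different values and comparing the results is probably the quickest way to see whether the broken lines improve. If the PDF's text layer is poor, no value may help.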
Old 01-23-2013, 11:49 AM   #4
ggnome403
My bad, sorry. I own a paper version of the book and would like to have a digital copy on my iPad. Should I remove the file?
Old 01-23-2013, 11:53 AM   #5
ggnome403
The files have been removed. Thanks
Old 01-23-2013, 12:20 PM   #6
ggnome403
Is there any way to convert a PDF so that the text can be highlighted in iBooks without changing anything else in the file?
Old 01-23-2013, 02:41 PM   #7
BetterRed
null operator (he/him)
Posts: 20,458
Karma: 26645808
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Printing a PDF to a PDF file sometimes results in a PDF you can annotate and highlight. I have no idea why it only works sometimes.

I use the Bullzip PDF printer driver. I tried a couple of others, and they were no better in this regard than Bullzip.

For converting, using MobiCreator to go from PDF to PRC, and then calibre to go from PRC to EPUB, often works best for me.

BR
Old 01-23-2013, 05:13 PM   #8
ggnome403
Thanks. I will give it a shot.
Old 01-26-2013, 12:41 AM   #9
JSWolf
Resident Curmudgeon
Posts: 73,660
Karma: 127838196
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
If this is an image-based PDF (the pages are images of pages rather than text), then you cannot highlight. You can draw on the pages, depending on the program.

Given the hassle of converting PDFs, it would be in your best interest to buy the GoodReader app. It is the best app for dealing with PDFs, and much better than iBooks for that.

All times are GMT -4.