01-18-2013, 10:51 AM   #1
Rich Gibson
Device: Samsung Galaxy Tablet
Searchable PDF file loses data when converting to MOBI

Hi. I'm really enjoying Calibre. I've also posted this on the Facebook page, but I see that this is a better place to ask my question.

I've run into a snag, though. Up till now I've scanned paper documents to create a PDF, then used ABBYY to create a searchable PDF. Then I use a PDF editor to remove the headers and footers. Finally, I convert the document to MOBI in Calibre. I do this so that I can play the book back using Ivona speech without the voice reading the headers and footers.

With my most recent book, Calibre is somehow taking the original non-searchable PDF and restoring the headers and footers. The MOBI file is the original non-searchable PDF, but the accompanying .pdf file in the book folder is the PDF with the headers and footers edited out. I even renamed the other versions, and Calibre still produces a PDF with no text data and a .mobi extension. Any suggestions as to how Calibre takes a PDF that has embedded text and no headers and outputs the original scanned PDF as the MOBI? This process has run without a flaw on many other books until now.

When I try to read the file (using Moon+ Reader Pro), instead of the normal page-oriented presentation I see something that resembles a standard PDF layout. When I try to play the voice, it speeds through the entire document, indicating there is no readable text. There are no error messages. I have the log and will post it below. Thanks for listening. I really like Calibre and have made a donation; it's that useful.

'prefer_metadata_cover': False,
'pretty_print': False,
'pubdate': None,
'publisher': None,
'rating': None,
'read_metadata_from_opf': u'/var/folders/b1/htl47r2s79551pj5hxgbhmp00000gn/T/calibre_0.9.15_tmp_uytj6P/JkJWbC.opf',
'remove_fake_margins': True,
'remove_first_image': False,
'remove_paragraph_spacing': False,
'remove_paragraph_spacing_indent_size': 1.5,
'renumber_headings': True,
'replace_scene_breaks': u'',
'search_replace': '[]',
'series': None,
'series_index': None,
'share_not_sync': False,
'smarten_punctuation': True,
'sr1_replace': None,
'sr1_search': None,
'sr2_replace': None,
'sr2_search': None,
'sr3_replace': None,
'sr3_search': None,
'start_reading_at': None,
'subset_embedded_fonts': False,
'tags': None,
'timestamp': None,
'title': None,
'title_sort': None,
'toc_filter': None,
'toc_threshold': 6,
'toc_title': None,
'unsmarten_punctuation': False,
'unwrap_factor': 0.45,
'unwrap_lines': True,
'use_auto_toc': False,
'verbose': 2}
InputFormatPlugin: PDF Input running
on /var/folders/b1/htl47r2s79551pj5hxgbhmp00000gn/T/calibre_0.9.15_tmp_uytj6P/wkBpQ3.pdf
Converting file to html...
Retrieving document metadata...
Generating manifest...
Rendering manifest...
Parsing all content...
Parsing index.html ...
********* Heuristic processing HTML *********
flow is too short, not running heuristics
Initial parse failed, using more forgiving parsers
Parsing index.html as HTML
Generating default TOC from spine...
Merging user specified metadata...
Detecting structure...
Auto generated TOC with 0 entries.
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Found 275 items of level: p_1
p_1 left margin stats: Counter({u'0': 275})
p_1 right margin stats: Counter({u'0': 275})
Cleaning up manifest...
Trimming unused files from manifest...
Creating MOBI Output...
Serializing resources...
Creating MOBI 6 output
Applying case-transforming CSS...
Parsing manglecase.css ...
Rasterizing SVG images...
Converting XHTML to Mobipocket markup...
Serializing markup content...
Compressing markup content...
No TOC, MOBI index not generated
MOBI output written to /var/folders/b1/htl47r2s79551pj5hxgbhmp00000gn/T/calibre_0.9.15_tmp_uytj6P/
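
For anyone who wants to compare this log against the log from a conversion that worked, here is a minimal Python sketch that pulls out a few lines that might be worth checking side by side. The function name `summarize_log` and the choice of which lines to extract are assumptions made for illustration; this is not part of Calibre, and it only matches the exact wording that appears in the log above.

```python
import re

# Hypothetical helper, not part of Calibre. It only recognises the exact
# wording of the log lines shown in this post.
def summarize_log(log_text):
    summary = {
        "input_plugin": None,         # e.g. "PDF Input"
        "heuristics_skipped": False,  # True if "flow is too short" was logged
        "paragraph_count": None,      # number of p_1 items found
        "toc_entries": None,          # entries in the auto-generated TOC
    }
    for raw_line in log_text.splitlines():
        line = raw_line.strip()

        match = re.match(r"InputFormatPlugin: (.+) running", line)
        if match:
            summary["input_plugin"] = match.group(1)

        if line == "flow is too short, not running heuristics":
            summary["heuristics_skipped"] = True

        match = re.match(r"Found (\d+) items of level: p_1", line)
        if match:
            summary["paragraph_count"] = int(match.group(1))

        match = re.match(r"Auto generated TOC with (\d+) entries", line)
        if match:
            summary["toc_entries"] = int(match.group(1))
    return summary


# A few lines copied from the log above, used as sample input.
SAMPLE_LOG = """InputFormatPlugin: PDF Input running
flow is too short, not running heuristics
Auto generated TOC with 0 entries.
Found 275 items of level: p_1
"""

if __name__ == "__main__":
    for key, value in summarize_log(SAMPLE_LOG).items():
        print(key, "=", value)
```

Running the same function over the log of a book that converted correctly and comparing the two summaries may show where the conversions diverge, though which differences actually matter is a guess rather than something the log states.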