Old 01-18-2013, 10:51 AM   #1
Rich Gibson
Junior Member
Rich Gibson began at the beginning.
 
Posts: 4
Karma: 10
Join Date: Jan 2013
Device: Samsung Galaxy Tablet
Scannable pdf file loses data converting to mobi

Hi. I'm really enjoying Calibre. I've also posted this at the Facebook site but I see that this is a better place to ask my question.

I've run into a snag though. Up till now I've scanned paper documents to create a PDF, then used ABBYY to create a searchable PDF. Then I use a PDF editor to remove the headers and footers. Finally I convert the document to MOBI in Calibre. I do this so that I can play the book back using IVONA speech without the voice reading the headers and footers. With my most recent book, the MOBI that Calibre produces looks like the original non-searchable scan, headers and footers included, even though the .pdf file in the book folder is the edited PDF with the headers and footers removed. I even renamed the other versions of the file, and Calibre still produces a MOBI with no text data in it. Any suggestions as to how Calibre can take a PDF that has embedded text and no headers and output what looks like the original scanned PDF as a MOBI? This process has run without a flaw on many other books until now.

When I try to read the file (using Moon+ Reader Pro), instead of the normal page-oriented presentation I see something that resembles a standard PDF. When I try to play the voice, it speeds through the entire document, indicating there is no readable text. There are no error messages. I have the log and will post it below. Thanks for listening. I really like Calibre and have made a donation; it's that useful.

'prefer_metadata_cover': False,
'pretty_print': False,
'pubdate': None,
'publisher': None,
'rating': None,
'read_metadata_from_opf': u'/var/folders/b1/htl47r2s79551pj5hxgbhmp00000gn/T/calibre_0.9.15_tmp_uytj6P/JkJWbC.opf',
'remove_fake_margins': True,
'remove_first_image': False,
'remove_paragraph_spacing': False,
'remove_paragraph_spacing_indent_size': 1.5,
'renumber_headings': True,
'replace_scene_breaks': u'',
'search_replace': '[]',
'series': None,
'series_index': None,
'share_not_sync': False,
'smarten_punctuation': True,
'sr1_replace': None,
'sr1_search': None,
'sr2_replace': None,
'sr2_search': None,
'sr3_replace': None,
'sr3_search': None,
'start_reading_at': None,
'subset_embedded_fonts': False,
'tags': None,
'timestamp': None,
'title': None,
'title_sort': None,
'toc_filter': None,
'toc_threshold': 6,
'toc_title': None,
'unsmarten_punctuation': False,
'unwrap_factor': 0.45,
'unwrap_lines': True,
'use_auto_toc': False,
'verbose': 2}
InputFormatPlugin: PDF Input running
on /var/folders/b1/htl47r2s79551pj5hxgbhmp00000gn/T/calibre_0.9.15_tmp_uytj6P/wkBpQ3.pdf
Converting file to html...
Retrieving document metadata...
Generating manifest...
Rendering manifest...
Parsing all content...
Parsing index.html ...
********* Heuristic processing HTML *********
flow is too short, not running heuristics
Initial parse failed, using more forgiving parsers
Parsing index.html as HTML
Generating default TOC from spine...
Merging user specified metadata...
Detecting structure...
Auto generated TOC with 0 entries.
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Removing fake margins...
Found 275 items of level: p_1
p_1 left margin stats: Counter({u'0': 275})
p_1 right margin stats: Counter({u'0': 275})
Cleaning up manifest...
Trimming unused files from manifest...
Creating MOBI Output...
Serializing resources...
Creating MOBI 6 output
Applying case-transforming CSS...
Parsing manglecase.css ...
Rasterizing SVG images...
Converting XHTML to Mobipocket markup...
Serializing markup content...
Compressing markup content...
No TOC, MOBI index not generated
MOBI output written to /var/folders/b1/htl47r2s79551pj5hxgbhmp00000gn/T/calibre_0.9.15_tmp_uytj6P/B0Bgha.mobi
Old 01-18-2013, 11:05 AM   #2
kovidgoyal
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
This will be because your PDF is image based, i.e. it contains only scans of page images. I have no clue why your OCR process failed for this PDF.
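One way to check whether a PDF really carries a text layer before handing it to calibre is to extract its text and look for pages that come back empty. The following is only a minimal sketch: it assumes poppler's `pdftotext` command is installed and on the PATH (the thread does not mention it), and the function names and the 20-character threshold are illustrative choices, not anything calibre itself does.

```python
import subprocess


def extract_text(pdf_path):
    """Return the text layer of a PDF using poppler's pdftotext.

    Assumption: pdftotext is installed; "-" sends the output to stdout.
    """
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def pages_without_text(text, min_chars=20):
    """List 1-based page numbers whose extracted text is (nearly) empty.

    pdftotext ends every page with a form feed, so splitting on "\f"
    leaves one empty trailing chunk that is not a real page.
    """
    pages = text.split("\f")
    if pages and pages[-1].strip() == "":
        pages.pop()
    return [
        number
        for number, page in enumerate(pages, start=1)
        if len(page.strip()) < min_chars
    ]


if __name__ == "__main__":
    import sys

    empty = pages_without_text(extract_text(sys.argv[1]))
    if empty:
        print("Pages with little or no text:", empty)
    else:
        print("Every page has a text layer.")
```

If most pages show up in the list, the file is effectively image-only and a converted MOBI will have no text for a speech engine to read.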
Old 01-18-2013, 02:05 PM   #3
Rich Gibson
I believe I may know the reason. When I tried to get the reader app to play the document with voice, it started spelling out words letter by letter instead of reading them whole. A careful review using Apple's Preview shows the scanned print is significantly fainter than in my other scans. The book does appear light to the eye as well.

I tried the PDF file on the tablet and it reads better, so clearly the first scan wasn't good enough. I rescanned with a darker setting, tried a few pages, and copied the result to the tablet. The voice speaks the PDF file perfectly, but somehow Calibre produces a MOBI document without any text, even though the text IS in the PDF file before I convert it. Go figure!
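The "spelling words" symptom can be looked for in extracted text with a rough heuristic. This sketch assumes (it is only a guess about the cause) that a faint scan led the OCR step to emit letters separated by spaces; the function name and the idea of counting one-letter tokens are illustrative, not part of calibre or ABBYY.

```python
def single_letter_ratio(text):
    """Fraction of alphabetic tokens that are exactly one letter long.

    Ordinary English prose has only a few such tokens ("a", "I"),
    so a high ratio hints at letter-spaced OCR output.
    """
    tokens = [token for token in text.split() if token.isalpha()]
    if not tokens:
        return 0.0
    singles = sum(1 for token in tokens if len(token) == 1)
    return singles / len(tokens)


if __name__ == "__main__":
    print(single_letter_ratio("T h e c a t"))
    print(single_letter_ratio("The cat sat on a mat"))
```

A ratio well above what normal prose gives suggests the page is worth rescanning at a darker setting before conversion.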

Last edited by Rich Gibson; 01-18-2013 at 03:01 PM.
Old 01-18-2013, 10:01 PM   #4
kovidgoyal
If you have Adobe Acrobat, you can try extracting the text with that and then converting the resulting text file with calibre.
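The same extract-then-convert idea can be sketched from the command line. Note the swap: instead of Adobe Acrobat, this assumes poppler's `pdftotext` for the extraction step, followed by calibre's `ebook-convert` command-line tool. Both are assumed to be installed and on the PATH, and the function names and output file naming are hypothetical.

```python
import subprocess
from pathlib import Path


def conversion_commands(pdf_path):
    """Build the two commands: PDF -> plain text, then plain text -> MOBI.

    The output files sit next to the PDF and share its base name.
    """
    pdf = Path(pdf_path)
    txt = pdf.with_suffix(".txt")
    mobi = pdf.with_suffix(".mobi")
    return [
        ["pdftotext", str(pdf), str(txt)],      # extract the text layer
        ["ebook-convert", str(txt), str(mobi)],  # let calibre build the MOBI
    ]


def run_conversion(pdf_path):
    """Run both steps in order, stopping at the first failure."""
    for command in conversion_commands(pdf_path):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    import sys

    run_conversion(sys.argv[1])
```

Going through a plain text file sidesteps calibre's PDF input step entirely, at the cost of losing images and most formatting.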
All times are GMT -4.