MobileRead Forums - View Single Post

Nirf · 06-24-2010, 01:50 AM

OK, so I followed the suggestions. Running pdftohtml on hello.pdf worked and produced a bunch of files: hello.html, hellos.html, hello_ind.html, and a zillion .png files for all the pages. However, I couldn't find any way to add the HTML file to calibre as a proper book. When I choose hello.html, the book shows up in the library as a ZIP file, and there's no way to preview it. Very odd behavior.

Also, I let ebook-convert run for a long time this time, and here's what I eventually got (there was a memory leak so bad that everything was slowing down, so I ended it the hard way):

1% Converting input to HTML...
InputFormatPlugin: PDF Input running
on /home/nir/Documents/Calibre Library/Malcolm Gladwell/Outliers_ The Story of Success (Little, Brown & Co; 2008) (3)/hello.pdf
pdftohtml log:

Parsing all content...
Initial parse failed:
Parsing file 'index.html' as HTML
Forcing index.html into XHTML namespace
Generating default TOC from spine...
34% Running transforms on ebook...
Merging user specified metadata...
Detecting structure...
Auto generated TOC with 0 entries.
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Cleaning up manifest...
Trimming unused files from manifest...
Creating LRF Output...
67% Creating LRF Output
Processing u'index.html'
Parsing HTML...
Converting to BBeB...
Terminated

These conversions are also taking a huge amount of time. The pdftohtml conversion took a few minutes, and ebook-convert takes even longer just to get to this point. I don't remember it taking anywhere near this long before. What's going on?
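For anyone trying to reproduce this without their machine grinding to a halt, here is a rough sketch of running the conversion under a time limit, so a runaway process gets killed automatically instead of by hand. It assumes ebook-convert is on the PATH and is called as `ebook-convert input output`; the helper names `build_convert_cmd` and `run_with_timeout` and the 600-second limit are made up for illustration and are not part of calibre. It does not fix whatever is eating the memory, it only bounds how long the process is allowed to run.

```python
import subprocess


def build_convert_cmd(src, dst):
    """Build the argument list for a basic ebook-convert call.

    Assumes ebook-convert is on the PATH; no extra options are passed.
    """
    return ["ebook-convert", src, dst]


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (returncode, combined stdout/stderr text).

    If the command is still running after timeout_s seconds, subprocess.run
    kills it and raises TimeoutExpired; in that case this returns (None, "").
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the captured log
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return None, ""
    return proc.returncode, proc.stdout
```

Calling `run_with_timeout(build_convert_cmd("hello.pdf", "hello.lrf"), 600)` would then give up after ten minutes; a return code of `None` means the limit was hit rather than the conversion finishing.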

Post #5 · Nirf · Junior Member · Posts: 5 · Karma: 10 · Join Date: Aug 2008 · Device: PRS-505