06-23-2010, 11:25 AM | #1 |
Junior Member
Posts: 5
Karma: 10
Join Date: Aug 2008
Device: PRS-505
|
Converting PDF to HTML
Hello all,
While trying to convert a PDF to LRF, ebook-convert goes through HTML as an intermediate format (apparently), and it hangs at that stage. I'm running Ubuntu 10.04. I tried both installing from the repo and installing the latest version downloaded from the website. I used the console command ebook-convert, and after a while I hit Ctrl+C to stop it. Here's what it looks like:

ebook-convert hello.pdf hello.lrf
1% Converting input to HTML...
InputFormatPlugin: PDF Input running on /home/nir/Documents/Calibre Library/Malcolm Gladwell/Outliers_ The Story of Success (Little, Brown & Co; 2008) (3)/hello.pdf
^CTraceback (most recent call last):
  File "/tmp/init.py", line 48, in <module>
  File "/home/kovid/build/calibre/src/calibre/ebooks/conversion/cli.py", line 254, in main
  File "/home/kovid/build/calibre/src/calibre/ebooks/conversion/plumber.py", line 815, in run
  File "/home/kovid/build/calibre/src/calibre/customize/conversion.py", line 211, in __call__
  File "/home/kovid/build/calibre/src/calibre/ebooks/pdf/input.py", line 50, in convert
  File "/home/kovid/build/calibre/src/calibre/ebooks/pdf/pdftohtml.py", line 61, in pdftohtml
  File "/usr/lib64/python2.6/subprocess.py", line 1157, in wait
    pid, sts = os.waitpid(self.pid, os.WNOHANG)
KeyboardInterrupt

Any suggestions? As I said, I tried both the repo install and then the direct download install (after removing the repo version) and had the same problem both times. I don't know where to go from here, because it seems to be getting stuck in a generic Python file, and it's the correct version of Python... Help appreciated!
06-23-2010, 11:28 AM | #2 |
Junior Member
Posts: 5
Karma: 10
Join Date: Aug 2008
Device: PRS-505
|
Oh, quick PS: when I run calibre I get two "Link hasn't been detected!" messages, although calibre still runs. This makes me think I may be missing some of the required packages, but that doesn't make much sense because a) I've checked most of the major ones and b) I had the same problem when I installed from the repo, and the repo install should pull in all the required packages automatically. Also, when I hit convert, I get a pile more "Link hasn't been detected!" messages. Is there any kind of debug mode for calibre where I can check which libraries seem to be missing?
|
06-23-2010, 01:17 PM | #3 |
creator of calibre
Posts: 43,859
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
pdftohtml (the program calibre uses to convert PDF to HTML) is hanging. You can try running it independently on the PDF file to see if it works, and then convert the resulting HTML.
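A minimal Python sketch of that two-step workaround: run pdftohtml on its own, then hand the resulting HTML to ebook-convert, with a timeout on each step so a hang ends in an error message instead of needing Ctrl+C. It assumes poppler's pdftohtml and calibre's ebook-convert are both on the PATH. The `-noframes` flag (one HTML file instead of the hello.html / hellos.html / hello_ind.html frameset), the 600-second timeout, and all the helper names are my own assumptions, not anything calibre itself uses.

```python
import shutil
import subprocess
from pathlib import Path


def pdftohtml_cmd(pdf: Path, html: Path) -> list[str]:
    # -noframes: write a single HTML file rather than the frameset
    # pdftohtml produces by default (assumed flag choice).
    return ["pdftohtml", "-noframes", str(pdf), str(html)]


def ebook_convert_cmd(src: Path, dest: Path) -> list[str]:
    # calibre's command-line converter: ebook-convert <input> <output>
    return ["ebook-convert", str(src), str(dest)]


def run_step(cmd: list[str], timeout_s: float) -> bool:
    """Run one external step; report failure instead of hanging forever."""
    if shutil.which(cmd[0]) is None:
        print(f"{cmd[0]}: not found on PATH")
        return False
    try:
        # subprocess.run kills the child if the timeout expires
        subprocess.run(cmd, check=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        print(f"{cmd[0]}: still running after {timeout_s}s, giving up")
        return False
    except subprocess.CalledProcessError as exc:
        print(f"{cmd[0]}: exited with status {exc.returncode}")
        return False
    return True


def pdf_to_lrf(pdf: Path, timeout_s: float = 600) -> bool:
    """PDF -> HTML with pdftohtml, then HTML -> LRF with ebook-convert."""
    html = pdf.with_suffix(".html")
    lrf = pdf.with_suffix(".lrf")
    return run_step(pdftohtml_cmd(pdf, html), timeout_s) and run_step(
        ebook_convert_cmd(html, lrf), timeout_s
    )
```

Usage would be something like `pdf_to_lrf(Path("hello.pdf"))`. If it gives up during the first step, the hang is in pdftohtml itself rather than in calibre's own code.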
|
06-23-2010, 01:51 PM | #4 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
|
06-24-2010, 12:50 AM | #5 |
Junior Member
Posts: 5
Karma: 10
Join Date: Aug 2008
Device: PRS-505
|
OK, so I followed the suggestions. Running pdftohtml on hello.pdf worked and produced a bunch of files: hello.html, hellos.html, hello_ind.html, and a zillion .png files for all the pages. However, I couldn't find any way to add the HTML file to calibre as a usable book. I'd choose hello.html, and the next thing I know, the book shows up in the library as a ZIP file with no way to preview it. Very odd behavior.
Also, this time I let ebook-convert run for a long time, and here's what I eventually got (by then there was a memory leak so bad that everything was slowing down, and I ended it the hard way):

1% Converting input to HTML...
InputFormatPlugin: PDF Input running on /home/nir/Documents/Calibre Library/Malcolm Gladwell/Outliers_ The Story of Success (Little, Brown & Co; 2008) (3)/hello.pdf
pdftohtml log:
Parsing all content...
Initial parse failed:
Parsing file 'index.html' as HTML
Forcing index.html into XHTML namespace
Generating default TOC from spine...
34% Running transforms on ebook...
Merging user specified metadata...
Detecting structure...
Auto generated TOC with 0 entries.
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Cleaning up manifest...
Trimming unused files from manifest...
Creating LRF Output...
67% Creating LRF Output
Processing u'index.html'
Parsing HTML...
Converting to BBeB...
Terminated

These conversions are also taking a huge amount of time. The pdftohtml conversion alone took a very long time (a few minutes), and the ebook-convert command takes even longer to get to this point. I don't remember it taking anywhere near this long before... what's going on?
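Since that run only stopped once the whole machine was crawling, one option is to cap the converter's memory so it fails with an error instead. A Linux-only sketch using Python's standard `resource` module to lower the child's virtual address space limit (RLIMIT_AS); `run_capped` is a hypothetical helper and the 1 GiB default is an arbitrary guess, not a calibre setting.

```python
import resource
import subprocess


def run_capped(cmd: list[str], max_bytes: int = 1 << 30) -> int:
    """Run cmd with its virtual address space capped at max_bytes (Linux).

    Returns the child's exit status. Allocations beyond the cap fail inside
    the child (Python raises MemoryError), so it exits non-zero instead of
    pushing the whole machine into swap.
    """

    def limit_memory() -> None:
        # Runs in the child after fork() and before exec(). Only the soft
        # limit is lowered; the hard limit is left as it was.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        soft = max_bytes if hard == resource.RLIM_INFINITY else min(max_bytes, hard)
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

    return subprocess.run(cmd, preexec_fn=limit_memory).returncode
```

For example, `run_capped(["ebook-convert", "hello.pdf", "hello.lrf"], max_bytes=2 << 30)` would allow roughly 2 GiB before the conversion is forced to fail.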
06-24-2010, 02:21 AM | #6 |
Junior Member
Posts: 5
Karma: 10
Join Date: Aug 2008
Device: PRS-505
|
A lot of evidence suddenly points to this being troublesome because the PDF in question is just a series of scanned images and doesn't actually contain any text...
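One quick way to check that suspicion is to ask poppler how much text it can extract per page. A rough Python sketch, assuming poppler's `pdfinfo` and `pdftotext` are installed (`pdftotext file.pdf -` writes the text to standard output). The 50-characters-per-page threshold and the helper names are my own guesses, not anything calibre does.

```python
import re
import subprocess


def page_count(pdfinfo_output: str) -> int:
    # pdfinfo prints a line like "Pages:          312"
    match = re.search(r"^Pages:\s+(\d+)", pdfinfo_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Pages:' line in pdfinfo output")
    return int(match.group(1))


def looks_scanned(text: str, pages: int, min_chars_per_page: int = 50) -> bool:
    # Count only non-whitespace characters; pdftotext still emits form
    # feeds and newlines for pages that contain no text at all.
    chars = sum(1 for ch in text if not ch.isspace())
    return chars < min_chars_per_page * max(pages, 1)


def pdf_looks_scanned(pdf: str) -> bool:
    opts = dict(capture_output=True, encoding="utf-8", errors="replace", check=True)
    info = subprocess.run(["pdfinfo", pdf], **opts)
    text = subprocess.run(["pdftotext", pdf, "-"], **opts)
    return looks_scanned(text.stdout, page_count(info.stdout))
```

If `pdf_looks_scanned("hello.pdf")` comes back True, there is essentially no text layer for pdftohtml to extract, only page images.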
|
06-24-2010, 08:47 AM | #7 | ||
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
|
||
06-24-2010, 08:51 AM | #8 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
I have lots of PDFs like that - scanned images of the pages. If you think of them as what they are - images - they pretty much behave as expected for me.
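Taking that view literally, the page images pdftohtml already wrote out could be packed into a CBZ (a plain ZIP of images in reading order), which calibre accepts as comic input, instead of going through HTML. A sketch using Python's standard `zipfile`. It assumes the PNGs sit in one directory and that a numeric ("natural") sort of their file names gives page order; I have not checked that against pdftohtml's actual naming, and `pngs_to_cbz` is a hypothetical helper.

```python
import re
import zipfile
from pathlib import Path


def natural_key(name: str) -> list:
    # "page-10.png" sorts after "page-2.png": digit runs compare as integers.
    # re.split with a capture group alternates text/digits, so types line up.
    return [int(part) if part.isdigit() else part.lower()
            for part in re.split(r"(\d+)", name)]


def pngs_to_cbz(image_dir: Path, cbz_path: Path) -> list[str]:
    """Zip every .png in image_dir into cbz_path in natural-sort order."""
    pages = sorted(image_dir.glob("*.png"), key=lambda p: natural_key(p.name))
    # PNG data is already compressed, so store the files without recompressing
    with zipfile.ZipFile(cbz_path, "w", compression=zipfile.ZIP_STORED) as cbz:
        for page in pages:
            cbz.write(page, arcname=page.name)
    return [p.name for p in pages]
```

The resulting .cbz could then be added to calibre or passed to ebook-convert like any other input file.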
|
Tags |
conversion, freeze, html, linux, pdf |
|
Similar Threads | ||||
Thread | Thread Starter | Forum | Replies | Last Post |
Content Converting HTML emails? | shermozle | Amazon Kindle | 5 | 09-27-2010 10:03 PM |
Converting Merged HTML file to Epub/PDF Not Working | MV64 | Calibre | 1 | 06-07-2010 07:48 PM |
Converting multiple HTML files into a single hyperlinked PDF? | Jürgen Hubert | Reading and Management | 6 | 01-11-2010 07:44 AM |
Converting from html | mysweety | Calibre | 16 | 09-23-2009 08:20 AM |
Converting HTML to Mobi? | Sonist | Calibre | 5 | 02-10-2009 01:23 PM |