Old 06-23-2010, 11:25 AM   #1
Nirf
Junior Member
Nirf began at the beginning.
 
Posts: 5
Karma: 10
Join Date: Aug 2008
Device: PRS-505
Converting PDF to HTML

Hello all,

While trying to convert a PDF to LRF, calibre apparently goes through HTML as an intermediate format, and it hangs in that phase. I'm running Ubuntu 10.04. I first installed calibre from the repo, then tried the latest version downloaded from the website. I ran the console command ebook-convert and, after a while, hit Ctrl-C to end it. Here's what it looks like:

ebook-convert hello.pdf hello.lrf
1% Converting input to HTML...
InputFormatPlugin: PDF Input running
on /home/nir/Documents/Calibre Library/Malcolm Gladwell/Outliers_ The Story of Success (Little, Brown & Co; 2008) (3)/hello.pdf
^CTraceback (most recent call last):
  File "/tmp/init.py", line 48, in <module>
  File "/home/kovid/build/calibre/src/calibre/ebooks/conversion/cli.py", line 254, in main
  File "/home/kovid/build/calibre/src/calibre/ebooks/conversion/plumber.py", line 815, in run
  File "/home/kovid/build/calibre/src/calibre/customize/conversion.py", line 211, in __call__
  File "/home/kovid/build/calibre/src/calibre/ebooks/pdf/input.py", line 50, in convert
  File "/home/kovid/build/calibre/src/calibre/ebooks/pdf/pdftohtml.py", line 61, in pdftohtml
  File "/usr/lib64/python2.6/subprocess.py", line 1157, in wait
    pid, sts = os.waitpid(self.pid, os.WNOHANG)
KeyboardInterrupt

Any suggestions? As I said, I tried both the repo install and then the direct-download install (after removing the repo version) and had the same problem both times. I don't know where to go from here, because it seems to be getting stuck in a generic Python file, and it's the correct version of Python... Help appreciated!
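
The traceback ends inside subprocess.py, where calibre is waiting for the pdftohtml child process to finish. One way to keep a stuck conversion from needing Ctrl-C is to run it with a time limit. This is only a minimal sketch, assuming Python 3 and its standard subprocess module (the traceback above is from Python 2.6); run_with_timeout and SLEEP_CMD are hypothetical names introduced for illustration.

```python
import subprocess
import sys

# Stand-in for a converter that hangs: a child process that sleeps 5 seconds.
SLEEP_CMD = [sys.executable, "-c", "import time; time.sleep(5)"]


def run_with_timeout(cmd, timeout_seconds):
    """Run cmd and return its exit code, or None if it exceeded the timeout."""
    try:
        # On timeout, subprocess.run kills the child and raises TimeoutExpired.
        completed = subprocess.run(cmd, timeout=timeout_seconds)
    except subprocess.TimeoutExpired:
        return None
    return completed.returncode
```

For example, run_with_timeout(["ebook-convert", "hello.pdf", "hello.lrf"], 600) would give the conversion ten minutes (an arbitrary figure) and return None if it is still running after that.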
Old 06-23-2010, 11:28 AM   #2
Nirf
Oh, quick PS, when I run calibre I get two "Link hasn't been detected!" messages although calibre still runs. This makes me think I may be missing some of the required packages, but that doesn't make much sense: a) I've checked most of the major ones, and b) I had the same problem when I installed from the repo, which should install all the required packages automatically. Also, when I hit convert, I get a pile more "Link hasn't been detected!" messages. Is there any kind of debug mode for calibre where I can check which libraries seem to be missing?
Old 06-23-2010, 01:17 PM   #3
kovidgoyal
creator of calibre
Posts: 43,843
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
pdftohtml (the program calibre uses to convert PDF to HTML) is hanging. You can try running it independently on the PDF file to see if it works, and then convert the resulting HTML.
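
The two-step route suggested here can be sketched as follows. This is a hedged illustration, not how calibre invokes pdftohtml internally: it assumes the Poppler build of pdftohtml (whose -noframes option asks for a single HTML file rather than a frameset) and that both pdftohtml and ebook-convert are on the PATH. The function names and the default timeout are hypothetical.

```python
import os
import subprocess


def two_step_commands(pdf_path, output_path):
    """Build the two commands: PDF -> HTML with pdftohtml, then HTML -> ebook."""
    base, _ext = os.path.splitext(pdf_path)
    html_path = base + ".html"
    # Assumption: Poppler's pdftohtml, which takes the input PDF followed by
    # the output HTML file name.
    pdf_to_html = ["pdftohtml", "-noframes", pdf_path, html_path]
    # ebook-convert picks the output format from the output file's extension.
    html_to_book = ["ebook-convert", html_path, output_path]
    return [pdf_to_html, html_to_book]


def run_two_step(pdf_path, output_path, timeout_seconds=600):
    """Run both steps in order, stopping at the first failure or hang."""
    for cmd in two_step_commands(pdf_path, output_path):
        # check=True raises CalledProcessError on a non-zero exit status;
        # timeout raises TimeoutExpired if a step runs too long.
        subprocess.run(cmd, check=True, timeout=timeout_seconds)
```

Running the steps separately makes it clear which of the two programs is the slow one.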
Old 06-23-2010, 01:51 PM   #4
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Nirf
Oh, quick PS, when I run calibre I get two "Link hasn't been detected!" messages although calibre still runs.
These are normal and can be ignored.
Old 06-24-2010, 12:50 AM   #5
Nirf
OK, so I followed the suggestions. Running pdftohtml on hello.pdf worked and produced a bunch of files: hello.html, hellos.html, hello_ind.html, and a zillion .png files for all the pages. However, I couldn't find any way to add the HTML file meaningfully as a book into calibre. I would choose hello.html, and next thing I know when the book is in the library, it shows up as a zip file, and there's no way to preview it. Very odd behavior.

Also, I let ebook-convert run for a long time this time, and here's what I eventually got (after a memory leak so bad that everything slowed down and I had to end it the hard way):


1% Converting input to HTML...
InputFormatPlugin: PDF Input running
on /home/nir/Documents/Calibre Library/Malcolm Gladwell/Outliers_ The Story of Success (Little, Brown & Co; 2008) (3)/hello.pdf
pdftohtml log:

Parsing all content...
Initial parse failed:
Parsing file 'index.html' as HTML
Forcing index.html into XHTML namespace
Generating default TOC from spine...
34% Running transforms on ebook...
Merging user specified metadata...
Detecting structure...
Auto generated TOC with 0 entries.
Flattening CSS and remapping font sizes...
Source base font size is 12.00000pt
Cleaning up manifest...
Trimming unused files from manifest...
Creating LRF Output...
67% Creating LRF Output
Processing u'index.html'
Parsing HTML...
Converting to BBeB...
Terminated


These conversions are also taking a huge amount of time. The pdftohtml conversion took a few minutes, and ebook-convert takes even longer to get to this point. I don't remember it taking anywhere near this long before. What's going on?
Old 06-24-2010, 02:21 AM   #6
Nirf
The evidence now points to the cause: the PDF in question is just a series of scanned images and doesn't contain any text at all.
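
A quick way to check for this before converting is to extract the text and see how little comes out. The sketch below is a rough heuristic under stated assumptions: it relies on Poppler's pdftotext being installed (a different tool from the pdftohtml discussed above), on its form-feed page separators, and on an arbitrary threshold of 50 visible characters per page. Both function names are hypothetical.

```python
import subprocess


def looks_scanned(extracted_text, min_chars_per_page=50):
    """Guess whether extracted text is too sparse to come from a text PDF.

    extracted_text is expected to use form feeds as page separators.
    """
    pages = extracted_text.rstrip("\f").split("\f")
    # Count only non-whitespace characters across all pages.
    visible = sum(len("".join(page.split())) for page in pages)
    return visible < min_chars_per_page * len(pages)


def extract_text(pdf_path):
    """Extract text with pdftotext; "-" sends the text to standard output."""
    result = subprocess.run(
        ["pdftotext", pdf_path, "-"], capture_output=True, check=True
    )
    return result.stdout.decode("utf-8", errors="replace")
```

If looks_scanned(extract_text("hello.pdf")) returns True, the PDF is probably page images, and a text-oriented conversion has little to work with.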
Old 06-24-2010, 08:47 AM   #7
Starson17
Quote:
Originally Posted by Nirf
I couldn't find any way to add the HTML file meaningfully as a book into calibre. I would choose hello.html, and next thing I know when the book is in the library, it shows up as a zip file,
HTML files are always added as zip files. This is normal behavior.
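
To confirm that the HTML really is inside the zip file, its contents can be listed with Python's standard zipfile module. This is a minimal sketch; html_members is a hypothetical helper, and where calibre stores the zip depends on your library folder.

```python
import zipfile


def html_members(zip_path):
    """Return the names of the HTML files stored inside a zip archive."""
    with zipfile.ZipFile(zip_path) as archive:
        return [
            name
            for name in archive.namelist()
            if name.lower().endswith((".html", ".htm", ".xhtml"))
        ]
```

Calling html_members on the zip in the book's folder should list hello.html if the file was added as described.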

Quote:
and there's no way to preview it. Very odd behavior.
I'm not sure what you mean by "preview," but I can read my HTML files just fine. Are you trying to read it?
Old 06-24-2010, 08:51 AM   #8
Starson17
Quote:
Originally Posted by Nirf
The evidence now points to the cause: the PDF in question is just a series of scanned images and doesn't contain any text at all.
I have lots of PDFs like that: scanned images of the pages. If you think of them as what they are (images), they pretty much behave as expected for me.
Tags
conversion, freeze, html, linux, pdf

