Old 07-16-2009, 06:49 PM   #1
philpem
Junior Member
Posts: 6
Karma: 10
Join Date: Jul 2009
Device: Sony PRS505
Converting *big* multi-file HTML doc for PRS-505 reader

Hi guys,
I'm a software developer by trade, using various programming languages (mainly C, PHP and C++, but some others as well). I also have a notoriously bad memory, and tend to end up scrabbling through my collection of half a dozen or so quick-reference guides, and the documentation for the languages I use. So I figure, I have a PRS505 (Sony e-Reader), so why not put some of them on there...

Well, I've spent a good chunk of the last two days trying to convert the PHP documentation (the 5MB multi-file version with the table of contents) to LRF format so I can read it on the Reader. I started out by using Calibre, drag-dropping the index.html file onto the main window and converting it to LRF. This left me with a ~6-page LRF containing a great conversion of the TOC, but nothing else.

So I moved on a bit, and tried using the conversion utility (html2lrf) directly. I've done this on Windows XP (32-bit) and Ubuntu 8.10 (64-bit), in both cases with the "reduce memory usage" option turned on and with it off. In all cases, the conversion runs almost to the end ("rationalizing font sizes"), eats ~2GB of RAM, then dies -- Linux kills the converter off, while Windows allows it to eat all the RAM it likes, then the OS freezes solid (but not before some impressive graphical effects, like window borders and buttons disappearing).

Does anyone know of any HTML-to-LRF converters that can handle documents as big as the PHP manuals, or any ways to make Calibre do this without eating so much RAM?

The PHP documentation source file I'm using is freely downloadable from http://uk3.php.net/download-docs.php -- I'm using "English, many files, .tar.gz"

Cheers,
Phil.
Old 07-16-2009, 06:55 PM   #2
kovidgoyal
creator of calibre
Posts: 25,931
Karma: 5036099
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
convert it to epub
Old 07-16-2009, 06:58 PM   #3
philpem
Junior Member
Posts: 6
Karma: 10
Join Date: Jul 2009
Device: Sony PRS505
Quote:
Originally Posted by kovidgoyal
convert it to epub
I should have mentioned that I'd tried that -- it froze my machine almost solid after about 45 minutes, I ended up rebooting... That was on Linux; I didn't try it on Windows.
Old 07-16-2009, 07:00 PM   #4
kovidgoyal
creator of calibre
Posts: 25,931
Karma: 5036099
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
You couldn't even ssh into it? really?
Old 07-16-2009, 07:16 PM   #5
philpem
Junior Member
Posts: 6
Karma: 10
Join Date: Jul 2009
Device: Sony PRS505
Quote:
Originally Posted by kovidgoyal
You couldn't even ssh into it? really?
I didn't try that, but X was frozen solid (mouse cursor wouldn't move) and numlock/capslock were unresponsive.

From past experience, if hitting numlock doesn't make the keyboard light blink, it's probably time to hit the Big Red Switch...
Old 07-16-2009, 07:34 PM   #6
Nate the great
Sir Penguin of Edinburgh
Posts: 10,472
Karma: 3291603
Join Date: Apr 2007
Location: DC Metro area
Device: Shake a stick plus 1
I just had a look. Do you realize that it has over 9 _thousand_ html files? No wonder calibre crashed.
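
To check the scale of the input before converting, one option is to count the HTML files and add up their sizes. This is only a minimal Python sketch and not part of calibre; the function name `summarize_html_tree` and the list of extensions are assumptions made for illustration.

```python
import os
import sys


def summarize_html_tree(root):
    """Return (file_count, total_bytes) for HTML-like files under root."""
    count = 0
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Assumed extension list; adjust it to match the manual's files.
            if name.lower().endswith((".html", ".htm", ".xhtml")):
                count += 1
                total += os.path.getsize(os.path.join(dirpath, name))
    return count, total


# Only runs when a directory is given on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    n, size = summarize_html_tree(sys.argv[1])
    print("%d HTML files, %.1f MB" % (n, size / (1024.0 * 1024.0)))
```

Run it with the unpacked documentation directory as the only argument.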
Old 07-16-2009, 07:40 PM   #7
kovidgoyal
creator of calibre
Posts: 25,931
Karma: 5036099
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Quote:
Originally Posted by philpem
I didn't try that, but X was frozen solid (mouse cursor wouldn't move) and numlock/capslock were unresponsive.

From past experience, if hitting numlock doesn't make the keyboard light blink, it's probably time to hit the Big Red Switch...
Oh I've often rescued machines that don't respond to any input device by sshing into them. calibre is designed to keep everything in memory while it converts, so your machine may be running out of memory.
Old 07-16-2009, 07:45 PM   #8
philpem
Junior Member
Posts: 6
Karma: 10
Join Date: Jul 2009
Device: Sony PRS505
Quote:
Originally Posted by Nate the great
I just had a look. Do you realize that it has over 9 _thousand_ html files? No wonder calibre crashed.
Yep, that's why I set the subject to "Converting *big* multi-file HTML doc" -- emphasis on "big".

Quote:
Originally Posted by kovidgoyal
Oh I've often rescued machines that don't respond to any input device by sshing into them. calibre is designed to keep everything in memory while it converts, so your machine may be running out of memory.
It shouldn't be, there's 4GB for it to play with (and a Q6600 CPU overclocked to 3GHz, so plenty of CPU horsepower too)...

Though that said, the WXP version got to 2GB then died quite horribly, so maybe something similar is happening here. I'm not going to shout "bug!" because IMHO that's a bit like shouting "fire!" in a crowded theatre.
Old 07-16-2009, 07:51 PM   #9
kovidgoyal
creator of calibre
Posts: 25,931
Karma: 5036099
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
The whole conversion framework has been redesigned in calibre 0.6; try that (preferably on Linux) and see if it works. This time, keep an eye on htop.
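
If watching htop by hand is awkward, the peak memory of a finished run can also be read back afterwards. The following is a rough sketch for Unix-like systems only (the `resource` module is not available on Windows); the helper name `run_and_report_peak` is hypothetical, and the figure only becomes available once the child has exited, so it does not replace a live view.

```python
import resource
import subprocess
import sys


def run_and_report_peak(cmd):
    """Run cmd to completion; return (exit_code, peak_rss_of_children)."""
    exit_code = subprocess.call(cmd)
    # ru_maxrss is the largest resident set size among all child processes
    # this script has waited for so far. Linux reports it in kilobytes.
    peak = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
    return exit_code, peak


# Only runs when a command is given, e.g. the converter and its arguments.
if __name__ == "__main__" and len(sys.argv) > 1:
    code, peak = run_and_report_peak(sys.argv[1:])
    print("exit code %d, peak RSS %d (KB on Linux)" % (code, peak))
```

Pass the conversion command and its arguments after the script name; the script runs it unchanged and prints the figures once it exits.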
Old 07-17-2009, 01:47 PM   #10
philpem
Junior Member
Posts: 6
Karma: 10
Join Date: Jul 2009
Device: Sony PRS505
Quote:
Originally Posted by kovidgoyal
The whole conversion framework has been redesigned in calibre 0.6; try that (preferably on Linux) and see if it works. This time, keep an eye on htop.
I guess v0.6 would be the current development version?

If so, the link on http://pypi.python.org/pypi/calibre/ (and in various places on the wiki) seems to be broken. My attempts to check out the source from Bzr got me this message:
Code:
philpem@cheetah:~/calibre$ bzr branch http://bzr.kovidgoyal.net/code/calibre/trunk calibre
bzr: ERROR: Connection error: Couldn't resolve host 'bzr.kovidgoyal.net' (-2, 'Name or service not known')
Am I screwing something up here? (I normally use either Mercurial or Subversion; this is the first time I've used Bzr.)
Old 07-17-2009, 01:49 PM   #11
kovidgoyal
creator of calibre
Posts: 25,931
Karma: 5036099
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
bzr co lp:calibre

but there are links to precompiled beta releases of calibre in a sticky in this forum
Old 07-17-2009, 05:00 PM   #12
philpem
Junior Member
Posts: 6
Karma: 10
Join Date: Jul 2009
Device: Sony PRS505
OK, seen the sticky. I'll have a play with that in a bit.
html2epub didn't like it though:

Code:
	Splitting getting-started.xhtml (2 KB)
	Splitting on page breaks...
	Looking for large trees...
	No large trees found
	Splitting indexes.xhtml (648 KB)
	Splitting on page breaks...
	Looking for large trees...
Traceback (most recent call last):
  File "/usr/bin/html2epub", line 8, in <module>
    load_entry_point('calibre==0.5.14', 'console_scripts', 'html2epub')()
  File "build/bdist.linux-x86_64/egg/calibre/ebooks/epub/from_html.py", line 543, in main
  File "build/bdist.linux-x86_64/egg/calibre/ebooks/epub/from_html.py", line 480, in convert
  File "build/bdist.linux-x86_64/egg/calibre/ebooks/epub/split.py", line 500, in split
  File "build/bdist.linux-x86_64/egg/calibre/ebooks/epub/split.py", line 76, in __init__
  File "build/bdist.linux-x86_64/egg/calibre/ebooks/epub/split.py", line 166, in split_to_size
  File "build/bdist.linux-x86_64/egg/calibre/ebooks/epub/split.py", line 166, in split_to_size
  File "build/bdist.linux-x86_64/egg/calibre/ebooks/epub/split.py", line 166, in split_to_size
  File "build/bdist.linux-x86_64/egg/calibre/ebooks/epub/split.py", line 152, in split_to_size
calibre.ebooks.epub.split.SplitError: Could not find reasonable point at which to split: indexes.xhtml Sub-tree size: 647 KB
Hmm. Looks like I might have to live without the PHP manual after all.
I'm going to have a go with 0.6 as soon as the binary finishes downloading (assuming I can make a couple of i686 binaries work properly on x86_64).


EDIT: Nope, it's not playing ball.
Code:
philpem@cheetah:~/phpdoc/html$ ~/calibre/prebuild/ebook-convert index.html ../phpdoc.lrf
Traceback (most recent call last):
  File "/tmp/init.py", line 45, in <module>
  File "/home/kovid/work/calibre/src/calibre/ebooks/conversion/cli.py", line 214, in main
  File "/home/kovid/work/calibre/src/calibre/ebooks/conversion/cli.py", line 203, in create_option_parser
  File "/home/kovid/work/calibre/src/calibre/ebooks/conversion/plumber.py", line 9, in <module>
  File "/home/kovid/work/calibre/src/calibre/customize/ui.py", line 11, in <module>
  File "/home/kovid/work/calibre/src/calibre/customize/builtins.py", line 318, in <module>
  File "/home/kovid/work/calibre/src/calibre/ebooks/epub/input.py", line 9, in <module>
  File "ExtensionLoader_lxml_etree.py", line 12, in <module>
ImportError: /home/philpem/calibre/prebuild/libexslt.so.0: symbol gcry_cipher_setkey, version GCRYPT_1.2 not defined in file libgcrypt.so.11 with link time reference
EDIT2: Can't get the compiled-from-source version going either. It's after a module called "multiprocessing", which is only in Python 2.6 and above... I don't think I'm going to bother upgrading to the latest release of Ubuntu just to get Calibre running, thanks.
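
A missing module like this can be spotted before attempting a build by trying the imports up front. This is a small sketch written to run on old and new Python versions alike; the helper name `missing_modules` is an assumption for illustration, and the only module it checks is the "multiprocessing" one named above.

```python
import sys


def missing_modules(names):
    """Return the names from the list that this interpreter cannot import."""
    missing = []
    for name in names:
        try:
            __import__(name)
        except ImportError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    major, minor = sys.version_info[:2]
    absent = missing_modules(["multiprocessing"])
    if absent:
        print("Python %d.%d is missing: %s" % (major, minor, ", ".join(absent)))
    else:
        print("Python %d.%d has every module that was checked" % (major, minor))
```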

Last edited by philpem; 07-17-2009 at 06:55 PM.