

HTML to LRF for Mac users?


Xenophon
04-03-2007, 02:39 PM
I tried using Scotty's Java-based converter, with decent results. But what I'm really trying to do is produce usable documents from issues of Jim Baen's Universe (interested readers should check out http://www.baensuniverse.com).

They make the files available for download in a variety of formats, including .rtf and .html, but I haven't found a way to produce a truly readable version for the PRS500. The .rtf works fine, but you lose the pictures and have no access to a ToC. And with ~1000-page files that include a dozen or more stories and columns, you really, really need some built-in navigation.

So I moved on to converting the .html for the PRS500. One of the download options is a single HTML file containing all the text (with internal navigation), with the artwork in a subdirectory. Scotty's converter does a good job with the text, but loses the navigation links and the artwork.
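Before trying converters, it can help to know exactly what has to survive: the internal links, the anchors they jump to, and the image paths. A minimal sketch using only Python 3's standard `html.parser` (the class and function names are hypothetical, and it assumes links use `href="#name"` pointing at an `id` or `<a name>`):

```python
from html.parser import HTMLParser


class NavigationInventory(HTMLParser):
    """Collect internal link targets, anchor names/ids, and image paths."""

    def __init__(self):
        super().__init__()
        self.anchors = set()  # names/ids that a link can jump to
        self.links = []       # fragment part of each href="#..." link, in order
        self.images = []      # src attribute of every <img>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # Both id="x" (on any element) and <a name="x"> can be link targets.
        if attrs.get("id"):
            self.anchors.add(attrs["id"])
        if tag == "a" and attrs.get("name"):
            self.anchors.add(attrs["name"])
        href = attrs.get("href") or ""
        if tag == "a" and href.startswith("#"):
            self.links.append(href[1:])
        if tag == "img" and attrs.get("src"):
            self.images.append(attrs["src"])


def inventory(html_text):
    """Return (internal links, dangling links, image paths) for an HTML string."""
    parser = NavigationInventory()
    parser.feed(html_text)
    parser.close()
    dangling = [t for t in parser.links if t not in parser.anchors]
    return parser.links, dangling, parser.images
```

For example, with a made-up snippet where `images/art1.jpg` stands in for the artwork subdirectory, `inventory('<a href="#story1">One</a><a href="#story2">Two</a><h2 id="story1">One</h2><img src="images/art1.jpg"/>')` reports `story2` as a dangling link and lists the one image path.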

Now what do I do?

Xenophon

kovidgoyal
04-03-2007, 03:26 PM
You could try writing a converter yourself. You can use Falstaff's LRF Python library to generate the LRF files, and Python has plenty of modules that can serve as a base for an HTML parser. If you ignore CSS, it shouldn't be too hard. It's something I keep meaning to get around to, but I lack the motivation since I don't read any books on my reader that require internal navigation.
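As a rough illustration of the "HTML parser as a base, ignore CSS" approach, here is a minimal Python 3 sketch built on the standard `html.parser`. It does not touch Falstaff's LRF library (its API is not shown here); it only produces the intermediate structure a generator would need: plain-text blocks plus a map from each anchor to the block it falls in, so internal links could later become jumps. `BLOCK_TAGS`, `BlockSplitter`, and `split_blocks` are hypothetical names, and the tag set is an assumption, not a complete list.

```python
from html.parser import HTMLParser

# Tags treated as block boundaries (an assumption, not an exhaustive list).
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "h4", "h5", "h6", "li", "br"}
SKIP_TAGS = {"style", "script"}  # CSS and scripts are ignored entirely


class BlockSplitter(HTMLParser):
    """Split HTML into plain-text blocks and note which block each anchor is in."""

    def __init__(self):
        super().__init__()
        self.blocks = []        # finished text blocks
        self.current = []       # text fragments of the block being built
        self.anchor_block = {}  # anchor name/id -> index of its block
        self.skip_depth = 0     # > 0 while inside <style> or <script>

    def flush(self):
        # Collapse whitespace; drop blocks that end up empty.
        text = " ".join("".join(self.current).split())
        if text:
            self.blocks.append(text)
        self.current = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
            return
        if tag in BLOCK_TAGS:
            self.flush()
        attrs = dict(attrs)
        name = attrs.get("id") or (attrs.get("name") if tag == "a" else None)
        if name:
            # The anchor belongs to the block currently being filled.
            self.anchor_block[name] = len(self.blocks)

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if not self.skip_depth:
            self.current.append(data)


def split_blocks(html_text):
    """Return (list of text blocks, {anchor: block index})."""
    parser = BlockSplitter()
    parser.feed(html_text)
    parser.close()
    parser.flush()
    return parser.blocks, parser.anchor_block
```

Keeping anchors as block indices rather than character offsets is a design choice: a table of contents only needs to know which paragraph to jump to. One caveat of this sketch is that an anchor followed by no text at all maps to an index one past the last block.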

kovidgoyal
04-20-2007, 12:30 PM
I've implemented a pure Python HTML->LRF converter that should be able to handle this. It's in the libprs500 SVN, accessed via the command html2lrf. It needs testing, so let me know if it chokes and I'll fix it.

Xenophon
04-22-2007, 09:48 PM
The conversion ran successfully on one of the HTML files I'm interested in. When I tried adding the result to my library (in prs500-gui), I got the following error:

There was an error calling add_books
Traceback (most recent call last):
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/libprs500-0.3.13-py2.5.egg/libprs500/gui/main.py", line 56, in function
    return func(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/libprs500-0.3.13-py2.5.egg/libprs500/gui/main.py", line 216, in add_books
    self.library_view.model().add_book(_file)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/libprs500-0.3.13-py2.5.egg/libprs500/gui/widgets.py", line 654, in add_book
    _id = self.db.add_book(path)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/libprs500-0.3.13-py2.5.egg/libprs500/gui/database.py", line 64, in add_book
    mi = get_metadata(open(_file, "r+b"), ext)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/libprs500-0.3.13-py2.5.egg/libprs500/metadata/meta.py", line 24, in get_metadata
    return lrf_metadata(stream)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/libprs500-0.3.13-py2.5.egg/libprs500/lrf/meta.py", line 196, in get_metadata
    mi = MetaInformation(lrf.title.strip(), lrf.author.strip())
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/site-packages/libprs500-0.3.13-py2.5.egg/libprs500/lrf/meta.py", line 105, in __get__
    document = dom.parseString(obj.info)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/xml/dom/minidom.py", line 1923, in parseString
    return expatbuilder.parseString(string)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/xml/dom/expatbuilder.py", line 940, in parseString
    return builder.parseString(string)
  File "/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/xml/dom/expatbuilder.py", line 223, in parseString
    parser.Parse(string, True)
ExpatError: unclosed token: line 19, column 7


Do you need any more information about it?

Xenophon

kovidgoyal
04-23-2007, 11:03 AM
Looks like a problem with the metadata block. Does it work in the Connect Software LRF viewer? What does

lrf-meta --get-thumbnail your_file.lrf


give you? You can temporarily copy the book to the reader by

prs500 cp your_file.lrf prs500:/Data/media/books


Also can you send me the lrf file so I can debug?
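The last frames of the traceback show the failure inside `dom.parseString(obj.info)`, i.e. the LRF metadata (info) block is parsed as XML and expat rejects it. A minimal Python 3 sketch of how a truncated XML block produces exactly that "unclosed token" error; `parse_info_block` is a hypothetical helper, and the `<Info>`/`<BookInfo>` tags in the usage note are illustrative, not a claim about the exact LRF schema.

```python
from xml.dom import minidom
from xml.parsers.expat import ExpatError


def parse_info_block(info):
    """Parse an XML metadata string; return (document, None) or (None, error text)."""
    try:
        return minidom.parseString(info), None
    except ExpatError as err:
        # err.lineno and err.offset locate the problem, as in "line 19, column 7".
        return None, str(err)
```

Parsing `"<Info><BookInfo><Title>Sample</Title></BookInfo></Info>"` succeeds, while chopping off the final `>` makes expat hit the end of input in the middle of a tag and report an "unclosed token" error, which is the same class of failure as in the traceback.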

Xenophon
04-23-2007, 12:25 PM
The document appears to work fine on the reader, in the sense that it does not cause crashes, and I seem to be able to read the text. I have no idea how it appears in the Connect Software, as I have no Windows box to run it on.

I'll be happy to send you the LRF file for debugging. Expect an email sometime this evening. Would it help to get the original HTML as well?

Xenophon

kovidgoyal
04-23-2007, 01:17 PM
Yeah, the HTML file would be helpful. Thanks.

kovidgoyal
04-23-2007, 07:47 PM
The LRF file worked for me; adding it to the library didn't cause any problems. I think this has to do with differences in how garbage collection of open file objects works on OS X. I've implemented a possible fix in SVN. To check, could you run the lrf-meta command on your LRF file? That should work even with the code you have.
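The traceback shows the metadata being read via `get_metadata(open(_file, "r+b"), ext)`, which leaves closing that file object to the garbage collector. The sketch below is not the fix that went into SVN; it only illustrates the general hazard of relying on garbage collection to close files, using a hypothetical `read_back` helper: a writer that is never explicitly closed may still hold data in its buffer when another file object reads the same path. It is written for Python 3; on Python 2.5 the `with` statement needs `from __future__ import with_statement`.

```python
import os
import tempfile


def read_back(data, close_writer):
    """Write data to a temporary file, then read it back with a second file object.

    With close_writer=False the writer is left open, as if its cleanup were left
    to the garbage collector; small writes may still be sitting in its buffer.
    """
    fd, path = tempfile.mkstemp(suffix=".lrf")
    os.close(fd)
    writer = None
    try:
        if close_writer:
            # The with-block flushes and closes the file as soon as it exits.
            with open(path, "wb") as out:
                out.write(data)
        else:
            writer = open(path, "wb")
            writer.write(data)  # buffered, not necessarily on disk yet
        with open(path, "rb") as stream:
            return stream.read()
    finally:
        if writer is not None:
            writer.close()
        os.remove(path)
```

`read_back(b"LRF data", True)` always returns the full data; with `close_writer=False` the read may come back short or empty, depending on buffering, which is why deterministic closing is the safer habit.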