View Full Version : html2lrf released - HTML to LRF converter
kovidgoyal 04-26-2007, 11:20 AM I'm happy to announce html2lrf, an open source, cross-platform HTML to LRF converter that I believe is the most feature-complete converter available. It's distributed as part of libprs500 and will eventually be integrated into its GUI. I've attached an LRF file to show off some of its capabilities.
It has support for CSS, recursive links, inline images, lists, tables, dropcaps and embedded fonts.
If you have an HTML file it chokes on, I want to know about it!
For all you content creators, this presents an alternative content creation path. Just open the source file in your favorite office suite, save it as HTML and run it through html2lrf.
Get it and the rest of libprs500 from https://libprs500.kovidgoyal.net. After installation, a command named html2lrf will be available. It's self-documenting, i.e. just run it to get a list of options. Basic usage is as simple as:
html2lrf myfile.html
I should thank Falstaff for pylrs and esperanc for the initial CSS conversion code.
ashkulz 04-26-2007, 11:29 AM Congratulations... so you finally released it!
I just looked at the code, and it's written very nicely.
kovidgoyal 04-26-2007, 11:39 AM Thanks. Now I have to build the GUI and integrate PDFRead, and then I'll release libprs500 0.4. So much to do, so little time ;-)
ashkulz 04-26-2007, 11:44 AM Now that PDFRead is almost feature-complete (can't think of anything more to add) :) I plan to look into adding more device backends to libprs500 like we discussed.
BTW, I think you should do a strategic renaming of libprs500 ... now it does so much more.
kovidgoyal 04-26-2007, 11:48 AM Cool, adding more devices is a most welcome feature.
As for the renaming: yeah, but I haven't been able to think of a good name. Any suggestions? Also, I think I'll wait until it's a little closer to maturity.
ashkulz 04-26-2007, 11:53 AM BTW, you might want to change your Feisty installation instructions: there's no libusb package, but libusb-0.1-4.
AndyQ 04-26-2007, 11:54 AM Nice work! My first converted book looked very nice with one exception.
The first page was blank except for the following line:
xml version='1.0' encoding='%SOUP-ENCODING%'
Any ideas why this might be?
BTW I'm running the Windows version of the html2lrf app.
kovidgoyal 04-26-2007, 12:02 PM @ashkulz: Thanks, done.
kovidgoyal 04-26-2007, 12:03 PM @AndyQ: Send me the HTML file; that looks like an XML declaration.
AndyQ 04-26-2007, 12:14 PM I've just found out why.
The HTML file for some reason had the following xml declaration at the top:
<?xml version="1.0" encoding="UTF-8" ?>
Removing this causes the HTML file to be converted fine (without the funny front page).
Whether you want to cater for this oddity I'm not sure (it's not really common, although all browsers handle it OK).
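For anyone hitting the same problem with an older build, the workaround described above (removing the XML declaration before conversion) can be scripted. The following is only a minimal sketch of that preprocessing step in Python; it is not the fix that went into html2lrf, and the function name is made up for illustration.

```python
import re

# Matches an XML declaration at the very start of the document, after an
# optional byte-order mark and optional whitespace.
XML_DECL = re.compile(r'^\ufeff?\s*<\?xml[^>]*\?>\s*', re.IGNORECASE)


def strip_xml_declaration(html: str) -> str:
    """Return html without a leading <?xml ... ?> declaration (hypothetical helper)."""
    return XML_DECL.sub('', html, count=1)


sample = '<?xml version="1.0" encoding="UTF-8" ?>\n<html><body><p>Hi</p></body></html>'
print(strip_xml_declaration(sample))
```

Files without a declaration pass through unchanged, so the step is safe to apply to every input.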
ashkulz 04-26-2007, 12:23 PM Two more packages are also needed: python-qt4-gl python-qt4-sql.
BTW, I was looking at the code, and it looks like it may take some time for me to implement it: the Device interface is not used anywhere, but the PRS500 class is. However, the Device interface is very doable for the REB1100. For the 1150/1200, you can't really do anything: the whole interface is controlled by the ebook itself, with requests coming to a web server. It *might* be possible to host this in prs500, but I don't think you'd be very interested in it...
kovidgoyal 04-26-2007, 12:36 PM @AndyQ: Fixed in svn.
kovidgoyal 04-26-2007, 12:40 PM @ashkulz: Thanks again. Basically, if you implement the Device interface, I'll take care of refactoring the code to depend only on it. There are some additional constants you may have to define that I'll let you know about later. As for the 1150/1200, it's up to you; if you're willing to make it work, I'm happy to help as much as I can.
Gameboy70 04-26-2007, 11:19 PM This sounds fantastic, but pardon my ignorance. I've downloaded and installed libprs500 but don't see any html2lrf anywhere in the program, nor did I see it as a component during the installation. TIA.
igorsk 04-27-2007, 06:31 AM It's a commandline program which resides in the libprs500 folder.
jetspeed 04-27-2007, 06:47 AM I have the same problem as Gameboy.
I can see the application within the folder, but I'm not sure how to run an html doc through it.
Sorry for the ignorance... :rolleyes5
igorsk 04-27-2007, 08:13 AM I can't process Baen books. If I convert a single file, it goes fine, but with two or more I get this exception:
Processing 0671578499___0.htm
Parsing HTML... done
Converting to BBeB... done
Processing 0671578499___1.htm
Parsing HTML... done
Converting to BBeB... done
Traceback (most recent call last):
File "convert_from.py", line 848, in <module>
File "convert_from.py", line 783, in main
File "convert_from.py", line 757, in process_file
File "convert_from.py", line 707, in writeto
File "libprs500\lrf\pylrs\pylrs.pyc", line 489, in renderLrf
File "libprs500\lrf\pylrs\pylrs.pyc", line 249, in toLrf
File "libprs500\lrf\pylrs\pylrs.pyc", line 245, in toLrfDelegates
File "libprs500\lrf\pylrs\pylrs.pyc", line 1911, in toLrf
File "libprs500\lrf\pylrs\pylrs.pyc", line 1932, in toLrf
AttributeError: 'NoneType' object has no attribute 'objId'
Above is the result of processing 1632 (http://www.webscription.net/SendZip.aspx?SKU=0671578499&ProductID=379&format=H), but it seems to happen with any Baen title.
Thiana 04-27-2007, 10:13 AM On the subject of Baen books: is there any chance of adding support for the .opf format? (http://www.idpf.org/oebps/oebps1.0.1/download/oeb101-xhtml.htm) It would be nice to have the metadata and covers automatically imported.
Also, will you be adding support for managing collections on the Reader?
Thanks...
kovidgoyal 04-27-2007, 11:27 AM To run the program, open a DOS command prompt, type html2lrf and press Enter.
@igorsk
Hmm, looks like a problem with the logic for processing <a name=> elements. I'm travelling right now, but I'll fix it on Monday.
RWood 04-27-2007, 08:09 PM Support for external cover pages would be nice but they can be included with inline code as well.
While you are able to make a Table of Contents, the routine does not populate the Reader's Table of Contents area; it just leaves it with zero entries. Is this a design decision or something that will be cleaned up in future revisions?
kovidgoyal 04-27-2007, 10:11 PM opf and collections are both features I don't need personally and hence have limited motivation for. But who knows...if I'm ever at a sufficiently loose end...
Not having anything in the table of contents was a design decision, the reason being that it is impossible to know which links in an HTML file are for the TOC and which are not.
By passing the --cover option to html2lrf the first page becomes a full size cover and a thumbnail is automatically generated. Is there some other behavior you are talking about?
Fauve 04-28-2007, 01:06 PM Can html2lrf do batch conversions?
kovidgoyal 04-28-2007, 04:25 PM No, though cooking up a script to do batch conversions is trivial. After I've ironed out the bugs I'll write one.
jimmyzou 04-28-2007, 06:12 PM Any user guide? :)
kovidgoyal 04-28-2007, 06:45 PM The command is self-documenting. Just type html2lrf and press Enter.
kovidgoyal 04-30-2007, 02:58 PM Released v0.3.16 which fixes igorsk's bug as well as https://libprs500.kovidgoyal.net/ticket/34
A general note on converting Baen books: it's usually a good idea to pass the file with "toc" in its name to html2lrf.
igorsk 04-30-2007, 03:12 PM Thanks!
Xenophon 05-01-2007, 08:20 AM Kovid:
The new version of html2lrf no longer crashes on my Grantville Gazettes (from Baen, of course). However, I now get .lrf files that cause my reader to crash and re-start itself. You should be able to reproduce this with the example html I sent you before, and the command line:
html2lrf --title="Grantville Gazette Vol. 2" --author="Eric Flint, ed." 1011250005_toc.htm -o GrantvilleGazetteV2.lrf --cover=1011250005.jpg
...appropriately quoted to make your shell-of-choice happy.
I'll try varying the command line switches to see if there's an alternative that works, and update the results here.
Xenophon
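On the "appropriately quoted" point: one way to sidestep shell quoting differences is to keep the arguments as a list and let a library do the quoting. The sketch below uses Python's standard shlex module with the exact arguments from the command line above; it only prints a POSIX-shell-safe form and does not run html2lrf.

```python
import shlex

# Arguments taken from the command line quoted in the post above.
args = [
    "html2lrf",
    "--title=Grantville Gazette Vol. 2",
    "--author=Eric Flint, ed.",
    "1011250005_toc.htm",
    "-o", "GrantvilleGazetteV2.lrf",
    "--cover=1011250005.jpg",
]

# shlex.join (Python 3.8+) quotes each argument for a POSIX shell.
print(shlex.join(args))

# To run the conversion without going through a shell at all
# (requires html2lrf to be installed and on PATH):
# import subprocess
# subprocess.run(args, check=True)
```

Passing the list straight to subprocess.run avoids quoting altogether, which is usually the safer choice when titles contain spaces or punctuation.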
kovidgoyal 05-01-2007, 11:33 AM Hmm, the generated LRF file works fine in the Connect software but not on the Reader. Annoying...I'll look into it.
Xenophon 05-01-2007, 12:36 PM Thanks, Kovid! This is helping to make libprs500 a super-useful tool.
Xenophon
kovidgoyal 05-01-2007, 03:06 PM Released 0.3.18 which fixes Xenophon's bug, adds support for <blockquote> elements (See text section in updated demo file) and allows control of link recursion.
Xenophon 05-02-2007, 11:28 AM Thanks, Kovid. No more crashes; the converter seems to work fine. I did notice one bit of odd behaviour, however. The resulting lrf files have distinctly different behaviour when it comes to page turning. The lrf file made from the example html source I sent you takes about 10 seconds per page turn on the Reader. Others take the usual one second. But ALL of them take 10+ seconds to page forward onto the page that contains all the links from the ToC!
Any idea what's going on with this? I'd be very interested in an upgrade to html2lrf that makes the Reader happier with these files. Would it help if I also sent you the source for one of the ones that didn't slow down?
Xenophon
kovidgoyal 05-02-2007, 12:02 PM Interestingly, I found that with the grantville gazette, a reset of the reader made the page turns all 1s again. The TOC page turns seem to be simply because the reader can't handle the large number of links. Do you have an LRF file with a large number of links in the TOC that has a quick page turn to the TOC?
One thing that was different about the grantville gazette was the large number of empty <a name> elements throughout the text. These get mapped to empty textblocks by html2lrf. Perhaps the resulting large number of textblocks is what the reader finds hard to handle. I'll look into that.
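To check whether a given HTML file has the same characteristic (many empty <a name> elements), a quick count can help before converting. This is a rough diagnostic sketch using Python's standard html.parser; the class and function names are made up for illustration, and it says nothing about how html2lrf itself maps anchors to text blocks.

```python
from html.parser import HTMLParser


class EmptyAnchorCounter(HTMLParser):
    """Count <a name=...> elements that contain no text."""

    def __init__(self):
        super().__init__()
        self.empty_anchors = 0
        self._in_named_anchor = False
        self._saw_text = False

    def handle_starttag(self, tag, attrs):
        if tag == "a" and "name" in dict(attrs):
            self._in_named_anchor = True
            self._saw_text = False

    def handle_data(self, data):
        if self._in_named_anchor and data.strip():
            self._saw_text = True

    def handle_endtag(self, tag):
        if tag == "a" and self._in_named_anchor:
            if not self._saw_text:
                self.empty_anchors += 1
            self._in_named_anchor = False


def count_empty_anchors(html: str) -> int:
    parser = EmptyAnchorCounter()
    parser.feed(html)
    parser.close()
    return parser.empty_anchors


sample = '<p><a id="p1" name="p1"></a>First.</p><p><a name="c2">Chapter 2</a></p>'
print(count_empty_anchors(sample))
```

Note that an anchor wrapping only an image or other markup is also counted as empty here, since only text content is tracked.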
kovidgoyal 05-02-2007, 02:37 PM I've added a patch to svn that fixes the slowdowns for me. Can you test it and let me know? Thanks. There was also a fix that will cause the TOC elements to link to the page before the actual chapter...this is the correct behavior as per the Grantville HTML file. There was a bug in previous versions that caused the TOC links to point to the first page of the chapter.
Xenophon 05-02-2007, 04:13 PM I'll give it a try. I should note that I actually thought that having the TOC links point to the first page of the chapter was a good thing rather than a bug. I think that the issue is that the next, prev, contents links that show up as their own page on the Reader are just the top line of the first page of the chapter in a web browser.
I'll let you know if the version in svn fixes the slowdown.
kovidgoyal 05-02-2007, 04:34 PM Having the TOC links point to the first page of the chapter is a good thing for this particular HTML file. However, that is because of a poor design choice in the HTML file: it specifies page-break-before for all H1 elements, so html2lrf puts in a page break after the first line. Since browsers are not affected by the page-break CSS properties, whoever made this file probably didn't realize the mistake.
kovidgoyal 05-02-2007, 04:37 PM Released v 0.3.19 with support for creating the external TOC (I've reversed my earlier decision not to do this upon further consideration). This version also contains completely reworked code for handling <a name> elements. This code hasn't been as thoroughly tested as the original, so there may be bugs.
Xenophon 05-02-2007, 06:38 PM The svn version does, in fact, fix the slowdown. I'll report the page-break issue to Arnold Bailey (Baen's web guy), and perhaps get a fix from that end, instead. Thanks for the quick turn-around on the fixes.
Xenophon
kovidgoyal 05-02-2007, 10:28 PM Glad to hear it...I read a lot of ebooks converted from HTML so getting this tool shipshape is a high priority for me.
kovidgoyal 05-03-2007, 04:24 PM Released 0.3.20 with bug fixes too numerous to elaborate. An upgrade is highly recommended.
Xenophon 05-05-2007, 09:45 AM For those of you converting Baen html files with html2lrf, here's a little trick that may help.
First, process the original html to remove the anchors for each paragraph that are used to show paragraph numbers on mouse over. Then, remove the "page-break-before:" from the css declaration in each htm file. Then, remove the "onmouseover" entry from each paragraph element in the htm files. Only after all of that should you convert it to an lrf.
The result of these changes is a clean lrf file, with no slow-downs in page changes. In addition, when you open it for the first time on the reader, it only takes about 10 seconds to process (as compared to the minute-or-so needed when you convert the original html). I've created a little sed script that automates all of this. Its contents are:
s/<a id=\"p[0-9]*\" name=\"p[0-9]*\"><\/a>//
s/ onmouseover=\"PNo([0-9]*)\"//
s/page-break-before://
You invoke it with the command-line:
sed -i old -f yoursedfilehere *.htm
from the directory where you've unpacked the sources.
Those of you on non-Unix hosts are on your own for how to do this, but perhaps this'll give you a good start.
Xenophon
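For non-Unix hosts, the three sed substitutions above translate fairly directly to Python's re module. This is a sketch of an equivalent, not something shipped with html2lrf; the function name is made up, and unlike sed without the "g" flag it replaces every match rather than only the first one on each line.

```python
import re

# Python equivalents of the three sed substitutions above.
SUBSTITUTIONS = [
    # Empty paragraph-number anchors such as <a id="p12" name="p12"></a>
    (re.compile(r'<a id="p[0-9]*" name="p[0-9]*"></a>'), ""),
    # The onmouseover attribute that shows paragraph numbers
    (re.compile(r' onmouseover="PNo\([0-9]*\)"'), ""),
    # The page-break-before property name in the CSS declarations
    (re.compile(r"page-break-before:"), ""),
]


def clean_baen_html(text: str) -> str:
    """Apply all substitutions to the text of one file (hypothetical helper)."""
    for pattern, replacement in SUBSTITUTIONS:
        text = pattern.sub(replacement, text)
    return text


sample = '<p onmouseover="PNo(12)"><a id="p12" name="p12"></a>Some text.</p>'
print(clean_baen_html(sample))
```

To process a directory, read each .htm file, pass its text through clean_baen_html and write the result back, keeping a backup copy first; the file encoding has to be chosen to match the source files. As a side note, the "-i old" form in the sed command above is the BSD/macOS spelling; GNU sed expects the backup suffix attached to the option, as in "-i.old".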
kovidgoyal 05-05-2007, 10:58 AM Cool, I'll add this to html2lrf as a command-line option so people can use it on Windows too. Could you open a bug and attach the sed script so I don't forget?
Thanks.
EDIT: I doubt removing the onmouseover is necessary, since it's essentially ignored by html2lrf.
Dave Berk 05-05-2007, 04:01 PM When I try to run the program in Ubuntu Feisty I get:
dave@aurora:~$ prs500 info
Traceback (most recent call last):
File "/usr/bin/prs500", line 8, in <module>
load_entry_point('libprs500==0.3.20', 'console_scripts', 'prs500')()
File "/usr/lib/python2.5/site-packages/libprs500-0.3.20-py2.5.egg/libprs500/cli/main.py", line 235, in main
File "/usr/lib/python2.5/site-packages/libprs500-0.3.20-py2.5.egg/libprs500/cli/main.py", line 100, in info
File "/usr/lib/python2.5/site-packages/libprs500-0.3.20-py2.5.egg/libprs500/prs500.py", line 164, in run_session
File "/usr/lib/python2.5/site-packages/libprs500-0.3.20-py2.5.egg/libprs500/prs500.py", line 260, in open
libprs500.errors.DeviceBusy: Device is in use by another application:
Underlying error:Failed to set device configuration to: 2. Error code: -1
But the only applications I have open are a terminal (to run prs500) and opera browser..
Any help?
allovertheglobe 05-06-2007, 03:55 PM Had the same problem, most likely a permissions issue. Try:
sudo prs500 info
(Using sudo will require you to enter your password once for that session)
allovertheglobe 05-06-2007, 04:16 PM First, Thanks for html2lrf!
Second, is there some sort of simple, comprehensive feature list of the supported HTML anywhere? I couldn't find one on the website. I plan on writing a little program that outputs HTML files, and I want to be sure I would be able to upload that result to the PRS500. For instance which version of (X)HTML? CSS support? Anchor support? Image support, what formatting works? etc.
Any pointers would be appreciated to cut down the trial-and-error to a minimum...
Tia,
P.
(BTW, Kovid, my browser keeps having an issue with your site's certificate: "Website certified by an unknown Authority"... Could not verify this certificate for unknown reasons)
Edit: Nevermind, just went through it again and saw that you issued it yourself. No problem. Noticed you're a local of the greater LA area like myself :cool: )
allovertheglobe 05-06-2007, 06:06 PM Ok, I've started to play around some more with prs500 et al. I'm using a recently updated Ubuntu install (Feisty Fawn). I installed the tools as per the instructions on Kovid's page, without any notable problems. (Unlike my attempt to get it to work in Edgy Eft by using a (botched) Python2.5 install)
As in my previous post, I need to SUDO all commands to work properly, but I think there are still some serious Permissions issues since everything seems to default to root ownership, which is frowned upon in normal use from what I understand.
- The GUI finally works, however I can only view or delete files on the device, even non-DRMed files I created, lrf or rtf. I cannot move them to the library which again is owned by root, no changes allowed. I chown it, no difference: I cannot copy files from the device or edit the meta data (on the device only, since I can't get them into the library), the buttons remain greyed out.
- Back to the commandline: I can copy them now, but only ONE at a time, since the prs500 cp command does not seem to accept wildcards (i.e. *.lrf) to backup all the books from the memory card to a local folder, as an example.
Obviously that does not add them to the library(.db file) anyway.
- The files are owned by root, so I need to chown them all to be able to do anything with them.
(Personally, I would prefer a simple local folder with the books inside for easier manipulation instead of a closed database, like the way prs500 deals with the files on the actual device anyway, or the Connect software on Windows)
- I have not managed to change the metadata on files on the PRS500 directly, either.
Any pointers would be appreciated, perhaps my limited experience with the Linux shell is part of the problem...
Add: I found out about a rather dangerous stunt, turning myself into root using sudo su and then running prs500-gui. That didn't work either.
Dave Berk 05-07-2007, 04:42 AM Thanks, allovertheglobe. Using sudo works. Though now I have encountered some of the problems you describe. Still, it's better than nothing, and I'm thankful that I can copy my books back at all.
yargoflick 05-07-2007, 12:06 PM @allovertheglobe: OK, you should NOT have to run prs500 as root...
Did you follow the instructions to set up udev and add your user to the group plugdev?
Put the following code into /etc/udev/rules.d/z90-prs500.rules
BUS=="usb", SYSFS{idProduct}=="029b", SYSFS{idVendor}=="054c", MODE="66
Add yourself to the plugdev group
gpasswd -a username plugdev
you can either reboot completely or just restart udev and relogin
to restart udev:
sudo /etc/init.d/udev restart
then log out completely and log back in again
open a terminal and type 'id', you should have plugdev as a group you belong to.
plug your reader in to the usb port and check that the node belongs to the group plugdev.
On my debian machine:
ls -l /dev/bus/usb/001/
outputs:
crw-rw-r-- 1 root root 189, 0 2007-05-07 13:18 001
crw-rw---- 1 root plugdev 189, 4 2007-05-07 14:30 005
id
outputs:
uid=1001(xxx) gid=1001(xxx) 20(dialout),25(floppy),29(audio),46(plugdev),100(users),107(fuse),1001(xxx)
I can now run prs500 and prs500-gui without being root.
The library.db that prs500-gui creates is owned by me.
Hope that helps,
Lee
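The group check done above with the id command can also be done from Python, which may be handy in a setup or troubleshooting script. This is a minimal sketch using the standard grp and os modules (Unix only); the function name is made up, and "plugdev" is simply the group named in the instructions above.

```python
import grp
import os


def user_in_group(group_name: str) -> bool:
    """Return True if the current process belongs to group_name."""
    try:
        target_gid = grp.getgrnam(group_name).gr_gid
    except KeyError:
        return False  # the group does not exist on this system
    # os.getgroups() lists supplementary groups; os.getgid() is the primary one.
    return target_gid in os.getgroups() or target_gid == os.getgid()


if __name__ == "__main__":
    if user_in_group("plugdev"):
        print("OK: current session is in the plugdev group")
    else:
        print("Not in plugdev (or you have not logged out and back in yet)")
```

As with the id command, a newly added group only shows up after logging out and back in, because group membership is fixed when the session starts.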
allovertheglobe 05-07-2007, 01:37 PM Uhm, my bad. I overlooked the "Now go to the post-installation setup" part -_-;;
Thanks for the instructions though, they worked out fine. In case somebody else tries to follow your preceding instructions, please note that the code for the (z)90-prs500.rules got mangled, here's the full line:
BUS=="usb", SYSFS{idProduct}=="029b", SYSFS{idVendor}=="054c", MODE="660", GROUP="plugdev"
(Out of curiosity: is there a particular reason for adding the "z" in front of 90-prs500.rules? It worked out OK anyway...)
As far as the usage of prs500-gui goes, probably a misunderstanding on my part as well. It's at least for now a one-way street: you add the docs (rtf, lrf) you created or downloaded to the library (from your computer), then you can copy them TO the device, but NOT FROM the device. The option is greyed out, and drag-n-drop results in a "NotImplementedError" in the shell. Too bad, maybe later. As for the library being a database file, it has its pros and cons...
kovidgoyal 05-07-2007, 03:53 PM Hmm, sorry, I was traveling. Let's see:
Transfer from reader to library will be implemented in the future. It just wasn't high on my list of priorities.
Glad you got the permissions issues sorted out.
Regarding html2lrf
It basically supports all HTML 4 tags (there may be a few obscure ones I missed out) and most of the font and page related CSS properties. You should have a look at the demo lrf file in the first post of this topic. The converted HTML file is included.
Let me know if you need more info.
FangornUK 05-08-2007, 05:37 AM kovidgoyal, thanks for html2lrf, it works great!
Just noticed some problems: small inline images tend to get scaled up, and many images that aren't very high tend to throw new pages before and after when they aren't needed. I always find this Gutenberg document a great one for testing image formatting: http://www.gutenberg.org/files/19499/19499-h.zip
Noticed the "--font-delta" option only tends to work for headers not text paragraphs. Is it possible to add an option to format images to a specified number of colours? - I always find pictures with 16 colours work well on the reader and saves lots of space.
kovidgoyal 05-08-2007, 11:34 PM The image handling problems are fixed in svn. Can you provide an example HTML file for which font-delta does not work? I don't think adding an image downsample option is worth it. After all, what if Sony releases a new device with support for more colors? It'd be stupid to have to regenerate all the LRF files.
kovidgoyal 05-09-2007, 12:51 AM Released version 0.3.21 with
1) Improved handling of images, which should fix FangornUK's complaints
2) Handling of text-indent on p, div and h? tags
3) Added --baen option for preprocessing of baen HTML files
4) Minor bug fixes
igorsk 05-09-2007, 03:53 AM Looks like you're using PIL now but setup.py doesn't check if it's installed.
igorsk 05-09-2007, 03:59 AM Another little bug: in one of the files I had this line:
<h1 align="center"><a name="Chap_0"></a><b>Part One: </b><b><i>January 1635</i></b><br /></h1><!--Prologue-->
The comment was present in the output file.
kovidgoyal 05-09-2007, 09:52 AM Running it on that fragment alone doesn't include the comment for me. Can you send me the whole file?
igorsk 05-09-2007, 03:54 PM It's from 1635 (http://baencd.thefifthimperium.com/13-TheBalticWarCD/TheBalticWarCD/1635-The%20Cannon%20Law/1416509380_toc.htm).
Actually, I didn't explain properly. The comment is not present in the text itself (at least I think so, don't have the file here at the moment), but it's there in the TOC entry.
FangornUK 05-09-2007, 04:42 PM Release 0.3.21, images now look great! I've logged some more bug reports for you.
kovidgoyal 05-09-2007, 04:48 PM @igorsk: Fixed in svn.
kovidgoyal 05-09-2007, 06:40 PM Released 0.3.22 with various minor bugs squashed, plus a fix for one major bug that could result in corrupted LRF files.
Xenophon 05-09-2007, 08:10 PM So when I try to update to 0.3.22, I get the following:
sudo easy_install -U libprs500
Searching for libprs500
Reading http://cheeseshop.python.org/pypi/libprs500/
Reading http://libprs500.kovidgoyal.net
Reading http://cheeseshop.python.org/pypi/libprs500/0.3.22
Best match: libprs500 0.3.21
Processing libprs500-0.3.21-py2.5.egg
...
What gives? It seems to know that there is a 0.3.22, but insists that 0.3.21 is the best match? Huh?
kovidgoyal 05-09-2007, 08:22 PM Oops, should be fine now.
kovidgoyal 05-10-2007, 01:22 PM The versions just keep rolling on
0.3.23
1) Has automatic chapter detection (to force page-breaks before chapters).
This can be controlled via two new command line options.
2) Fixed adding of redundant TextStyle and BlockStyle elements to LRF. This should make the generated files smaller.
I just want to add a quick line: the command to quickly convert a whole directory of HTML files in Windows is the following:
for %f in (*.html *.htm) do html2lrf %f
I had forgotten how to write proper batch scripts and needed to google it, so I suppose this could help someone else...
kovidgoyal 05-11-2007, 10:33 AM Thanks. I might add a command-line option to do batch processing. Which would you prefer: recursive processing (i.e. the directory and all subdirectories) or only the top-level directory?
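Until such an option exists, the two behaviours being weighed here (top-level only versus recursive) are easy to sketch in a cross-platform script. The following Python example is an illustration only: the function names are made up, and it relies solely on the basic "html2lrf myfile.html" usage shown in the first post. It builds and prints the commands rather than running them.

```python
from pathlib import Path

HTML_SUFFIXES = {".htm", ".html"}


def find_html_files(root, recursive=False):
    """Return .htm/.html files under root, sorted for a stable order."""
    root = Path(root)
    walker = root.rglob if recursive else root.glob
    return sorted(
        p for p in walker("*")
        if p.is_file() and p.suffix.lower() in HTML_SUFFIXES
    )


def build_commands(root, recursive=False):
    """Build one html2lrf argument list per input file (nothing is executed)."""
    return [["html2lrf", str(path)] for path in find_html_files(root, recursive)]


if __name__ == "__main__":
    for command in build_commands(".", recursive=False):
        print(" ".join(command))
        # To actually convert, replace the print with:
        # subprocess.run(command, check=True)   # needs "import subprocess"
```

Keeping each command as an argument list means file names with spaces need no extra quoting when they are later passed to subprocess.run.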
kovidgoyal 05-11-2007, 11:36 AM Released 0.3.25
1) Fixed bug introduced in 0.3.24 that caused CSS formatting not to be picked up from <style> tags. Sorry about that.
2) Added support for the reading attribute to lrf-meta
I'd like to take this opportunity to thank FangornUK, Thiana, Xenophon and igorsk for the great job they've been doing as beta testers.
Also I'd like to thank all those of you that have donated to support libprs500. As the proverbial penniless grad student, your generosity is much appreciated.
magogo 05-12-2007, 11:18 AM Hi, kovidgoyal. I tried your html2lrf to convert offline contents into a single lrf. I would say it's great.
I have a suggestion, would you add an option to remove unresolved links which will leave something like "Link: " in the document? I also find that "Link: " replaces the text of the link which is not followed.
Thiana 05-12-2007, 04:23 PM Or maybe copy the iSilo method of dealing with unresolved links? Have the link text point to a page at the end of the lrf which has the list of unresolved links for the book.
kovidgoyal 05-12-2007, 04:42 PM Released 0.3.26
1) Rationalized CLI
2) Improved img handling
3) Automatic page breaks on user specified elements if the HTML file doesn't have page-break information
4) Show link text rather than href by default for broken links
There were a lot of changes in this release so there may be new bugs. Look over your LRF files.
kovidgoyal 05-12-2007, 11:18 PM 0.3.27 with a treat for Linux users
The CLI now has Bash completion!
The installation script assumes the completion file should go into /etc/bash_completion.d. If that's not the correct location for your distro, you should make a symlink.
FangornUK 05-13-2007, 05:02 AM Superb! html2lrf is now the best "HTML to LRF" converter out there, surpassing Book Designer. Thanks for fixing the image centring. The CSS handling means Gutenberg books convert wonderfully.
allovertheglobe 05-13-2007, 02:30 PM Regarding libprs500-gui:
Would it be possible to have it remember the column width when switching from one view to the next (i.e. library > reader > card)? On my system anyway (Ubuntu Feisty) it always resets the width, cutting off most of the book titles as I go back and forth...
Or even remember those settings and perhaps even the window size when re-opening?
kovidgoyal 05-13-2007, 03:02 PM Open a bug report and I'll see that it makes it into 0.4.0
https://libprs500.kovidgoyal.net/newticket
AndyQ 05-14-2007, 02:49 AM One thing, would you expect that libprs500 would be able to see the reader on Windows (with the latest version of connect & reader firmware)?
If I try the command line version I just get a Reader not found message.
Thiana 05-14-2007, 03:25 AM CONNECT and libprs use two different USB drivers. Just uninstall the connect one, then install the one libprs provides. You don't even need to reboot.
Thiana
AndyQ 05-14-2007, 11:57 AM Excellent, thanks, that works.
Is there any way to edit the meta information of files on the reader or do I have to do that on the PC and then transfer them back to the reader?
kovidgoyal 05-14-2007, 12:19 PM Nope no metadata editing on the reader.
kovidgoyal 05-14-2007, 02:40 PM 0.3.30 with bug fixes and automatic import of metadata from opf files.
igorsk 05-14-2007, 04:54 PM Wow, nice!
kovidgoyal 05-14-2007, 05:00 PM Wow, nice!
Thanks...at the moment it's only title, author and author sort key, but it will be easy to add more in the future.
Elltrain 05-14-2007, 09:16 PM Hey, just discovered this tool and it works GREAT for 3 really nasty Toc-style html files.
A bug report, and feature suggestion:
The ’ character in the text I have is not formatted right, it comes up as a blank every time in the .lrf file.
Second, a line-cropping option would be great. So I could specify -crop 2 2 to crop 2 lines off the top and bottom of each .html page. The files I have contain lines at the top with website stuff there that I would like stripped out.
kovidgoyal 05-14-2007, 09:28 PM When you say crop lines off the top do you mean before the opening html tag i.e. the very first lines of each file?
Open a bug report and attach a sample file with the suspect character and I'll fix it.
https://libprs500.kovidgoyal.net/newticket
Elltrain 05-14-2007, 10:45 PM When you say crop lines off the top do you mean before the opening html tag i.e. the very first lines of each file?
Open a bug report and attach a sample file with the suspect character and I'll fix it.
https://libprs500.kovidgoyal.net/newticket
Reported.
And you could do it either way, really. Basically, every file in the batch has a line at the top and bottom with links "Previous | Table of Contents | Next" which direct to the appropriate pages. In the lrf conversion, these get stuck all over the place and it's basically unreadable.
Elltrain 05-14-2007, 10:48 PM Ticket opened. The forum seemed to eat my other reply.
You could do the cropping either way, really. Basically, every file in the batch has a line at the top and bottom with links "Previous | Table of Contents | Next" which direct to the appropriate pages. In the lrf conversion, these get stuck all over the place and it's basically unreadable.
I uploaded one of the chapters so you can see the formatting.
blueowl 05-15-2007, 02:14 AM Hello kovidgoyal,
first I'd like to say thanks a lot for libprs500. It's very good to have such a tool (mainly if you use just linux).
But to my point, I would like to ask whether html2lrf is able to embed fonts into lrf. This is quite important for me as I am using non-english fonts. So far I have solved it via pdf, however lrf could be better as it's Reader's native format.
Jirka
kovidgoyal 05-15-2007, 08:28 AM No, I'm afraid it doesn't have support for embedded fonts. I might add support in the future, but that's not a priority for me. I would suggest you open a ticket and I'll get around to it when I have the time.
Xenophon 05-15-2007, 10:54 AM 0.3.30 with bug fixes and automatic import of metadata from opf files.
Very Cool! :D
I've already entered a new ticket with suggestions for more metadata that should be easy to import (at least from Baen-style opf files). :huh:
libprs500 is really spiffy, and is fast becoming the best of the open-source tools for talking to the reader and converting files for it!
JSWolf 05-15-2007, 12:50 PM No, I'm afraid it doesn't have support for embedded fonts. I might add support in the future, but that's not a priority for me. I would suggest you open a ticket and I'll get around to it when I have the time.
I just sent a ticket for embedded fonts. I really want this.
Does html2lrf support tables and inline graphics (graphics with text on either side)?
Jon
yargoflick 05-15-2007, 06:02 PM 0.3.30 with bug fixes and automatic import of metadata from opf files.
I just discovered the bash completion for the command line tools. It is swell and wonder.
My reader and I thank you,
and hereby send you some beer money.
Lee
kovidgoyal 05-15-2007, 11:18 PM @yargoflick
Thank ee kindly
@jswolf
I'll give it a stab sometime
FangornUK 05-16-2007, 08:33 AM kovidgoyal, is there a tip to stop images being resized when changing the font size? I have some images on a page with nothing else and would like them to always be the original size.
igorsk 05-16-2007, 09:09 AM Check the sample HTML.
FangornUK 05-16-2007, 09:18 AM I did and the image size of the big one changes. Possible bug introduced then after the recent image fixes? Before I log a bug I just wanted to make sure I wasn't missing anything.
kovidgoyal 05-16-2007, 10:01 AM That behavior has changed in recent versions. It's a trade-off. I could go back to the old behavior, but in that case the large images would not respect centering (because I can't figure out how to get the ImageBlock element to respect centering).
Is there some particular reason you don't want the image to be resized?
FangornUK 05-16-2007, 12:11 PM Book Designer manages to centre it but of course it can't do in-line images as your libprs500 can.
Here's an output from Book Designer's LRS with an image that is 550x724 in size (it is centred):
<ImageBlock x0="0" y0="0" x1="524" y1="690" xsize="524" ysize="690" blockwidth="524" blockheight="690" refstream="im_1" objid="pic_1" topskip="0" sidemargin="0" blockstyle="bs_im">
</ImageBlock>
<BlockStyle stylelabel="bs_im" objid="bs_im" blockwidth="524" blockheight="704" sidemargin="0" blockrule="block-fixed" layout="LrTb" topskip="0" footskip="0" framecolor="0x00000000" framemode="square" framewidth="0" bgimagemode="fix"/>
If you want more information let me know and I'll log a bug. I can of course send you the files you need.
kovidgoyal 05-16-2007, 01:16 PM That's not actually centered, the image is simply scaled to the page size so it looks centered. If you try it with a narrow image, it will be left aligned.
JSWolf 05-16-2007, 03:17 PM Does html2lrf support tables and inline graphics (graphics with text on either side)?
kovidgoyal 05-16-2007, 03:44 PM Does html2lrf support tables and inline graphics (graphics with text on either side)?
inline graphics yes
tables partially
look at the demo file in the first post
JSWolf 05-16-2007, 10:57 PM I know html2lrf does not yet have support for embedded fonts. But, would it be possible for support for the three fonts already present in the Reader? So if I switch to one of those fonts, I'd get that same font on screen. To me this sounds like a fairly easy addition. I could be way off here. But it would be nice to have. It means I can use any of the three fonts as I see fit. Thanks!
Jon
kovidgoyal 05-17-2007, 12:33 AM That support is already there. See the demo file for example.
JSWolf 05-17-2007, 12:54 AM Where can I find the demo file? The first post does not have the attached file that it says it does. And when you do reattach it, can you also attach the source HTML as well please? Thanks!
kovidgoyal 05-17-2007, 01:31 AM Hmm it seems to have disappeared. Re-attached. The html source is included within the files
kovidgoyal 05-17-2007, 09:33 PM Added support for the Canvas element to pylrs. That's enough for rudimentary tables (modulo rowspan/colspan). I should also be able to implement the centering of ImageBlocks...now if only I had more time ;-)
JSWolf 05-17-2007, 10:35 PM I'll have to try out tables in that case. Kewl!
isaacrdz 05-17-2007, 11:47 PM Is html2lrf a command line app? Or is it part of libprs500? If it's stand alone, can it be posted? The instructions to install the whole libprs500 package on Mac OSX are not clear to me.
kovidgoyal 05-17-2007, 11:56 PM It's part of libprs500. If you only want to use html2lrf all you need is
python 2.5 and setuptools. Once they're installed a simple
easy_install -f http://www.pythonware.com/products/pil Imaging
easy_install-2.5 -U libprs500
will install it automatically.
EDIT: Forgot about the dependency on Imaging
FangornUK 05-18-2007, 04:25 AM Added support for the Canvas element to pylrs. ... I should also be able to implement the centering of ImageBlocks...now if only I had more time ;-)
Woohoo! :)
kovidgoyal 05-18-2007, 01:05 PM Released 0.3.32
1) Large images are automatically placed in centered ImageBlocks
2) Fixed handling of images referred to in <a> tags
3) Autorotation of images whose width > height
4) Minor bug fixes
Note: there was a major module re-organization behind the scenes. This may cause some bugs, so let me know.
EDIT: Re-uploaded with autorotation of images.
isaacrdz 05-18-2007, 10:08 PM It's part of libprs500. If you only want to use html2lrf all you need is
python 2.5 and setuptools. Once they're installed a simple
easy_install -f http://www.pythonware.com/products/pil Imaging
easy_install-2.5 -U libprs500
will install it automatically.
EDIT: Forgot about the dependency on Imaging
For some reason, that made more sense to me than the instructions on libprs500 page. :)
Maybe I'll put together a more "Mac friendly" version of the installation instructions later.
kovidgoyal 05-18-2007, 10:16 PM Well that's because the instructions on the website cater to 3 operating systems and all the features of libprs500. If you do write up some instructions for OSX I'll add a link on the wiki.
Paviko 05-20-2007, 03:53 PM Hello kovidgoyal,
Thank you very much for the html2lrf converter, it's very useful.
I've noticed one strange thing with version 0.3.32. Images that have width > height are misplaced (pushed to the right and cut off). Maybe the PRS-500 does not support image rotation?
I'm attaching a simple HTML file that causes the problem.
Thank you.
kovidgoyal 05-20-2007, 07:49 PM Version 0.3.33
1) Added support for HTML tables
2) Squashed image handling bug that had crawled in for 0.3.32
3) Minor bug fixes
To see examples of table handling look at the demo file attached to the first post of this thread.
As a reminder bug reports go here:
https://libprs500.kovidgoyal.net/newticket
JSWolf 05-20-2007, 07:54 PM Version 0.3.33
1) Added support for HTML tables
2) Squashed image handling bug that had crawled in for 0.3.32
3) Minor bug fixes
To see examples of table handling look at the demo file attached to the first post of this thread.
As a reminder bug reports go here:
https://libprs500.kovidgoyal.net/newticket
Very nice! Would you mind also attaching the HTML code as a separate file so we can have the code that is used for the demo? Thanks!
kovidgoyal 05-20-2007, 07:57 PM The HTML code is embedded in the LRF file.
JSWolf 05-20-2007, 08:49 PM The HTML code is embedded in the LRF file.
Actually, it's not this time. It ends with the Recursive link following.
allovertheglobe 05-20-2007, 08:53 PM The HTML code is embedded in the LRF file.
I found that not to be very useful due to:
1) the formatting of the HTML code being less than ideal on the reader, and without the possibility of syntax highlighting
2) having to try and read it on the reader's grayish screen in a dimmed room (for my backlit LCD screen) (I don't have access to the CONNECT software reader)
So I second JSWolf's request...
allovertheglobe 05-20-2007, 08:58 PM Has anybody some advice regarding which is a better choice for LRF files (by way of html2lrf) in terms of performance (not necessarily filesize) for files with hundreds of links (references):
One very large monolithic file with all inline anchors OR rather many separate pages interlinked with external anchors? Or does it not make much of a difference in the context of LRF either way?
kovidgoyal 05-20-2007, 09:05 PM If you're talking about the performance of the LRF file when viewed on the reader, it makes no difference.
Added HTML source.
JSWolf 05-20-2007, 09:17 PM If you're talking about the performance of the LRF file when viewed on the reader, it makes no difference.
Added HTML source.
I'm trying to convert the demo to LRF and it's not working. I did install the new version of libprs500.
[e:\e-books\sonyreader\demo]"c:\Program Files\libprs500\html2lrf.exe" demo
Processing demo.html
Parsing HTML... done
Converting to BBeB...
Traceback (most recent call last):
File "convert_from.py", line 1351, in <module>
File "convert_from.py", line 1286, in main
File "convert_from.py", line 1158, in process_file
File "convert_from.py", line 356, in __init__
File "convert_from.py", line 440, in parse_file
File "convert_from.py", line 649, in process_children
File "convert_from.py", line 1060, in parse_tag
File "convert_from.py", line 649, in process_children
File "convert_from.py", line 1058, in parse_tag
File "convert_from.py", line 1069, in process_table
File "libprs500\ebooks\lrf\html\table.pyc", line 281, in blocks
File "libprs500\ebooks\lrf\html\table.pyc", line 251, in get_widths
File "libprs500\ebooks\lrf\html\table.pyc", line 209, in preferred_width
File "libprs500\ebooks\lrf\html\table.pyc", line 167, in preferred_width
File "libprs500\ebooks\lrf\html\table.pyc", line 164, in text_block_preferred_width
File "libprs500\ebooks\lrf\html\table.pyc", line 128, in text_block_size
File "libprs500\ebooks\lrf\fonts\__init__.pyc", line 38, in get_font
File "pkg_resources.pyc", line 800, in resource_filename
File "pkg_resources.pyc", line 1221, in get_resource_filename
NotImplementedError: resource_filename() only supported for .egg, not .zip
Did I do something wrong?
kovidgoyal 05-20-2007, 09:24 PM Looks like a Windows bug. I'll upload a fixed version later.
JSWolf 05-20-2007, 09:26 PM Looks like a Windows bug. I'll upload a fixed version later.
Thank you! I am very interested in trying this with some LIT files converted to HTML that have the separate contents file.
kovidgoyal 05-20-2007, 11:27 PM OK uploaded 0.3.34 that should work on Windows. That's three hours of my life wasted. Could some OSX user try it as well?
Note to self: Remember not to wander anywhere you're likely to meet Billy The Goat in a deserted alleyway.
JSWolf 05-21-2007, 05:28 AM OK uploaded 0.3.34 that should work on Windows. That's three hours of my life wasted. Could some OSX user try it as well?
Note to self: Remember not to wander anywhere you're likely to meet Billy The Goat in a deserted alleyway.
Seems to work now under Windows.
Couple of quick questions....
How do I change the margins so it uses less margin in the LRF?
How can I change the font size? I tried changing the font size in the HTML and it didn't work in the paragraphs that had the font size changed from 12 point to 14 point.
Edit: I think some of these problems are due to the css. I'll have to have a look at it.
kovidgoyal 05-21-2007, 10:22 AM At the moment the margins are hard coded. Open a bug report and I'll add an option to change them. Try the --font-delta options
Xenophon 05-21-2007, 02:02 PM ... Try the --font-delta options
Have you fixed the font-delta option to affect more than just headers? Last I tried it, my body text stayed the same size and only chapter headers got smaller (or larger).
kovidgoyal 05-21-2007, 02:05 PM I have yet to see an example file on which --font-delta fails.
kovidgoyal 05-21-2007, 02:20 PM A quick question regarding auto rotation of images. Do people generally prefer the images to be rotated clockwise or counter clockwise? IOW, when looking at a landscape image on the reader do you prefer to have the page turn buttons above or below the screen?
kovidgoyal 05-21-2007, 08:46 PM Released v0.3.36 with various bugs squashed. Please upgrade to this version before filing any bug reports.
EDIT: At least it will be released as soon as my computer exits swapland
FangornUK 05-22-2007, 06:09 AM Wow! HTML2LRF really has come a long way. Just ran it on Gutenberg etext 21511, which has heavy & awkward formatting, and it produces a truly great looking BBeB with zero editing.
Xenophon 05-22-2007, 08:04 AM I have yet to see an example file on which --font-delta fails.
Well... It didn't work a few weeks ago (I sent you that example). I hadn't checked since. And... It's working beautifully now! I must not have noticed when you fixed it.
Hmmm... It didn't work a few weeks ago, but does now... Thank you for great software service. More Karma for Kovid!
kovidgoyal 05-22-2007, 11:05 AM Yeah the only feature missing now is embedded fonts. After that is implemented, I'm declaring html2lrf to be in purely bugfix mode.
JSWolf 05-22-2007, 11:28 AM I do have one question... How can I add headers and footers to an HTML document so they come out properly in the resulting LRF file? I'd like to have headers and footers like I can get with Book Designer. Thanks!
kovidgoyal 05-22-2007, 11:29 AM There is no support for custom headers. But you can have the title and author appear in the top right corner with the header option.
JSWolf 05-22-2007, 11:32 AM Thanks! That will do pretty well. Would it be ok if I put in one more enhancement request for titles as footers?
kovidgoyal 05-22-2007, 11:44 AM You're welcome to add the request but it's probably not going to be honored as I'm switching focus to getting libprs500 ready for 0.4.0.
JSWolf 05-22-2007, 02:00 PM I'm having a problem with html2lrf with page breaks. The following HTML sample only breaks in one place when it should have 4 page breaks if I read the directions correctly. If I am doing something wrong, please tell me. Thanks!
allovertheglobe 05-22-2007, 02:07 PM A quick question regarding auto rotation of images. Do people generally prefer the images to be rotated clockwise or counter clockwise? IOW, when looking at a landscape image on the reader do you prefer to have the page turn buttons above or below the screen?
Since nobody seems to have replied yet: I prefer to control the size/format manually, depending on the image. If they are landscape though:
When you switch the reader into landscape mode, the only rotation for the text afaik is the one with the tiny useless page turn buttons on top, or more importantly, the big round button on the left. So I'd rather have the images rotated the same way, rather than having to turn it upside down for the images only.
kovidgoyal 05-22-2007, 02:26 PM To force page breaks you have to put a style="page-break-before: always" in the tag. Automatic page-breaking only kicks in if the current page already has a lot of contents and is intended to prevent LRF slowdowns.
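As a rough illustration of forcing breaks, the sketch below adds that style to every opening tag of a chosen kind before the file is handed to html2lrf. The function name and the regular expression are made up for this example and are not part of html2lrf. Tags that already carry a style attribute are left alone, so those would still need editing by hand.

```python
import re

# The style attribute quoted in the post above.
PAGE_BREAK = 'style="page-break-before: always"'

def force_page_breaks(html, tag="h2"):
    """Add the page-break style to each opening <tag> that has no style attribute."""
    pattern = re.compile(
        r"<(%s)\b(?![^>]*\bstyle=)([^>]*)>" % re.escape(tag), re.IGNORECASE
    )
    # Keep the tag name and any existing attributes, inserting the style first.
    return pattern.sub(
        lambda m: "<%s %s%s>" % (m.group(1), PAGE_BREAK, m.group(2)), html
    )

sample = "<h2>One</h2><p>text</p><h2 class='c'>Two</h2>"
print(force_page_breaks(sample))
```

A regular expression is only adequate for simple, well-formed markup; for messy HTML a real parser would be the safer choice.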
kovidgoyal 05-22-2007, 02:31 PM Released 0.3.37
1) Changed autorotation to anti-clockwise by (not very) popular demand
2) Various bug fixes
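For anyone curious what the autorotation rule amounts to, here is a minimal sketch with the image modelled as a plain list of pixel rows, so it needs no imaging library. The function names are invented for illustration; html2lrf itself depends on the Imaging package and may well do this differently.

```python
def rotate_anticlockwise(pixels):
    """Rotate a row-major grid of pixels by 90 degrees anti-clockwise."""
    # Columns become rows; the rightmost column ends up as the top row.
    return [list(column) for column in zip(*pixels)][::-1]

def autorotate(pixels):
    """Rotate only landscape images (width > height); leave the rest alone."""
    height = len(pixels)
    width = len(pixels[0]) if pixels else 0
    return rotate_anticlockwise(pixels) if width > height else pixels

landscape = [[1, 2, 3],
             [4, 5, 6]]
print(autorotate(landscape))  # [[3, 6], [2, 5], [1, 4]]
```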
FangornUK 05-23-2007, 09:19 AM kovidgoyal, how can I download the source? Can't seem to see a link for SVN (Subversion).
igorsk 05-23-2007, 09:34 AM https://libprs500.kovidgoyal.net/browser/trunk?order=name
Zip at the bottom.
Though it might be simpler to just use SVN (https://libprs500.kovidgoyal.net/wiki/Development).
FangornUK 05-23-2007, 09:42 AM igorsk, thanks that's what I was after.
kovidgoyal 05-23-2007, 12:48 PM Released 0.3.38
1) Improved algorithm for calculating table cell widths (now respects minimum width)
2) Added support for width=% attributes of <td> tags
Xenophon 05-23-2007, 12:51 PM Kovid:
I noticed an odd thing with the font-delta support. When I shrink the font, all the inter-line spacing remains unchanged. That winds up looking pretty odd. Perhaps when shrinking the font 2x the (negative) number of points you should also reduce the interline spacing by, say, 1x the number of points.
kovidgoyal 05-23-2007, 01:23 PM Released 0.3.39:
1) Adjust linespacing appropriately when font-delta is used.
astra 05-24-2007, 08:03 AM I am sorry but I am going to ask a pretty stupid question.
It is a command line tool, right? There is no gui?
blueowl 05-24-2007, 08:13 AM I am sorry but I am going to ask a pretty stupid question.
It is a command line tool, right? There is no gui?
It does have the GUI for qt (prs500-gui). Look at homepage: https://libprs500.kovidgoyal.net/
astra 05-24-2007, 09:47 AM Oh, I see. I have to execute this command to get the GUI. Just like old win 3.11 :)
Thanks :)
FangornUK 05-24-2007, 11:12 AM The HTML2LRF utility is a command line tool and doesn't have any GUI. People seem to be afraid of Command Line tools unfortunately even though a utility like this is dead easy to use. Don't be put off!
kovidgoyal 05-24-2007, 11:47 AM Actually at least on Windows there should be an icon installed to both the Start menu and the desktop. But the GUI right now doesn't have a component for html2lrf...that will come in version 0.4.0
astra 05-24-2007, 11:54 AM The HTML2LRF utility is a command line tool and doesn't have any GUI. People seem to be afraid of Command Line tools unfortunately even though a utility like this is dead easy to use. Don't be put off!
Easy to explain.
When you have gui, you just walk around, click with a mouse and find what you want. However, with command line you have to read instructions and it takes time and effort :D Plus sometimes explanation of syntax for specific command might be confusing :rolleyes5
astra 05-24-2007, 11:55 AM Actually at least on Windows there should be an icon installed to both the Start menu and the desktop. But the GUI right now doesn't have a component for html2lrf...that will come in version 0.4.0
Thank you. Then I am going to wait for 0.4.0 :)
kovidgoyal 05-24-2007, 12:34 PM Thank you. Then I am going to wait for 0.4.0 :)
That's cool...I'd just like to say that while it's true that GUIs are easier to use (for simple tasks) the little bit of extra effort it takes to learn a CLI usually more than justifies itself over the long term.
In other news released 0.3.40 with more refinements to the table algorithm.
RWood 05-24-2007, 01:35 PM I am working on A Child's Garden of Verses and have switched the formatting from BookDesigner to GoLive for use with html2lrf since the graphics implementation in the former is lacking and the graphics in the book are a major part of the enjoyment.
I have also started to use the libprs500 as my main index program for the hard disk collection. It has a lot of features not found in other systems.
kovidgoyal 05-24-2007, 01:40 PM That's good to hear. Let me know when you have the conversion done, I'd be interested in seeing the results.
kovidgoyal 05-26-2007, 11:13 AM v0.3.42 with bug fixes.
shawn 05-27-2007, 11:34 AM kovidgoyal, I use AVG antivirus and it detected prs500-gui.exe as a trojan horse virus. I'm 100% sure it's a false positive but just wanted to let you guys know.
PS. the newest avg update fixes this false positive
kovidgoyal 05-27-2007, 11:45 AM Wow that's annoying. Can someone with some other AV software scan prs500-gui.exe and see what it has to say? I don't use AV myself.
EDIT: Ah glad to hear the latest update fixes it.
Fauve 05-29-2007, 08:13 AM I just want to add a quick line, the batch command to convert quickly a whole directory of HTML files in Windows is the following one :
for %f in (*.html *.htm) do html2lrf %f
I had forgotten how to do proper batch scripts, and I needed to google it, so I suppose this could help someone else...
It did! :happy2: Thanks! And a huge thank-you to you, Kovid, for making this great tool!
One thing I did notice is that when I batch-convert to LRF, I don't have authors on the converted books. Where does html2lrf look in the html file for author information?
kovidgoyal 05-29-2007, 11:55 AM It doesn't; you have to specify author information via the command-line option -a
I post another quick message. I just realised my previous solution for batch conversion doesn't work with long filenames under windows xp. So here's the corrected version :
for %f in (*.html *.htm) do html2lrf "%f"
(It's all in those tricky => ""...
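For anyone scripting this outside cmd.exe, here is a rough Python equivalent of the batch loop. Passing the arguments as a list sidesteps the quoting problem with long filenames, and the -a option mentioned above is added when an author is given. The helper names are made up for this sketch, and it assumes html2lrf is on your PATH.

```python
import glob
import os
import subprocess

def build_commands(directory, author=None):
    """Return one html2lrf argument list per .html/.htm file in directory."""
    files = sorted(
        glob.glob(os.path.join(directory, "*.html"))
        + glob.glob(os.path.join(directory, "*.htm"))
    )
    commands = []
    for path in files:
        cmd = ["html2lrf"]
        if author:
            cmd += ["-a", author]  # author is not read from the HTML itself
        cmd.append(path)  # list arguments need no manual quoting
        commands.append(cmd)
    return commands

def convert_all(directory, author=None):
    """Run html2lrf on each file; assumes html2lrf is on the PATH."""
    for cmd in build_commands(directory, author):
        subprocess.call(cmd)

# Dry run: show what would be executed for the current directory.
for cmd in build_commands("."):
    print(cmd)
```

Calling convert_all(".") instead of the dry-run loop performs the actual conversions.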
JSWolf 05-30-2007, 03:41 PM I did use html2lrf for formatting Doctor Dolittle's Garden because of all the graphics and it was so much easier to deal with than Book Designer.
kovidgoyal 05-30-2007, 03:46 PM Cool thanks.
kovidgoyal 06-06-2007, 01:10 PM Released v0.3.46:
1) Support for setting page margins
2) Bug fixes
dsyzling 06-07-2007, 03:22 AM I raised an issue in ticket #99 trying to convert some html saved as filtered from Office 2007. I've seen this a few times so thought I'd upload the file and raise a case.
I'll also try the new release to see if this resolves the issue and the margin support is welcomed.
Regards
Darren
dsyzling 06-07-2007, 09:09 AM Tried 0.3.46 and the error raised in ticket #99 no longer appears. However the code samples now appear with span tag markup within the resulting lrf document.
Regards
Darren
kovidgoyal 06-07-2007, 09:44 AM That's because they're inside <pre> elements. This is the correct behavior.
JSWolf 06-07-2007, 09:03 PM I hope the ticket I put in for changing the way --font-delta=FONT_DELTA works is not going to be difficult to change. I would like to have it adjust by 1pt and not 2pts.
Jon
JSWolf 06-07-2007, 10:10 PM I did just find what I consider to be a bug in html2lrf. It does not respond properly to a <span style=font-size:'13pt'>. It outputs the font in the same size as if I had done a <span style=font-size:'small'>. small is 12pt and 13pt is what I was trying to set. I would like more control over the font size by using the point size in the font-size style.
I'll be putting this up as a ticket as I feel it is a bug.
JSWolf 06-08-2007, 12:49 PM I notice the two tickets I've put in have been fixed. How can I get the fixes?
I've looked at how to update from SVN and I don't have the files needed to pull from there...
python setup.py develop --uninstall
svn update
python setup.py develop (as root)
No setup.py and no svn
I'm on Windows and use the Windows installer.
kovidgoyal 06-08-2007, 01:08 PM Running libprs500 from svn on windows is not easy. Just wait a little and I'll release the next version soon. Hopefully today, probably tomorrow.
JSWolf 06-08-2007, 01:09 PM Running libprs500 from svn on windows is not easy. Just wait a little and I'll release the next version soon. Hopefully today, probably tomorrow.
Thanks! I'm looking forward to it.
JSWolf 06-09-2007, 09:42 PM I hate to be a pest, but any idea when the new Windows version of html2lrf will be ready? I've got a few books waiting to be converted that need the ability to tweak the font size. I'm getting either too small at small and too big at medium. Thanks!
Jon
kovidgoyal 06-09-2007, 09:57 PM Released 0.3.48
1) Refined font handling to address JSWolf's complaints
2) Bug fixes
There have been a lot of changes under the hood as I move the codebase towards 0.4.0, so some things might have broken. Let me know.
JSWolf 06-10-2007, 08:36 AM Released 0.3.48
1) Refined font handling to address JSWolf's complaints
2) Bug fixes
There have been a lot of changes under the hood as I move the codebase towards 0.4.0, so some things might have broken. Let me know.
--right-margin is broken. Changing the right margin does not do a thing. All that happens if I lessen the left margin is it moves the text over properly, but makes the right margin larger. If I set just a right margin, nothing happens. I set a left margin of 5 (worked) and a right margin of 30 and got nothing different than if I had not set a right margin.
Jon
awh_tokyo 06-10-2007, 08:53 PM [dumb question answered in another thread] Oop, can I not delete this?
kovidgoyal 06-12-2007, 07:00 PM Released 0.3.49
1) Bug fixes
2) OSX installer
Xenophon 06-12-2007, 09:06 PM Mac OS Installer (actually a downloadable, self-contained app) -- YAAAAY!
Installer fails on my Mac. :disappoin
Using the command-line installation (via easy_install) -- html2lrf still produces output that lrf-meta doesn't like (as per earlier ticket), but no new bugs. :)
kovidgoyal 06-12-2007, 09:11 PM You can't run it from within the dmg. You have to drag it to a folder of your filesystem. I'll add that to the wiki. I'll address the lrf-meta bug later.
EDIT: Oops there was a silly little typo in the script generating the OSX dmg. Fixed, version 0.3.50 is being uploaded now.
JSWolf 06-17-2007, 09:26 PM I recently installed Libprs500 on our laptop running Windows XP Pro. It did not have the Connect software installed. I used it to delete a book and to add another book. The problem I encountered was that it totally botched up the collections I had setup. Did I do something incorrectly or does it do that if you have collections setup?
kovidgoyal 06-18-2007, 02:48 AM No it shouldn't. Though I've never actually tested it with collections as I don't use them myself.
JSWolf 06-18-2007, 04:48 AM No it shouldn't. Though I've never actually tested it with collections as I don't use them myself.
Any chance you could have a go at fixing that? Also there were some errors with Libprs500 that are in the log. Later today I'll power up the laptop and pull off the log and post it as a ticket. I'll also post the collection problem as well.
kovidgoyal 06-18-2007, 10:58 AM Open a bug report and I'll get around to it. :)
JSWolf 06-18-2007, 11:07 AM Will do!
FangornUK 06-22-2007, 09:37 AM kovidgoyal, just tried the latest version and with the variable margins the defaults have changed from before and in my eyes they now look awful (looked perfect before). Can you tell me what the defaults for the margins were before so I can add them on the command line?
kovidgoyal 06-22-2007, 10:11 AM Hmm not sure I remember correctly. Try 20, 20 for the left and right margins.
FangornUK 06-22-2007, 10:59 AM Just found an old LRF and compared it, appears to be 25 for left and right margins.
kovidgoyal 06-22-2007, 11:06 AM OK the right margin being set to 5 in the current version is a typo. I'll set both left and right margins to 20 as the default in the next release as I like that.
JSWolf 06-22-2007, 11:11 AM OK the right margin being set to 5 in the current version is a typo. I'll set both left and right margins to 20 as the default in the next release as I like that.
Please don't. I actually like the new margins as they are by default. Making them bigger puts them back to where I didn't like before.
kovidgoyal 06-22-2007, 11:23 AM Yeah but I prefer slightly bigger margins, and since I'm the author I get to decide the defaults ;-) Only the right margin is changing and the new value is smaller than the old one.
JSWolf 06-22-2007, 12:43 PM Why do people prefer to have bigger margins? It just ends up looking awful if you need/want to go to a larger font.
kovidgoyal 06-22-2007, 01:06 PM It does? That's not something I've noticed. But then I'm not the type to pay close attention to those kinds of details anyway.
kovidgoyal 06-28-2007, 08:08 PM Released v0.3.58 with support for embedding fonts. See the attached LRF file in the first post for a demo.
To use an embedded font the option is
--serif-family "/path/to/fonts/dir, font name"
For e.g.
--serif-family "C:\Windows\fonts, Times New Roman"
Enjoy.
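If you are driving html2lrf from a script, the option value is just the fonts directory and the family name joined by a comma. A small sketch follows; the helper name is made up for illustration and myfile.html is a placeholder, while the option itself is the one given above.

```python
def serif_family_option(font_dir, family):
    """Build the --serif-family option as '<fonts dir>, <family name>'."""
    return ["--serif-family", "%s, %s" % (font_dir, family)]

# Argument list suitable for subprocess.call(); nothing is executed here.
cmd = (
    ["html2lrf"]
    + serif_family_option(r"C:\Windows\fonts", "Times New Roman")
    + ["myfile.html"]
)
print(cmd)
```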
JSWolf 06-28-2007, 09:31 PM So can I embed the italic font or the bold font if I wanted to?
kovidgoyal 06-28-2007, 09:34 PM Automatically embedded when you specify the family.
JSWolf 06-28-2007, 09:37 PM Automatically embedded when you specify the family.
If I want to use the built-in Dutch Roman font but the Times New Roman italics, I just specify Times New Roman for where the italics are? How can I unspecify the embedded font to use the built-in font?
kovidgoyal 06-28-2007, 09:41 PM Just specify --serif-family "C:\Windows\Fonts, Times New Roman". That will automatically give you italics, bold and bold-italics. You cannot specify a separate font for just a single element of text. That is really bad style and html2lrf won't support it.
JSWolf 06-28-2007, 09:45 PM Just specify --serif-family "C:\Windows\Fonts, Times New Roman". That will automatically give you italics, bold and bold-italics. You cannot specify a separate font for just a single element of text. That is really bad style and html2lrf won't support it.
If I specify a font family of "Dutch801 Rm BT Roman", for which I only have the regular font on my computer, will it use the internal font or will it try to embed the one that's on my computer?
kovidgoyal 06-28-2007, 09:53 PM It will embed and you'll have the usual italic since that family doesn't have bold, italic fonts.
JSWolf 06-28-2007, 09:55 PM It will embed and you'll have the usual italic since that family doesn't have bold, italic fonts.
So once I embed a font, I'm unable to go back to the built-in fonts? All I want to do is use the internal Dutch Roman to render normal text and Times New Roman for bold and italics.
kovidgoyal 06-28-2007, 09:59 PM For that LRF file, yes.
JSWolf 06-28-2007, 10:01 PM For that LRF file, yes.
Can you add in the ability to go back to the built-in fonts if we want?
kovidgoyal 06-28-2007, 10:07 PM No, using two fonts for the same style of text is just not good document design. You shouldn't need to have two fonts for serif text in your document. It looks bad.
JSWolf 06-28-2007, 10:13 PM I just converted a LIT file to HTML because lit2LRF is not working. It fails to write out the HTML file. The conversion from HTML to LRF failed...
html2lrf -a "Mike W. Barr" -t "The Centre Cannot Hold" --serif-family "C:\Windows\Fonts, Times New Roman" contents.html
Processing contents.html
Parsing HTML... done
Converting to BBeB... done
Processing mereanarchybook2thecentrecannothold.html
Parsing HTML... done
Converting to BBeB... done
Traceback (most recent call last):
File "convert_from.py", line 1400, in <module>
File "convert_from.py", line 1335, in main
File "convert_from.py", line 1195, in process_file
File "convert_from.py", line 620, in process_links
File "libprs500\ebooks\lrf\pylrs\pylrs.pyo", line 301, in append
libprs500.ebooks.lrf.pylrs.pylrs.LrsError: can't append CharButton to Text
kovidgoyal 06-28-2007, 10:21 PM Send me the lit file.
JSWolf 06-28-2007, 10:59 PM Send me the lit file.
Sent along with an HTML conversion from a LIT file that fails too.
kovidgoyal 06-29-2007, 02:27 AM Both bugs squashed in 0.3.60
JSWolf 06-29-2007, 02:43 AM Both bugs squashed in 0.3.60
I've emailed more bugs to be squashed.
igorsk 06-29-2007, 03:18 AM Could you put up a separate download with only html2lrf and lit2lrf, preferably py2exe'd? I'd like to pass these utilities to some of my friends, but don't want to make them install the full GUI or Python.
John Wheater 06-29-2007, 06:07 AM I'm happy to announce html2lrf, an open source, cross-platform HTML to LRF converter that I believe is the most
<snip>
If you have an HTML file it chokes on, I want to know about it!
Dear kovidgoyal
Many thanks for your kind reply to my request for html2lrf.
I got the problem below after downloading and saving the html from http://www.gutenberg.org/etext/19694 into your libprs500 directory.
I was expecting to see SmithHistory.lrf, is that right?
The directory SmithHistory_files contains all the pictures mentioned in SmithHistory.htm.
I’m running on Windows XP. The document is successfully rendered by Firefox.
Here is the DOS box showing the problem:
C:\Program Files\libprs500>html2lrf SmithHistory.htm
Processing SmithHistory.htm
Parsing HTML... done
Converting to BBeB...Unhandled/malformed CSS key: clear both
Unhandled/malformed CSS key: left 1%
Unhandled/malformed CSS key: vertical-align super
done
Traceback (most recent call last):
File "convert_from.py", line 1407, in <module>
File "convert_from.py", line 1342, in main
File "convert_from.py", line 1202, in process_file
File "convert_from.py", line 556, in process_links
File "encodings\utf_8.pyo", line 16, in decode
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe6' in position 8:
ordinal not in range(128)
C:\Program Files\libprs500>dir
Volume in drive C is ACER
Volume Serial Number is 40A9-98AA
Directory of C:\Program Files\libprs500
29/06/2007 11:42 <DIR> .
29/06/2007 11:42 <DIR> ..
18/04/2007 16:51 77,824 bz2.pyd
04/07/2004 18:34 46,080 clit.exe
29/06/2007 10:57 <DIR> driver
29/06/2007 09:39 61,952 html2lrf.exe
29/06/2007 09:39 122,880 libprs500.exe
29/06/2007 10:57 58 libprs500.url
29/06/2007 09:39 1,919,171 library.zip
29/06/2007 09:39 15,124 LICENSE
29/06/2007 09:39 19,456 lit2lrf.exe
29/06/2007 09:39 38,912 lrf-meta.exe
29/06/2007 09:39 57,344 markdown.exe
13/01/2005 20:19 15,960 mingwm10.dll
12/07/2006 02:35 348,160 MSVCR71.dll
28/06/2007 21:56 28,160 multiarray.pyd
29/06/2007 09:39 28,672 prs500.exe
18/04/2007 16:51 135,168 pyexpat.pyd
18/04/2007 16:51 2,113,536 python25.dll
08/06/2007 06:27 6,656 Qt.pyd
08/06/2007 06:26 1,357,312 QtCore.pyd
03/06/2007 00:03 2,077,696 QtCore4.dll
08/06/2007 06:26 5,682,688 QtGui.pyd
25/05/2007 16:03 9,041,920 QtGui4.dll
08/06/2007 06:27 104,960 QtSvg.pyd
29/06/2007 09:39 411,648 QtSvg4.dll
29/06/2007 09:39 499,200 QtXml4.dll
29/06/2007 09:39 20,992 rtf-meta.exe
18/04/2007 16:51 7,680 select.pyd
03/06/2007 05:44 59,904 sip.pyd
29/06/2007 11:12 1,296,659 SmithHistory.htm
29/06/2007 11:42 <DIR> SmithHistory_files
03/04/2006 16:41 260,096 sqlite3.dll
28/03/2006 01:23 576,000 tcl84.dll
28/03/2006 01:44 1,038,848 tk84.dll
29/06/2007 09:39 19,968 txt2lrf.exe
28/06/2007 21:56 55,296 umath.pyd
18/04/2007 16:51 475,136 unicodedata.pyd
29/06/2007 10:57 127,953 Uninstall.exe
18/04/2007 16:51 4,608 w9xpopen.exe
18/04/2007 16:51 81,920 _ctypes.pyd
18/04/2007 16:52 323,584 _hashlib.pyd
03/12/2006 14:23 319,488 _imaging.pyd
03/12/2006 14:23 290,816 _imagingft.pyd
03/12/2006 14:23 10,752 _imagingmath.pyd
03/12/2006 14:23 5,120 _imagingtk.pyd
28/06/2007 21:56 72,192 _numpy.pyd
18/04/2007 16:52 53,248 _socket.pyd
18/04/2007 16:51 49,152 _sqlite3.pyd
18/04/2007 16:52 655,360 _ssl.pyd
46 File(s) 30,015,309 bytes
4 Dir(s) 18,338,516,992 bytes free
kovidgoyal 06-29-2007, 10:48 AM The windows installer basically installs into a single directory. It is py2exe'd. If you don't want the GUI just delete the libprs500 executable and any Qt/PyQt related things in c:\program files\libprs500 and bundle it into a zip file.
kovidgoyal 06-29-2007, 11:18 AM Released v0.3.62 with a bunch of bug fixes for bugs that crept into the previous couple of releases.
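The bundling step described above could be scripted roughly as follows. This is only a sketch: the rule for spotting "Qt/PyQt related things" (file names starting with "Qt" or "sip") is an assumption based on the directory listing posted earlier in the thread, and the function names are hypothetical.

```python
import os
import zipfile

GUI_EXECUTABLE = "libprs500.exe"


def is_gui_file(filename):
    """Guess whether a file belongs to the GUI (assumed naming rule)."""
    name = filename.lower()
    return name == GUI_EXECUTABLE or name.startswith("qt") or name.startswith("sip")


def bundle_cli_tools(install_dir, zip_path):
    """Zip everything in install_dir except the GUI-related files.

    zip_path should live outside install_dir, otherwise the archive
    would try to include itself.
    """
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for root, _dirs, files in os.walk(install_dir):
            for filename in files:
                if is_gui_file(filename):
                    continue
                full_path = os.path.join(root, filename)
                # Store paths relative to the install directory.
                archive.write(full_path, os.path.relpath(full_path, install_dir))


# Example call (paths are placeholders):
# bundle_cli_tools(r"C:\Program Files\libprs500", r"C:\temp\html2lrf-cli.zip")
```

Working on a copy leaves the installed directory untouched, which is a little safer than deleting files from it by hand.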
kovidgoyal 06-30-2007, 12:38 PM Dear kovidgoyal
Many thanks for your kind reply to my request for html2lrf.
I got the problem below after downloading and saving the html from http://www.gutenberg.org/etext/19694 into your libprs500 directory.
Should be fixed in 0.3.62. Also you don't need to save the files in the libprs500 directory. You can save them in any directory. html2lrf is on your PATH so you can invoke it from any directory.
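For readers puzzled by the UnicodeEncodeError in that traceback: the ASCII codec cannot represent 'æ' (u'\xe6'), so any implicit conversion through ASCII fails on it. In the Python 2 of the time, one common way to hit this was calling decode() on text that was already unicode. The sketch below (modern Python 3, not the actual fix that went into 0.3.62) reproduces the error and shows a defensive helper; to_text is a hypothetical name.

```python
def to_text(value, encoding="utf-8"):
    """Decode bytes to text, but leave already-decoded text alone."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# 'æ' sits at index 8 of this string, like the character in the traceback.
name = "Encyclop\u00e6dia"

try:
    name.encode("ascii")
    ascii_error = None
except UnicodeEncodeError as err:
    # The ASCII codec only covers code points 0..127.
    ascii_error = err

# UTF-8 bytes for the same word decode cleanly.
decoded = to_text(b"Encyclop\xc3\xa6dia")
```

The point of the helper is simply to decode once, at the boundary where bytes come in, and never again afterwards.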
JSWolf 06-30-2007, 05:01 PM I do have an interesting bug to report. The line spacing in the last line of a paragraph seems to be off. There seems to be more spacing than on the other lines in a paragraph.
kovidgoyal 06-30-2007, 06:33 PM Wow you have sharp eyesight, now that you point it out I notice it too. Frankly, I have no idea what's causing it and to me it seems like a pretty marginal effect, so I'm going to ignore it, as I have a little too much on my plate at the moment. If you feel it deserves attention, open a bug report and I'll get around to it...eventually.
JSWolf 06-30-2007, 07:13 PM Wow you have sharp eyesight, now that you point it out I notice it too. Frankly, I have no idea what's causing it and to me it seems like a pretty marginal effect, so I'm going to ignore it, as I have a little too much on my plate at the moment. If you feel it deserves attention, open a bug report and I'll get around to it...eventually.
I'm one of those types that sometimes things just stand out to me. I used to help proofread because I'd see things that just leapt off the page at me. Yes, I'll open a bug report for it later tonight. It just looks odd. What I might do is have a go at converting the LRF into LRS and seeing if I can see what might be causing it. I'll try that before I post the bug report because if I do find it, then it might be easy enough for you to fix sooner rather than later.
JSWolf 06-30-2007, 10:51 PM I found the source of the bug. It has to do with baselineskip.
kovidgoyal 06-30-2007, 10:57 PM Why would baselineskip affect only the last line of the paragraph? It should affect the entire textblock
JSWolf 07-01-2007, 07:03 AM Why would baselineskip affect only the last line of the paragraph? It should affect the entire textblock
It only happens with an embedded font. Without it seems to be fine.
kovidgoyal 07-01-2007, 11:54 AM It's probably a bug in the LRF rendering software on the reader. I really cannot think of anything in the LRF file that would affect only the last line of a paragraph.
bkilian 07-03-2007, 01:12 AM Is there a recommended or supported way to do drop caps?
I have a word exported HTML file I can send you (It's a book a friend wrote), but html2lrf sticks the drop cap on a line by itself.
I attached images to show what I meant.
kovidgoyal 07-03-2007, 01:20 AM Well, you can't achieve actual drop caps. The best that you can do is an image whose base is aligned with the base of the line.
Something like this should achieve that:
<p>
<img src='smallimage.jpg' />his is the first line.
</p>
If the image is larger than a certain size it gets put onto a line by itself, so make sure the image is small enough.
bkilian 07-03-2007, 04:57 PM Well, you can't achieve actual drop caps. The best that you can do is an image whose base is aligned with the base of the line.
Something like this should achieve that:
<p>
<img src='smallimage.jpg' />his is the first line.
</p>
If the image is larger than a certain size it gets put onto a line by itself, so make sure the image is small enough.
Ok, I replaced the bizarre table based drop cap with a much simpler larger first letter. I know this is possible, since all the books I bought from the connect store (all 3 of them :)) do it.
<font size="+3">T</font>he road leading out of
Constantinople was dusty, the stones thirsty after
which should look something like
The road leading out of Constantinople was dusty, the stones thirsty after
However, html2lrf just ignored it and kept the first letter of the chapter the same size as all the others. Is this supported?
Edit: Never mind. The trick is to use proper CSS, and not <font size="+3">. It appears not to understand relative font sizes.
kovidgoyal 07-03-2007, 05:01 PM Use <big>T</big>he
kovidgoyal 07-04-2007, 04:32 PM Released v0.3.66 with support for dropcaps, see the attached demo in the first post of this thread.
bkilian 07-09-2007, 02:38 PM I have a suggestion that I think should be fairly easy to implement, and would increase the usefulness of html2lrf significantly (to me, at least :))
I would like a way to essentially "store" the command line options I want to use on a particular book, so that the next time I convert it, I don't have to remember exactly what I used.
Now HTML already has a perfect way to store metadata, so why not store it in the HTML file itself?
I suggest a set of <meta/> variables that you could stick in the <head> of the html file, and html2lrf uses them to determine what its settings should be.
For example:
<meta name="publisher" content="Baen Books" />
<meta name="title" content="Wind Rider's Oath" />
<meta name="author" content="David Weber" />
<meta name="cover" content="0743488210_Cover.jpg" />
<meta name="font-delta" content="-0.5" />
There's no reason I can think of that any of your command line settings wouldn't work in the <meta> section, so a person could essentially store not only the content, but how to replicate the content correctly in the same file.
Whether you decide this is something you want to implement or not, I'm probably going to start annotating my HTML files in this way. I figure as long as I keep to the same text you use in your command line variables, I have a better than even chance of it just working if you at some point decide to implement it. (Or I could, if I was bothered to learn Python, but I don't really have a weekend free in the near future :))
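As a rough illustration of the proposal (not something html2lrf does), a small Python wrapper could read the <meta> tags and turn them into command-line arguments. The mapping of each meta name to a "--name" long option is an assumption that follows the suggestion of reusing the command-line names; only -a, -t, --serif-family, --header and --disable-autorotation are spelled out elsewhere in this thread, so check html2lrf's own option list before relying on any other name.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect name/content pairs from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags also arrive here by default.
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if name and content is not None:
            self.meta[name] = content


def meta_to_options(html_text):
    """Turn <meta name=X content=Y> into ['--X', 'Y', ...] (assumed mapping)."""
    parser = MetaCollector()
    parser.feed(html_text)
    options = []
    for name, content in parser.meta.items():
        options += ["--" + name, content]
    return options


sample = """<html><head>
<meta name="title" content="Wind Rider's Oath" />
<meta name="font-delta" content="-0.5" />
</head><body></body></html>"""
sample_options = meta_to_options(sample)
```

The resulting list could be placed between "html2lrf" and the file name in a subprocess call, which keeps the settings in the HTML file while leaving html2lrf itself unmodified.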
kovidgoyal 07-09-2007, 02:53 PM An interesting idea, can you tell me why you need this feature?
bkilian 07-09-2007, 04:12 PM An interesting idea, can you tell me why you need this feature?
Well, I have a vast (read: in the hundreds) library of Baen E-books, and I like to keep at least one human readable version around at all times. It used to be the RTF, and I'd read it into Book Designer, spend half an hour or more tweaking it, save it as Book Designer's html0 format (so I didn't have to do the tweaking each time) and then make the LRF.
But Book Designer has its own set of problems, involving weird character conversions, its insistence on reformatting my content, and its insistence on using Windows :) I have converted all my books multiple times as bugs get fixed in Book Designer.
So when I found out that html2lrf does a quite passable conversion on the .LIT html, (as long as I remove the useless table of contents html first) without me doing a huge amount of searching through the book to make sure it was doing the right thing, I jumped on the chance, only to be stumped by the fact that there's no way my human readable archive can contain all the information needed to perform the conversion correctly.
I essentially want to be able to automate the conversion of a number of titles in one go, and it's impossible to do with your current command line driven method of specifying metadata. Essentially, if you add a feature at some point that would benefit me, I'd like to be able to reconvert all my books without having to do it all manually.
(One other feature that would be helpful in this endeavour would be the ability to specify the sony ebook ID, so I can ensure that it stays constant across multiple conversions)
In a more general sense, the feature could be useful in an automated website that, for instance, adds a dedication to an ebook for the user before packing it up, although this seems a bit contrived :)
kovidgoyal 07-09-2007, 04:22 PM Hmm why not just use a shell script? Save it in the archive with the html files. And save the metadata in an opf file. html2lrf will read the metadata from the opf automatically when you convert, and the commandline settings will be stored in the script file.
bkilian 07-09-2007, 04:39 PM That just increases the number of files I have to keep track of from 1+images to 3+images per book, with the added disadvantage that I then have a script file that is probably not portable across systems, and I have to learn the format of an OPF file.
As opposed to adding 5 or 6 <meta> lines to an html, it seems quite unwieldy.
Oh, it looks like html2lrf converts &#333; (ō) to a space. I'm assuming the Sony font doesn't have this character. Is there some way to define a conversion list for special characters?
kovidgoyal 07-09-2007, 04:57 PM I keep my human readable html rar'ed up, that way I don't really care about how many files the archive contains. If you write the script in a cross platform language (like python) you don't have to worry about portability either. And opf is just a simple XML file. Not really anything to learn and it's likely to be around for a while as well.
No you can't specify custom character conversion without editing the source code, but you can embed a font that can handle the special characters into the LRF.
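To give a feel for "opf is just a simple XML file", here is a hedged Python sketch that reads the Dublin Core fields out of a trimmed-down OPF document. The sample is hypothetical and deliberately incomplete (a real OPF also carries an identifier, a manifest and a spine), and whether html2lrf expects exactly this layout is an assumption; the thread only says it reads metadata from an .opf file.

```python
import xml.etree.ElementTree as ET

# Dublin Core namespace used for the metadata elements.
DC = "http://purl.org/dc/elements/1.1/"

# Trimmed, hypothetical sample; not a complete OPF package.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Wind Rider's Oath</dc:title>
    <dc:creator>David Weber</dc:creator>
    <dc:publisher>Baen Books</dc:publisher>
  </metadata>
</package>
"""


def read_opf_metadata(opf_text):
    """Return the title, creator and publisher found in an OPF string."""
    root = ET.fromstring(opf_text)
    found = {}
    for field in ("title", "creator", "publisher"):
        element = root.find(".//{%s}%s" % (DC, field))
        if element is not None and element.text:
            found[field] = element.text.strip()
    return found


sample_metadata = read_opf_metadata(SAMPLE_OPF)
```

The same three values are what the earlier posts pass on the command line or propose storing in <meta> tags, so moving between the two forms is mostly a matter of renaming fields.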
bkilian 07-09-2007, 05:18 PM I keep my human readable html rar'ed up, that way I don't really care about how many files the archive contains. If you write the script in a cross platform language (like python) you don't have to worry about portability either. And opf is just a simple XML file. Not really anything to learn and it's likely to be around for a while as well.
No you can't specify custom character conversion without editing the source code, but you can embed a font that can handle the special characters into the LRF.
That's a bummer. Looks like I'll have to be doing a bunch of find/replace in my files before conversion. Does anyone have a list of the actual characters the Sony font understands?
If I was going to bother learning python, I'd just modify html2lrf myself instead of writing script files. ;) In fact I might still do that if I feel energetic in the next few weeks.
Does the OPF file have to be named anything special? If not, do I have to have a different directory for every book? I believe the program can read through zip files, but I haven't been able to make that work (and could the cover image also be in the zip file?)
C:\EBooks\SRC\Weber\Bahzell\t\Wind_Riders_Oath\t>html2lrf Wind_Riders_Oath.zip
Traceback (most recent call last):
File "convert_from.py", line 1406, in <module>
File "convert_from.py", line 1341, in main
File "convert_from.py", line 1141, in process_file
File "convert_from.py", line 1380, in get_path
File "libprs500\__init__.pyo", line 52, in extract
File "libprs500\libunzip.pyo", line 45, in extract
File "os.pyo", line 172, in makedirs
WindowsError: [Error 3] The system cannot find the path specified: ''
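The empty path ('') at the end of that WindowsError hints at one possible cause, offered here only as a guess: if an archive member sits at the top level of the zip, its directory part is an empty string, and asking os.makedirs to create '' fails. The Python sketch below shows that behaviour and a guard against it; ensure_parent_dir is a hypothetical helper, not libprs500 code.

```python
import os


def ensure_parent_dir(path):
    """Create the parent directory of path, if it has one.

    Returns the parent directory name ('' for a bare file name).
    """
    parent = os.path.dirname(path)
    if parent:
        # Only call makedirs when there is actually something to create.
        os.makedirs(parent, exist_ok=True)
    return parent


# A member stored at the top level of an archive has no directory part.
top_level_parent = os.path.dirname("Wind_Riders_Oath.html")

# Calling makedirs on that empty string raises an OSError subclass.
try:
    os.makedirs(top_level_parent)
    makedirs_failed = False
except OSError:
    makedirs_failed = True
```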
kovidgoyal 07-09-2007, 05:31 PM I would recommend against changing the html2lrf source code as you'd have to maintain the change through new versions of libprs500. What would make more sense is to write a wrapper around html2lrf that makes the changes to the html file before calling html2lrf.
Yeah html2lrf supports both zip and rar archives. The opf file doesn't need to be named anything special (as long as it has a .opf extension) and the archive can contain the cover.
That error looks like another windows incompatibility bug. Open a bug report and attach the zip file and I'll fix it.
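The wrapper idea could look something like the following Python sketch, which does the find/replace on a copy of the HTML and then hands the copy to html2lrf. The replacement table is only an example (mapping ō to a plain o is a made-up choice), and the function names are hypothetical; the sketch assumes html2lrf is on the PATH, as stated earlier in the thread.

```python
import os
import subprocess

# Example table only: characters the built-in font may lack, and substitutes.
REPLACEMENTS = {
    "&#333;": "o",   # numeric entity for o with macron
    "\u014d": "o",   # the literal character
}


def clean_html(text, replacements=REPLACEMENTS):
    """Apply every find/replace pair to the HTML text."""
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text


def convert(html_path, extra_args=(), encoding="utf-8"):
    """Write a cleaned copy next to the original, then run html2lrf on it."""
    with open(html_path, encoding=encoding) as source:
        cleaned = clean_html(source.read())
    base, extension = os.path.splitext(html_path)
    # Same directory as the original so relative image links still resolve.
    cleaned_path = base + ".cleaned" + extension
    with open(cleaned_path, "w", encoding=encoding) as target:
        target.write(cleaned)
    return subprocess.call(["html2lrf", *extra_args, cleaned_path])
```

Because the original file is never modified, the human readable archive copy stays as it was, and the table can grow as more missing characters turn up.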
bkilian 07-09-2007, 05:57 PM I would recommend against changing the html2lrf source code as you'd have to maintain the change through new versions of libprs500. What would make more sense is to write a wrapper around html2lrf that makes the changes to the html file before calling html2lrf.
Yeah html2lrf supports both zip and rar archives. The opf file doesn't need to be named anything special (as long as it has a .opf extension) and the archive can contain the cover.
That error looks like another windows incompatibility bug. Open a bug report and attach the zip file and I'll fix it.
I maintained my modifications to the ircII client through 8 years of changes, libprs500 would be a breeze :)
I do, however, take your point. So you don't feel it's valuable to have some way to maintain the same settings for a file across multiple conversions. That's fair, I suspect I'm an extreme case. (I did however just do it again, I reconverted to test something, and forgot to specify --disable-autorotation or --header)
I'll enter a bug for the zipfile problem. Does the html file inside the zip file need to be named anything special?
bkilian 07-09-2007, 06:16 PM Also, do you have any plans on adding definition list parsing?
<dl>
<dt> blah </dt>
<dd> blah 2 </dd>
</dl>
I can convert them into other types of lists, but I'd prefer not to, and if you do plan on adding it, then I'll just wait.
(You can see an example of them used in the html file in the zip I included with that bug report)
kovidgoyal 07-09-2007, 06:35 PM As far as maintaining settings like --header (which I suspect you need over all conversions not on a per file basis) the new GUI will take care of that as it will remember conversion defaults.
Adding support for definition lists is trivial and I'll do it in the next release.
Yeah I don't think it makes sense to add support for per file defaults to html2lrf.
theswede 07-09-2007, 06:52 PM I agree; metatags in the HTML are certainly the way to store metadata. Manual synchronization and scripts are not only a waste of time, but error prone, and it means that when I zip up a few files to take along on my work laptop, the metadata is gone. It should be embedded.
I might code this into html2lrf myself just to be able to do things the right way.
kovidgoyal 07-09-2007, 07:06 PM Ah but opf files are likely to become the standard for ebook metadata.
theswede 07-09-2007, 07:10 PM Then I'll write a tool which extracts them from embedded metadata in the unlikely event that is ever needed. Or embeds them in the header, as meta tags. Which is pretty much what is proposed here.
I will not mess with scripts and extra files. An ebook should, as far as possible, contain all its required metadata. I know me; I will never maintain an external file. It will rot, and I'll end up having to reorganize my books after a year or so. If I embed the metadata, it's *done*, for the lifetime of the ebook.
kovidgoyal 07-09-2007, 07:20 PM Erm, the epub standard is basically an xhtml file with opf and image files in a zipped container, so it is a single file, and since zip is universally understood, you can regard it as pretty much human readable as well.
theswede 07-09-2007, 07:28 PM That's three or more files in a zip archive. I will make the change to the code for my own use, as I do not intend to mess with several files, but keep everything in one file as opposed to an archive. I may also add .gz pipe support, if I can be bothered. And a ~/.html2lrf file which can store defaults, since it's a pain to retype them all the time.
Besides, html2lrf doesn't accept such a zip file either, last I checked, so that doesn't help, really.
That said, I want to thank you for making html2lrf. It's saved me a lot of time and is an excellent tool. If it wasn't, I wouldn't be bugging you about it. ;)
txt2lrf however doesn't work for me. It never finishes. =(
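The ~/.html2lrf defaults file mentioned above does not exist in html2lrf; as a sketch of how such a wrapper might work, the Python below parses a file of default options and puts them in front of whatever is typed on the command line. The file format (shell-style words, # comments) and the function names are assumptions, and the idea that a later option overrides an earlier one depends on how html2lrf parses its arguments.

```python
import os
import shlex


def parse_defaults(text):
    """Split a defaults file into arguments, honouring quotes and # comments."""
    return shlex.split(text, comments=True)


def load_defaults(path=None):
    """Read default options from path (default: ~/.html2lrf), if present."""
    if path is None:
        path = os.path.expanduser("~/.html2lrf")
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as handle:
        return parse_defaults(handle.read())


def merge_args(defaults, argv):
    """Defaults first, explicit arguments last."""
    return ["html2lrf"] + list(defaults) + list(argv)


example_defaults = parse_defaults(
    '--header\n'
    '--disable-autorotation  # always wanted\n'
    '--serif-family "C:/Windows/Fonts, Times New Roman"\n'
)
```

Forward slashes are used in the sample path because shlex treats a backslash as an escape character outside quotes.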
kovidgoyal 07-09-2007, 07:35 PM Well to each his own. If you're having problems with |