View Full Version : LRF output

kovidgoyal
07-10-2007, 02:20 PM
Released 0.3.67 with support for definition lists and a fix for the handling of zip files.

bkilian
07-10-2007, 04:53 PM
Quote: Released 0.3.67 with support for definition lists and a fix for the handling of zip files.

Dude, you are one of the most responsive devs I've ever seen, and I've seen a lot of devs :)

As to the Python regular expression, one I've found that only matches paragraphs containing only an <a id...></a> seems to work on the Baen books I've tried it on.

<p>\s*<a id.*?>\s*</a>\s*</p>

kovidgoyal
07-10-2007, 05:17 PM
That's coz I use libprs500 a lot myself and I want it to be as bug free as possible :-) I essentially use all you guys as free bug hunters as bug-hunting is something I'm extremely lazy about. And those two fixes were about 10 lines of code.

But aren't those id elements referred to by some links in the rest of the file?

bkilian
07-10-2007, 06:07 PM
Quote: That's coz I use libprs500 a lot myself and I want it to be as bug free as possible :-) I essentially use all you guys as free bug hunters as bug-hunting is something I'm extremely lazy about. And those two fixes were about 10 lines of code. But aren't those id elements referred to by some links in the rest of the file?

Not when it's the only element inside a <p>. The regex grabs every <p> that _only_ has a single <a> in it and only if the <a> starts with "id" and has no text (only whitespace between <a> and </a>). I can't think of, nor did I see, any useful use of that particular combination. We can always make it even more picky by requiring the <a> to have no "href", but I don't think it's necessary.

Edit: Oh, you're asking if the paragraph indicators (which is what these are) are used by anything else? No. They're used in the "web" reading version to update the silly "index" box. (Check http://www.webscription.net/10.1125/Baen/0671318470/0671318470.htm and move your mouse down the page.
You can type a number into the box and it'll jump to that paragraph.) The html in the LIT doesn't have the javascript to enable this. Note that the pure html versions don't have <p> surrounding the <a> elements, so they don't render, it's really only an issue with the files they include in their LIT versions (I suspect the OEB DTD requires the surrounding <p>). kovidgoyal 07-10-2007, 06:14 PM I meant aren't there <a href> elements that refer to that id? So that removing the id would make those links not work. THough I suppose I could just remove the <p> and keep the <a> bkilian 07-10-2007, 06:59 PM No, I don't believe there are any links that refer to the specific paragraphs. Just removing the <p> would be a perfectly reasonable compromise too :) kovidgoyal 07-10-2007, 07:17 PM OK added it to SVN. jgs 07-12-2007, 09:04 PM After a very positive experience with libprs500 and Kovid's patient support, I am seriously considering moving totally over to version .4 in lieu of Connect. Questions: 1. Any guess when .4 will be available? 2. Will "Genre" be one of your metadata elements? Great job Jim JSWolf 07-12-2007, 09:08 PM if you don't use Connect, you cannot purchase books at the Connect store. If that is not an issue, go for it. Of course, if you have multiple computers, you can install Libprs500 on one and Connect on the other as long as both computers run 32-bit Windows. kovidgoyal 07-12-2007, 09:18 PM 1 month < 0.4.0 < 3 months You should use the Tags facility in libprs500 to specify genre metadata. The search bar searches over tags as well. jgs 07-12-2007, 11:48 PM I've got four systems going on five; thanks for the advice. Jim bnocturnal 07-13-2007, 01:19 PM Hello, First off, Thanks for the great program! I seem to have run into a bug... not sure if it is by design, though.. I am converting an ebook (Thinking in Java, from http://mindvew.net), and have run into an issue with html blockquotes. 
This book uses the block quotes for code samples, and while html2lrf does "block" the text, it doesn't seem to convert any of the html.... what should be this:

monitor.expect(new String[] { "1: n1.i: 9, n2.i: 47", "2: n1.i: 47, n2.i: 47", "3: n1.i: 27, n2.i: 27" });

comes out like this:

monitor.expect(<font color="#0000ff">new</font> String[] { <font color="#004488">"1: n1.i: 9, n2.i: 47"</font>, <font color="#004488">"2: n1.i: 47, n2.i: 47"</font>, <font color="#004488">"3: n1.i: 27, n2.i: 27"</font> });

I suppose I could come up with some regex to remove the "font" tags...

Dave

kovidgoyal
07-13-2007, 02:22 PM
Post a simple html file that reproduces the problem. Just copy paste one of the blockquote sections along with any css sections/files.

bnocturnal
07-13-2007, 02:34 PM
Quote: Post a simple html file that reproduces the problem. Just copy paste one of the blockquote sections along with any css sections/files.

Here you go... I included a section with several blockquotes, and the css, and the .lrf output.

Thanks,
Dave

kovidgoyal
07-13-2007, 03:11 PM
Hmm see what you mean...two line fix will be in next release.

bnocturnal
07-13-2007, 03:57 PM
Quote: Hmm see what you mean...two line fix will be in next release.

Great! Thanks!
Dave

yonkz
07-14-2007, 03:14 PM
Hi, I'm using the command line version of html2lrf, and I'm trying to use the --chapter-regex to build the toc, but it doesn't behave as I expect it to. Other than that I'm loving what I've seen so far. Perhaps I am doing something wrong. Here is the command I am using:

C:\html2lrf>html2lrf -a "Author" -t "Title" --chapter-regex="<I>(Chapter [1-9]+[,].*)</I>" --font-delta=-2 "c:\ebooks\MyBook.htm"

The regex I'm using here works in other situations, but I'm not familiar with the implementation here, so perhaps the syntax should be different. The Chapter headings in the html doc look like this:

<I>Chapter 1, Title of Chapter 1</I>

Any help would be appreciated.
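
A pattern like yonkz's can be tried in plain Python before it is passed to --chapter-regex. The sketch below is not html2lrf's own code: it assumes the pattern is applied to the text inside a heading rather than to raw markup, and `is_chapter_heading` is a hypothetical helper named only for this illustration.

```python
import re

# yonkz's pattern with the surrounding <I> tags dropped, so that it is
# applied to the text of a heading instead of to the markup around it.
CHAPTER_RE = re.compile(r"Chapter [1-9]+[,].*")


def is_chapter_heading(text):
    """Return True if the heading text starts like a chapter title."""
    return CHAPTER_RE.match(text.strip()) is not None


# The heading text from the post matches once the tags are gone.
print(is_chapter_heading("Chapter 1, Title of Chapter 1"))
# [1-9]+ cannot consume the digit 0, so a tenth chapter is missed.
print(is_chapter_heading("Chapter 10, Title of Chapter 10"))
# With the tags left in, the text no longer starts with "Chapter".
print(is_chapter_heading("<I>Chapter 1, Title of Chapter 1</I>"))
```

If chapters can run past nine, `[0-9]+` in place of `[1-9]+` would be one way to cover them.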
kovidgoyal 07-14-2007, 03:47 PM Released v0.3.72: 1) Support for nested lists 2) Bug fixes kovidgoyal 07-14-2007, 03:48 PM You shouldn't have the <I> tags in the regex, it matches on the contents of the tag. Infact the default regex should work for you. yonkz 07-14-2007, 04:11 PM :)Wow, fast response - kovidgoyal! OK, I reread the quick little blurb about the chapter regex from the help, and came to a realization of what the problem was. The chapter regex only looks for matches inside a <H1> or other H tag, so since these were not inside H tags, they weren't found. What I had to do was use another program (just whipped up a quick c# app) to wrap the matches in H1 tags, and save a copy of the file with the new tags, than re-ran the cmd line utility, and all was well. Do you think it would be possible to optionally remove the requirement for the chapter regex match to look for the H tags, and just use the supplied regular expression to find the chapter headings? Thanks yonkz 07-14-2007, 04:15 PM Too quick, there. It did detect the chapters according to the cmd line output, but I guess it doesn't actually build a TOC with that info. When I open the output lrf - the TOC doesn't have the chapters in it. Is there a way to build the TOC and have it show up in the lrf? kovidgoyal 07-14-2007, 04:52 PM All links in the top level html file are put into the TOC. Chuck Eglinton 07-15-2007, 02:33 PM HTML2LRF is an excellent program but sometims chokes on Unicode Character 'RIGHT SINGLE QUOTATION MARK' (U+2019) (also known as the apostrophe) I receive the error on some HTML files created by MSWord 2000. Also.... First chapter in Table of Contents (TOC) doesn't appear in TOC list on Reader or in Connect software. When saving a document as HTML with Microsoft Word 2000, HTML2LRF correctly *reports* all the chapters being found in the table of contents (that is, "Detected Chapter..." shows all chapters correctly when compiling... 
all the text defined as "Header1") However, even though HTML2LRF displays all the chapters being found, the very first chapter is not displayed in the Table of Contents (TOC) when opening the .LRF file in the SONY CONNECT Desktop software or the reader. I'm able to work around it - HTML2LRF is still a great program FangornUK 07-15-2007, 02:39 PM MS Word 2000 creates dreadful HTML. Clean it up with "tidy --wrap 0 --word-2000 yes" from the program at http://tidy.sourceforge.net/ and see what happens then. kovidgoyal 07-15-2007, 03:40 PM HTML2LRF is an excellent program but sometims chokes on Unicode Character 'RIGHT SINGLE QUOTATION MARK' (U+2019) (also known as the apostrophe) The only thing that html2lrf does when it detects a chapter heading is insert a page break before it. It does not insert detected chapters into a TOC. This is because it uses links i.e. <a href> tags to build the TOC. Every (almost) <a href> element gets put into the TOC. Send me an example html file that causes the unicode problem. JSWolf 07-15-2007, 09:35 PM I tried embedding the font Arial Narrow and I got Bold and Italics instead of normal where I should have. When it called for italics, I did get the correct italics version. kovidgoyal 07-15-2007, 09:44 PM Yeah with most fonts for some reason I cant figure out the LRF renderer chooses the bold as the normal even though the fonts are correctly embedded. JSWolf 07-15-2007, 10:07 PM I removed the bold italic font and ended up with the italic version. kovidgoyal 07-15-2007, 10:13 PM You mean you removed it for the windows\fonts directory? JSWolf 07-16-2007, 06:26 AM You mean you removed it for the windows\fonts directory? Yeah, I copied the font out of the Windows font directory so it could not be used. After I gave up on Arial Narrow, I put the font back. edbro 07-16-2007, 12:02 PM Is there a way for the default to save to the same directory as the input file? Vista does not allow for the files to be saved to "program files". 
It will save any such files to a virtual store which is buried in multiple subdirectories and hard to find. I would love for it to save automatically to the same directory as the source file. Yes, I know I can input an output directory from the command line but, I'd love for this to be a default.

JSWolf
07-16-2007, 12:07 PM
Quote: Is there a way for the default to save to the same directory as the input file? Vista does not allow for the files to be saved to "program files". It will save any such files to a virtual store which is buried in multiple subdirectories and hard to find. I would love for it to save automatically to the same directory as the source file. Yes, I know I can input an output directory from the command line but, I'd love for this to be a default.

You do not need the source in the same directory as the program. So you can put the source in some other directory and the default of saving in the same directory will work fine.

edbro
07-16-2007, 12:12 PM
Quote: You do not need the source in the same directory as the program. So you can put the source in some other directory and the default of saving in the same directory will work fine.

Not from my experience. I have Libprs500 in "C:\program files". I have the source in "D:\work". The command

html2lrf D:\work\demo.html

saves the output to "C:\users\ed\appdata\local\virtualstore\program files\libprs500\demo.lrf"

JSWolf
07-16-2007, 01:31 PM
1. Start | Run | cmd
2. d:
3. cd \work\demo
4. html2lrf demo.html

That is all you need to do and demo.lrf will be in the same directory as demo.html.

JSWolf
07-16-2007, 01:32 PM
Would it be possible to have an option to control the paragraph indent? Some HTML are fine as is and some are too small. I would like if possible a default of 4 spaces for an indent if that's OK.

edbro
07-16-2007, 01:36 PM
Quote: 1. Start | Run | cmd 2. d: 3. cd \work\demo 4. html2lrf demo.html That is all you need to do and demo.lrf will be in the same directory as demo.html.

Thank you, I was assuming that I had to be in the libprs directory to access the executable. bkilian 07-16-2007, 06:48 PM Oh, I noticed a problem with the code you added to remove the extra <p> in Baen books, I tried to work out why it was happening, and in looking at your source I think I worked it out. In your BAEN processing stuff, it looks like you have code that strips out the <a id="pXXX"> lines right before you try to strip out the whole <p> including them. I assume the second one never works since the first one has already modified the lines. kovidgoyal 07-16-2007, 07:18 PM Oh, I noticed a problem with the code you added to remove the extra <p> in Baen books, I tried to work out why it was happening, and in looking at your source I think I worked it out. In your BAEN processing stuff, it looks like you have code that strips out the <a id="pXXX"> lines right before you try to strip out the whole <p> including them. I assume the second one never works since the first one has already modified the lines. Good catch, I've re-ordered them. kovidgoyal 07-16-2007, 11:24 PM Would it be possible to have an option to control the paragraph indent? Some HTML are fine as is and some are too small. I would like if possible a default of 4 spaces for an indent if that's OK. The CSS option text-indent controls this. bkilian 07-17-2007, 02:58 PM Is there some method for forcing page breaks other than using a regex? My most recent baen purchase is doing weird things at the page breaks. (Especially since the CSS specifically tells it to page-break-before all H1 tags) One such case is this: <h1 align="center"> <a name="Chap_2"> </a> <b>MAY, YEAR OF GOD 890</b> </h1> <h2 align="center"> <b>I</b> </h2> In that case, the page break happens between the H1 and the H2, which is pretty weird. Is there some tag we can use to force a page break? Or is there some way we can tell it that <h1> always signifies a new chapter, irrespective of it's contents? 
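
The ordering problem bkilian reported on 07-16 (the anchor-stripping rule running before the paragraph-stripping rule) can be reproduced with two substitutions. This is only a sketch: `EMPTY_ANCHOR_PARA` is bkilian's pattern from earlier in the thread, while `EMPTY_ANCHOR` is a hypothetical stand-in for the earlier rule, since the actual libprs500 source is not quoted here.

```python
import re

# bkilian's pattern: a <p> that holds nothing but an empty <a id=...></a>.
EMPTY_ANCHOR_PARA = re.compile(r"<p>\s*<a id.*?>\s*</a>\s*</p>")
# Hypothetical stand-in for the rule that strips the bare paragraph anchors.
EMPTY_ANCHOR = re.compile(r'<a id="p\d+">\s*</a>')

html = '<p><a id="p12"></a></p>\n<p>Real text.</p>'

# Anchor rule first: the anchor disappears, so the paragraph rule has
# nothing left to match and an empty <p></p> stays in the output.
anchor_first = EMPTY_ANCHOR_PARA.sub("", EMPTY_ANCHOR.sub("", html))

# Paragraph rule first: the whole wrapper paragraph is removed.
para_first = EMPTY_ANCHOR.sub("", EMPTY_ANCHOR_PARA.sub("", html))

print(repr(anchor_first))
print(repr(para_first))
```

Running the more specific pattern before the more general one is the usual way to avoid this kind of shadowing.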
kovidgoyal 07-17-2007, 04:14 PM Look at the CHAPTER OPTIONS help text in the html2lrf help. Bokeh 07-20-2007, 12:18 PM Hello, I am a bit of a newb at this. I was using the "old" html2LRF program for a long time this morning, converting html files from gaslight and it was working great- until it suddenly decided to stop working with no error messages or explanations. So I found this thread and downloaded this newer program. I tested it on a small html file from gaslight, and it seemed to work, until i tried to bring the new .lrf file into the "CONNECT Reader" program's library. Any time I tried to import the file, the connect reader program would exit without any warning. I tried some more html files with the same result. Any ideas? Am I doing something wrong? edit: just used it on a gutenberg zipped html file and it worked perfectly... so maybe there is a problem with the gaslight texts? Here is one of the gaslight files i tried to convert and couldn't read properly: link (http://gaslight.mtroyal.ab.ca/gaslight/strngild.htm) kovidgoyal 07-20-2007, 01:03 PM That's because the entire text is inside a single table cell. The old HTML2LRF simply ignored table markup the new one processes it, incorrectly in this case. I'll upload a fixed version soon. Bokeh 07-20-2007, 01:36 PM great, thanks! And thanks in general for making such an awesome program! kovidgoyal 07-20-2007, 03:26 PM Released v0.3.78 with --ignore-tables option. @Bokeh Use the commandline html2lrf --ignore-tables gaslight.htm bkilian 07-22-2007, 03:42 AM hehe, back again... I'm trying to indent an entire paragraph (not just the first line), something like what <blockquote> does. I've tried margin-left and margin-right, and padding-left and padding-right, both do the right thing in my browser, but html2lrf ignores both of them. Is there something I should be using for this? 
kovidgoyal 07-22-2007, 04:40 AM html2lrf doesn't support block level indentation other than through <blockquote> as I didn't really see any need for it. classicspam 08-01-2007, 09:59 PM I know it may seem to be a waste of time, however would it be feasible to add an option in the future to put a new line (blank) in between paragraphs? kovidgoyal 08-01-2007, 10:05 PM open a ticket and I'll get around to it. JSWolf 08-01-2007, 11:19 PM open a ticket and I'll get around to it. Please make sure it's gotten around to after version 10.9. kovidgoyal 08-01-2007, 11:32 PM You have my word. RWood 08-01-2007, 11:35 PM Please make sure it's gotten around to after version 10.9. Jon, cut him some slack, maybe version 22.07.11. Blank lines after paragraphs are not that easy to generate. :D JSWolf 08-02-2007, 01:20 AM Jon, cut him some slack, maybe version 22.07.11. Blank lines after paragraphs are not that easy to generate. :D Oh wait, that's a MobiPocket format feature. LRF doesn't do extra blank lines after paragraphs unless it's part of the book. astra 08-02-2007, 04:29 PM Do I understand it right, that I can use lib500prs windows installer which has GUI interface instead of Connect software? I am going on holiday in a few days and I am taking my laptop with me. It does NOT have Connect software atm but I might want to charge my reader via USB, so I need a software that would recognise the reader. If I install lib500prs and it will allow me to charge the reader I would like to install it. Good opportunity to explore the program without messing Connect software. kovidgoyal 08-02-2007, 05:49 PM Yeah it will allow you to charge your reader. astra 08-02-2007, 05:55 PM Thanks. 
I will install it tomorrow evening :)

kovidgoyal
08-02-2007, 11:13 PM
v 0.3.83 is on its way to the servers:
1) --blank-after-para
2) bug fixes
3) internal re-organization that may have introduced some regressions
4) increased default para indent

flyneo
08-05-2007, 08:18 PM
Hi, I used this tool to convert a big HTML file (30M with images). However, I got the following error after a while:

Processing 0-201-63354-X_Joined.htm
Parsing HTML... done
Converting to BBeB...Unhandled/malformed CSS key: text-align left
Traceback (most recent call last):
  File "convert_from.py", line 1453, in <module>
  File "convert_from.py", line 1387, in main
  File "convert_from.py", line 1288, in process_file
  File "convert_from.py", line 377, in __init__
  File "convert_from.py", line 461, in parse_file
  File "convert_from.py", line 672, in process_children
  File "convert_from.py", line 1196, in parse_tag
  File "convert_from.py", line 672, in process_children
  File "convert_from.py", line 1196, in parse_tag
  File "convert_from.py", line 672, in process_children
  File "convert_from.py", line 1194, in parse_tag
  File "convert_from.py", line 1205, in process_table
  File "libprs500\ebooks\lrf\html\table.pyo", line 359, in blocks
IndexError: list assignment index out of range

Could you help me with that? Thanks!

kovidgoyal
08-05-2007, 09:25 PM
Attach the html file or send it to me.

flyneo
08-06-2007, 12:34 AM
Quote: Attach the html file or send it to me.

Thanks for the quick reply. Please see the attachment for the file that caused the problem. This is the main file I passed to html2lrf. There are some additional files which I can send to you if needed. Basically what I was doing is converting a tech book which is in chm format into an LRF file. There are more than 1000 pages and hundreds of images and tables in the chm book. I first decompiled it using 'hh' to a bunch of html files and images. Then I joined these html files using BookDesigner into a few files.
And passed the joined html file to html2lrf converter. Before doing this, I also tried to pass the original html files directly to html2lrf without joining. But the program ran for about an hour then exited with errors. Again thank you for the help!

kovidgoyal
08-06-2007, 12:48 AM
This file seems to use tables for formatting, including nested tables. html2lrf supports only simple tables. Try using the --ignore-tables option.

flyneo
08-06-2007, 01:22 AM
It works great now. Thanks a lot!

Valdhor
08-08-2007, 12:32 PM
If you have CSS declared for <pre> tags, you will get errors.

HTML File:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<HTML>
<HEAD>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Bug Testing</title>
<link rel="stylesheet" href="bugtest.css" type="text/css" />
</head>
<body>
<pre>Some Stuff</pre>
</body>
</html>

CSS File (bugtest.css):

pre {
  border: solid 1px #444;
  background-color: #e0e0e0;
  padding: 0.1in;
  margin: 0.2in;
  clear: right;
}

This will produce the following errors:

Unhandled/malformed CSS key: border solid 1px #444
Unhandled/malformed CSS key: clear right
Unhandled/malformed CSS key: margin 0.2in
Unhandled/malformed CSS key: background-color #e0e0e0
Unhandled/malformed CSS key: text-align left

Valdhor
08-08-2007, 12:43 PM
I keep getting the error

WARNING: An error occurred while processing a table: list assignment index out of range
Ignoring table markup

Unfortunately, this is a 3.2MB file that has been generated from multiple (read: hundreds) HTML files into one big file. What would be useful is something to the effect...

WARNING: An error occurred while processing a table: list assignment index out of range at line xxx near sometextthatwasreadnearwheretheerroroccurred
Ignoring table markup

so that I can figure out where the problem is and fix it.

kovidgoyal
08-08-2007, 12:55 PM
The CSS messages are not errors; they just indicate that those CSS properties have been ignored.
The tables error message is most likely generated because of a complex and/or nested table. I'll output the first hundred characters of the table tag in the next version, but line numbers are not going to be possible because of the way the html parser is designed. Valdhor 08-08-2007, 01:42 PM Thank you. That will make it much easier to find where the error is. I have been looking for days trying to find a complex or nested table but with over 4700 lines and lots of small tables it's gonna take a while :tired: Valdhor 08-08-2007, 01:43 PM sorry, that should have been 47000 lines. kovidgoyal 08-08-2007, 01:53 PM why dont you run html2lrf on the individual files? that way you can narrow down the search a lot. Valdhor 08-08-2007, 03:16 PM Well, that would have worked :o but I figured out where the problem was - rowspan="2". kovidgoyal 08-08-2007, 03:49 PM Hmm was that in a relatively simple table? Because if it was it's a bug in html2lrf and i'd like you to send me the html file so I can fix it. Valdhor 08-09-2007, 10:05 AM Basically, there are a whole bunch of these type constructs which are small tips... <div class="note"><table border="0" summary="Note"> <tr> <td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.gif"></td> <th align="left">Note</th> </tr> <tr><td align="left" valign="top"><p> When specifying multiple parents for a Role, keep in mind that the last parent listed is the first one searched for rules applicable to an authorization query. </p></td></tr> </table></div> If you remove the rowspan="2" then html2lrf works fine. I will email you the entire file separately. 
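
For a hunt like Valdhor's through a 47000-line file, a small pre-scan outside html2lrf can list candidate tables with their line numbers. The sketch below uses Python's standard `html.parser`; it is independent of html2lrf's parser, `TableScanner` and `scan` are names made up for this example, and it only flags nested tables and row/column spans as things worth inspecting, not as confirmed causes of the error.

```python
from html.parser import HTMLParser


class TableScanner(HTMLParser):
    """Record the line numbers of nested tables and spanning cells."""

    def __init__(self):
        super().__init__()
        self.depth = 0        # how many <table> elements are currently open
        self.findings = []    # list of (line_number, description)

    def handle_starttag(self, tag, attrs):
        line, _column = self.getpos()
        if tag == "table":
            self.depth += 1
            if self.depth > 1:
                self.findings.append((line, "nested table"))
        elif tag in ("td", "th"):
            for name, value in attrs:
                if name in ("rowspan", "colspan") and value not in (None, "1"):
                    self.findings.append((line, f"{name}={value}"))

    def handle_endtag(self, tag):
        if tag == "table" and self.depth > 0:
            self.depth -= 1


def scan(html_text):
    """Return candidate trouble spots as (line, description) pairs."""
    scanner = TableScanner()
    scanner.feed(html_text)
    scanner.close()
    return scanner.findings


sample = '<table>\n<tr>\n<td rowspan="2">icon</td>\n<th>Note</th>\n</tr>\n</table>'
for line, what in scan(sample):
    print(f"line {line}: {what}")
```

Reading the big file with `open(path, encoding="utf-8", errors="replace").read()` and passing the text to `scan` would give a short list of places to look at first.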
kovidgoyal
08-09-2007, 11:10 AM
Thanks. It wasn't rowspan=2; it was this table in particular that's causing the problem:

<div class="note"><table border="0" summary="Note">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.gif"></td>
<th align="left">Note</th>
</tr>
<tr><td align="left" valign="top"><p> When specifying multiple parents for a Role, keep in mind that the last parent listed is the first one searched for rules applicable to an authorization query. </p></td></tr>
</table></div>

DNel
08-09-2007, 09:24 PM
I just downloaded the libprs500.exe windows installer. In the installation I chose not to install the drivers as I don't have my eReader yet and don't want to lose the ability to use the Connect. After installing, I clicked on the libprs500 icon to launch the program and got the following error message in the associated log

Traceback (most recent call last):
  File "main.py", line 25, in <module>
  File "libprs500\devices\prs500\driver.pyo", line 56, in <module>
  File "libprs500\devices\libusb.pyo", line 32, in <module>
  File "libprs500\__init__.pyo", line 50, in load_library
  File "ctypes\__init__.pyo", line 423, in LoadLibrary
  File "ctypes\__init__.pyo", line 340, in __init__
WindowsError: [Error 126] The specified module could not be found

Any ideas? I uninstalled and then did a reinstall and got the same error. Is it because I unchecked the driver?

Thanks
DNel

kovidgoyal
08-09-2007, 09:46 PM
Yeah, it's a bug. I'll fix it in the next release. You can install the drivers and then uninstall libprs500 and the connect software won't be affected.

DNel
08-10-2007, 02:10 PM
Thanks, I was able to run libprs500 after I reinstalled with the drivers. However, being completely illiterate when it comes to using it I see that there are icons for add books to library, delete books, edit meta information. After adding the HTML file to library, I don't see how to convert it to LRF. There is also no place to enter command lines.
One of the previous posts mentioned the command line program was in the libprs500 folder. I only see libprs500 (the gui interface) uninstall, and a link to your web page. How do I convert it now to LRF format? Thanks DNel Sorry for being so computer illiterate:tired: kovidgoyal 08-10-2007, 02:17 PM Ah yes the conversion tools have not yet been integrated with the GUI. At the moment what you need to do is the following: 1) Open a command prompt (Start->run->cmd.exe) 2) change to the folder with the html file (cd "c:\myfolder") 3) html2lrf "myfile.html" As I get the time I will integrate the conversion tools into the GUI, but that's going to take a while. DNel 08-10-2007, 03:46 PM Thanks, I tried it on a sample web page and it work great. DNel JSWolf 08-11-2007, 06:53 PM I can't find any way to generate left justified text with html2lrf. Everything ends up fully justified even if I specify align="left" or style="text-align:left". Please, please, please add a way to generate left justified text! Thanks P.S. copied since it does belong here. JSWolf 08-11-2007, 06:55 PM I can't find any way to generate left justified text with html2lrf. Everything ends up fully justified even if I specify align="left" or style="text-align:left". Please, please, please add a way to generate left justified text! Thanks You cannot have left justified text unless you put in the hard returns. It's just the way LRF for the Sony Reader works. So get to hitting return. Excalibur 08-11-2007, 08:08 PM Hey, just wanted to say thanks for a decent set of conversion utilities. I did, however, want to point out that you should be reading the .opf file instead of the .html file that is generated with ConvertLit. I noticed that you have probably hardcoded the .htm extension into the converter, bad practice. The .opf has all the file names necessary for the content, covers, etc. and you should instead focus on that while still not hard coding file extensions. The reason I suggest this? 
Some of the LIT files I have have text instead of HTML as the body of the book and your tool craps out when it doesn't find an HTML file. The OPF, which is always parsed and output by ConvertLit consistently uses the id of "content" for the main body. You could grab that file name instead of relying on an HTML file being generated. It'd make the tool a bit more stable. :) Just my opinion. Do you know where the format specs for LIT and LRF can be found? I would like to take a look at how the files are set up... Also, a feature request: Could you set it up so that we could make CSS style changes from the commandline as well? For instance, setting text-indent for <p> tags. This would make the books look oh so much nicer with some indentation going on :) kovidgoyal 08-11-2007, 08:19 PM Yeah in an ideal world it would. Since I've never come across a lit file with txt in it, it isn't a priority. If you need that feature feel free to submit a patch on the bug report https://libprs500.kovidgoyal.net/ticket/126 Incidentally, lit2lrf does read the opf file, for the metadata. Excalibur 08-11-2007, 08:22 PM Cool, cool. I believe you should be using the OPF file for just about everything since it is, in essence, the .CUE file of the .OPF's .BIN. I'll add that bug later tonight from work :) I see you monitor your threads closely, good on ya! It's great to see a dev so on-the-ball :) Keep up the great work, btw! kovidgoyal 08-11-2007, 10:40 PM You cannot have left justified text unless you put in the hard returns. It's just the way LRF for the Sony Reader works. So get to hitting return. Somebody needs to bug SONY to fix the LRF renderer on the reader to not right justify text that is left aligned. I suspect they did that because the engineers developing the renderer were japanese and didn't realize that right justification is bad for english text. 
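
Excalibur's OPF suggestion above can be sketched with the standard library. This is an illustration only and not how lit2lrf works: the sample OPF text is made up, `content_href` is a hypothetical helper, and the id "content" is taken from Excalibur's description of ConvertLit output rather than verified.

```python
import xml.etree.ElementTree as ET


def content_href(opf_text, item_id="content"):
    """Return the href of the manifest item with the given id, or None."""
    root = ET.fromstring(opf_text)
    for elem in root.iter():
        # Drop any "{namespace}" prefix so plain and namespaced OPF both work.
        local_name = elem.tag.rsplit("}", 1)[-1]
        if local_name == "item" and elem.get("id") == item_id:
            return elem.get("href")
    return None


# Made-up OPF fragment with a plain-text body, the case Excalibur describes.
sample_opf = """<package>
  <manifest>
    <item id="content" href="book.txt" media-type="text/plain"/>
    <item id="cover" href="cover.jpg" media-type="image/jpeg"/>
  </manifest>
</package>"""

print(content_href(sample_opf))
```

Looking the body file up by manifest id, rather than by a fixed .htm extension, is the design point being argued for in the posts above.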
Excalibur 08-12-2007, 01:30 AM Hey, put in the .txt glitch (I'd say it isn't a full-blown bug since it can be edited by hand and html2rlf run on it, but that defeats the purpose of lit2lrf, doesn't it? :) and also added the CSS from the commandline as an enhancement. I added possible solutions as well, to help you out. Maybe I could learn some python or convert it to java or something... *ponder* who knows, either way, hope that helped you out :) And I'd actually love to see the proprietary formats go away and just use HTML. Perhaps a stripped down Gecko HTML engine in place of the LRF engine that's made to just display HTML & CSS without all the added junk? Maybe keep the plugin enhancement system so that you could add display of LIT or other proprietary formats (having been converted to HTML with the plugin...). Now THAT would be something to see :) kovidgoyal 08-12-2007, 02:52 AM google the epub format its an open html based format that will be eventually supported on the reader . beartard 08-12-2007, 12:01 PM google the epub format its an open html based format that will be eventually supported on the reader . Wait. Did you say the reader is getting additional format support? :pray: igorsk 08-12-2007, 12:48 PM http://www.adobe.com/aboutadobe/pressroom/pressreleases/200706/061907DigitalEditions.html "In addition, with versions for mobile platforms and reading devices also planned, Sony has committed to embed Adobe Digital Editions technology into its portable reader product line." However, I wouldn't expect it before we see Linux version of Digital Editions, and we don't know if the support will make it into PRS-500 and not just the next generation. bluelight 08-13-2007, 11:03 AM i seriously, seriously have no idea how to use this converting gui. Is there a guide of dummies anywhere? cause everybody seems to know what they are doing :( kovidgoyal 08-13-2007, 11:10 AM its not a GUI. Its a commandline app (at least for the moment). 
Start -> Run -> cmd.exe
cd c:\path\to\your\folder
html2lrf yourbook.html

bluelight
08-13-2007, 11:22 AM
Quote: its not a GUI. Its a commandline app (at least for the moment). Start -> Run -> cmd.exe cd c:\path\to\your\folder html2lrf yourbook.html

ok.. then what? i did that and it didn't convert yet :(

kovidgoyal
08-13-2007, 11:45 AM
That's it, the lrf file will be in that directory.

bluelight
08-13-2007, 12:02 PM
didn't show up. am i supposed to tweak it after all that scrolled options?

HarryT
08-13-2007, 12:05 PM
Are you running it from the folder containing your HTML file?

bluelight
08-13-2007, 12:07 PM
Quote: Are you running it from the folder containing your HTML file?

yup :blink:

kovidgoyal
08-13-2007, 12:13 PM
If the options are showing, that means you did not specify the file name correctly. If the filename has spaces in it you need to enclose it with quotes.

bluelight
08-13-2007, 01:02 PM
omg it worked. thank you so much XD

_Sin
08-15-2007, 12:42 PM
Hi. This tool is extremely useful - if it didn't exist, I'd have to write it myself! However I've found a slight problem - it seems to not want to load images if the file extension isn't set correctly on them. I'm converting Mobipocket files (not being in the US I have to either blag credit for the Connect store or buy my books elsewhere...) and the images don't come with proper names. Right now I just spit out extensionless files matching the IDs in the file, but although they view fine in IE, converting them to LRF loses the images. If I manually fix up the filenames and links to include the image extension, it works. I could write some smarter post-processing stuff at this end, but is there a simple fix for html2lrf that would get around this?

kovidgoyal
08-15-2007, 12:52 PM
Are the images embedded using <img> tags or <a> tags?

_Sin
08-15-2007, 01:01 PM
Quote: Are the images embedded using <img> tags or <a> tags?

So far I've only seen <img> tags, they don't seem to be used for linking very often.
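
The "smarter post-processing" _Sin mentions could look roughly like the sketch below, which guesses an image type from the first bytes of each extensionless file and renames it. It is an assumption-laden illustration, not part of html2lrf: `guess_extension` and `add_extensions` are hypothetical names, only a few common signatures are covered, and the links inside the HTML would still need to be updated to the new names.

```python
import os

# Leading bytes ("magic numbers") of a few common image formats.
SIGNATURES = [
    (b"\xff\xd8\xff", ".jpg"),
    (b"\x89PNG\r\n\x1a\n", ".png"),
    (b"GIF87a", ".gif"),
    (b"GIF89a", ".gif"),
    (b"BM", ".bmp"),
]


def guess_extension(header):
    """Return a file extension for the given leading bytes, or None."""
    for magic, extension in SIGNATURES:
        if header.startswith(magic):
            return extension
    return None


def add_extensions(folder):
    """Rename extensionless image files in folder; return {old: new} names."""
    renamed = {}
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        # Skip directories and files that already carry an extension.
        if not os.path.isfile(path) or os.path.splitext(name)[1]:
            continue
        with open(path, "rb") as handle:
            extension = guess_extension(handle.read(8))
        if extension:
            os.rename(path, path + extension)
            renamed[name] = name + extension
    return renamed
```

The returned mapping can then drive a search-and-replace over the `src` attributes in the HTML before running html2lrf.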
kovidgoyal 08-15-2007, 01:07 PM In that case I don't see why they should be omitted. Send me some example files with the image files as well. _Sin 08-15-2007, 01:11 PM In that case I don't see why they should be omitted. Send me some example files with the image files as well. Ok, I'll need to try to construct an example as the ones I'm actually using are obviously copyright... JSWolf 08-15-2007, 01:36 PM I'm converting Mobipocket files Are you converting MobiPocket files with DRM? If so, I'd love to know how. _Sin 08-15-2007, 01:52 PM Are you converting MobiPocket files with DRM? If so, I'd love to know how. Yup. The DRM isn't too tricky to decode. However I'm probably on slightly dodgy territory if I say too much about that - I'm only doing it for my own use on files I own, and I have no plans to distribute the tools I've written to strip the DRM out. _Sin 08-15-2007, 02:39 PM In that case I don't see why they should be omitted. Send me some example files with the image files as well. Ok, here's a simple example of the problem - I've included the generated LRF file as well as the html source and image files... kovidgoyal 08-15-2007, 02:55 PM Fixed in svn. angelyne 08-15-2007, 10:22 PM I really enjoy this tool, I've been using it more and more. I've even started to learn a little html so I could edit the books more to my liking before converting them into BBeB. There is one thing I can't seem to do however, and that's due to my poor skill with html. I'd like to be able to center text vertically on a page. Can this be done easily? I searched but only ended up more confused. I figured that by asking here, I'd get an answer that's appropriate to the BBeB format. Sorry to ask what is probably a really newbie-ish question. kovidgoyal 08-15-2007, 10:34 PM Glad to hear it. No you cannot center text vertically on a page. As far as I know the LRF format doesn't have any support for doing this in a reflowable fashion. 
A workaround you could use is to insert a few blank lines above the text and force a page break before and after. Pode 08-16-2007, 06:12 AM I'm surprised by the last two comments. I read a lot of webpages on the Sony Reader with the help of your tool, html2lrf. Some of the pages I save on my hard drive for processing by html2lrf aren't properly edited (since usually I strip everything that's not useful from the page with a little bookmarklet, CSS and page style are messed up), and I end up several times with paragraphs that are centered vertically on the page, (justified on the left and right, but the first and last line are centered vertically). JSWolf 08-16-2007, 09:37 AM How can one make hanging indents with html2lrf? kovidgoyal 08-16-2007, 10:47 AM I'm surprised by the last two comments. I read a lot of webpages on the Sony Reader with the help of your tool, html2lrf. Some of the pages I save on my hard drive for processing by html2lrf aren't properly edited (since usually I strip everything that's not useful from the page with a little bookmarklet, CSS and page style are messed up), and I end up several times with paragraphs that are centered vertically on the page, (justified on the left and right, but the first and last line are centered vertically). Interesting, can you send me an example? kovidgoyal 08-16-2007, 10:47 AM How can one make hanging indents with html2lrf? What's a hanging indent? EDIT: If it means what I think it means, this will do the trick, provided each logical line is at most one physical line long: <p style="text-indent:0pt">First line</p> <p style="text-indent:30pt">Second line</p> <p style="text-indent:30pt">Third line</p> JSWolf 08-16-2007, 09:49 PM A hanging indent is a paragraph where the first line sticks out farther than the rest of the lines in the paragraph. Your sample won't work properly as it disables proper word wrap. It makes separate lines instead of a proper paragraph.
Take the above paragraph and move the rest of the text so it starts under the first g in hanging and you have it. Also it should be a proper paragraph and not seperate lines. So when you resize the font, it stays a hanging indent. kovidgoyal 08-16-2007, 09:54 PM sticks out to the left or the right? JSWolf 08-16-2007, 09:58 PM sticks out to the left or the right? To the left. Here is a link that defines a hanging indent... http://www.webopedia.com/TERM/H/hanging_indent.html kovidgoyal 08-16-2007, 10:13 PM yeah to have this ability html2lrf would need to support the margin css property, which I'm insufficiently motivated to support. JSWolf 08-16-2007, 10:16 PM So basically then there will not be any proper way to format some poetry or scripts. That's too bad. kovidgoyal 08-16-2007, 10:18 PM I dont see why poetry requires hanging indents. Indeed I've never read any poetry that has hanging indents. JSWolf 08-16-2007, 10:22 PM I dont see why poetry requires hanging indents. Indeed I've never read any poetry that has hanging indents. LaughingVulcan said (in another thread) that he's trying to format some poetry with hanging indents. kovidgoyal 08-16-2007, 10:23 PM Well he can just leave out the hanging indents, the poetry wont suffer too much. LaughingVulcan 08-16-2007, 10:24 PM Mmmm.... that's what I was afraid of in my thread. While not an absolute for poetry formatting, it is the way that I *strongly* prefer to read it. There is a (poor) example of hanging indent in html here (http://goer.org/HTML/intermediate/align_and_indent/). For those not reading the link, I'll share the following example from my other thread: ("fakey" margins of reader below) *---*---*---*---*---*---* The quick brown fox jumps over the lazy dog. And the sneeze in the breeze upsets the bees. I would want to have become: *---*---*---*---*---*---* The quick brown fox jumps over the lazy dog. And the sneeze in the breeze upsets the bees. 
Instead of *---*---*---*---*---*---* The quick brown fox jumps over the lazy dog. And the sneeze in the breeze upsets the bees. As I mentioned there, this can be achieved with the (laborious) process of inserting line breaks manually, but the breaks won't work right when the size is upped to Medium and then Large. (In fact, it gets virtually unreadable on upping the size.) Thanks JS for passing it over to this thread... ;) And thank you, Kovid, for HTML2LRF and the rest of the apps. It still rocks! :D kovidgoyal 08-16-2007, 10:32 PM Try inserting blank lines between sentences to improve readability. JSWolf 08-16-2007, 10:34 PM Try inserting blank lines between sentences to improve readability. I think that'll just make it a mess to read. I would not want to read it like that. kovidgoyal 08-16-2007, 11:13 PM Version 0.3.96 is on its way to the servers: 1) Completely refactored to optimize memory usage. Hopefully this hasn't introduced too many new bugs. 2) Added support for <sup> and <sub> 3) Fixed handling of text-indent (should make the indent correct in lit files) 4) Various minor bug fixes. irishjew 08-16-2007, 11:46 PM I first just want to say that Kovid is God. Secondly, I apologize if this has already been touched on, but I have a strange problem, and I'm a bit of a noob. I found one html file that had internal links for each chapter, and I tried to copy that code into a book of short stories I'm converting to lrf in order to make a table of contents. The links work just fine in html, but when I convert the file to lrf, they are nowhere to be found. Does anyone have an idea of what might cause that? kovidgoyal 08-16-2007, 11:55 PM Run it with the --verbose switch irishjew 08-17-2007, 12:19 AM Here's what I get. Looks pretty straightforward. C:\Program Files\libprs500>html2lrf --verbose "C:\Program Files\libprs500\a.html" [INFO] convert_from.py:391: Processing a.html Parsing HTML... [INFO] convert_from.py:405: Converting to BBeB... 
[INFO] convert_from.py:1374: Output written to C:\Program Files\libprs500\a.lrf kovidgoyal 08-17-2007, 12:25 AM send me the file. irishjew 08-17-2007, 12:48 AM Well, in the interest of not violating copyright laws, I'm attaching a test html file that I have the same problem with. If you open the html file, you'll see the link there working just fine, but the link doesn't appear in the lrf file. kovidgoyal 08-17-2007, 12:57 AM its in a <pre> tag. irishjew 08-17-2007, 01:09 AM What do you mean? Sorry, I'm a noob. I appreciate all this, by the way. kovidgoyal 08-17-2007, 01:21 AM the <a href> tag is inside a <pre> tag. Delete the <pre> tag and you'll be fine. Excalibur 08-17-2007, 02:11 AM That's an incorrect way of thinking about it, however. The <pre> tag is meant to hold other things within it like a DIV tag. Now, if you've supported the CSS property "white-space" that allows for text to be preformatted, then that's cool: white-space: pre; <span><div><pre><p> and a few other tags are meant to hold links and other things within them. If you support padding and text-indent, you might be able to try this: <html> <head> <style> p { text-indent:-30px; padding-left:30px; } </style> </head> <body> <p>Curabitur gravida imperdiet nunc. Vestibulum elementum, velit id porttitor viverra, mi dolor suscipit eros, id consequat justo magna a ante. Maecenas varius eleifend nunc. Cras sed tortor. Phasellus dignissim, erat sit amet ullamcorper vehicula, lorem mi faucibus sapien, eget imperdiet ligula odio nec est. Vivamus venenatis, velit in interdum blandit, ligula mi pulvinar leo, at mollis purus magna fermentum mauris. Phasellus et purus. Suspendisse potenti. Aenean egestas consectetuer enim. Morbi elit justo, scelerisque lobortis, ornare ac, tincidunt eget, orci. Mauris faucibus ornare sem. Praesent nisi arcu, malesuada non, nonummy sit amet, sollicitudin ac, elit. Mauris nec libero id lectus porta tincidunt. 
Morbi purus est, gravida eget, cursus ut, ultricies quis, sapien.</p> </body> </html> Which produces a very nice hanging indent (of 30px anyway). kovidgoyal 08-17-2007, 02:24 AM Yeah it's only <pre> that doesn't support links and that's because of a design decision I made when first writing html2lrf. Not supporting white-space made the code considerably simpler. In most html files it's more important to get the whitespace right than to support links in a <pre>. As for the hanging indent, unfortunately SONY's LRF renderer doesn't support negative indents, AFAICT it treats them as zero. EDIT: I was wrong, LRF can actually do hanging indents. Excalibur 08-17-2007, 02:36 AM white space in html is ignored UNLESS you use <pre> which supports links. Supporting links with or without whitespace shouldn't be any different. Now, if that white space is designated as non-wrapping whitespace such as when you use the &nbsp; entity, that *might* be an issue, though the reader should automagically wrap it using its own algorithms. What's the difference between keeping track of the white space and not with your algorithms? kovidgoyal 08-17-2007, 02:52 AM the problem is that in ordinary html multiple consecutive whitespace elements are collapsed into a single white space element, while this is not true in LRF. So I have little checks that do this manually at multiple points in the code. I suppose I could hunt down every instance and wrap it in an if..else but that's a pain, and would require extensive debugging to make sure it doesn't break other behavior. kovidgoyal 08-17-2007, 03:37 AM Released v0.3.97 with support for the padding CSS attribute. This makes LaughingVulcan's hanging indents possible. See the new demo in the first post. It'll reach the servers in ~30min. Excalibur 08-17-2007, 08:29 AM Can you use regular expressions in Python? That would eliminate the need for hunting down all the whitespace or wrapping an if-then around stuff.
You'd just have to check to see if you were in a preformatted space such as with the white-space:pre css declaration or the <pre> tag. If you were, you ignored the collapsing of white space. If not, just change white space within tags to one (div, p, etc -- the block elements) using a regular expression search & replace... Just an idea. Good going with the addition of hanging indent! :) kovidgoyal 08-17-2007, 09:18 AM It isn't that simple, what about the situation <p>one <span> two</span> Excalibur 08-17-2007, 09:45 AM What about the situation? That would produce two spaces, that's how it renders in Firefox and in IE. Though if you had: <p>one <span> two</span> which is 2 spaces between the <span> and two, it would reduce to 1 space. HTML creates a tree of its elements (Which I'm sure you're aware of). <p> "one" <span> " two" </span> Since almost all browsers reduce extra whitespace down to 1 within any given tag, except those that are designated as white-space:pre or within <pre> tags, then it would be safe for you to eliminate all extra whitespace that exists within individual tags. Though, if that whitespace happens to already be 1 space in size then there's no reason to eliminate it. That's how the browsers do it. If you wish to faithfully reproduce the same look as in the browsers, then do what they do. Instances where you want a single space between one and two above are going to drive you nuts. And I'd say, unless there is style information tied to a tag that you ignore the tag. Since, in this case, unless there is a specific style attributed to the <span> tag, I'd simply ignore it since it's not doing anything. kovidgoyal 08-17-2007, 10:10 AM No that renders with one space. And obviously assume there is some style information associated with the tag, since that's the case I have to worry about. Excalibur 08-17-2007, 10:20 AM Well, I tried it in Firefox and it renders with two spaces. IE is wrong when it comes to rendering HTML properly. 
Anyway, I tried it in IE and it does incorrectly render with 1 space. So, which do you go with? Standards compliant Firefox or non-standards compliant IE. Either way, you could just as easily remove any whitespace at the beginning of a non-pre tag and that will fix said issue. If the tag has CSS that says it is white-space:pre; then you can check that before removing the space at the beginning of a tag and determine if you need to remove it or not. hmm I forgot what started this conversation anyway... heh kovidgoyal 08-17-2007, 10:27 AM firefox and konqueror in linux render it with one space. I suspect firefox on windows is broken. Certainly redering it with one space is logically correct. No you cant blindly remove leading space, consider the following situation <p>One,<span> two Just need to remember the last string added, shouldn't be too hard to do. Excalibur 08-17-2007, 10:51 AM Well, it "looks" like one space but it is 2 spaces in source... I guess it does collapse it in the browser window then... True enough. kovidgoyal 08-17-2007, 04:43 PM Released v0.3.98 with support for white-space. This required a complete rewrite of html2lrf's whitespace handling mechanism, so keep an eye out for whitespace related problems. The new code is actually simpler than the old, but not tested nearly as much ;-) Also fixed the lit2lrf, any2lrf bugs on windows. Pode 08-17-2007, 06:11 PM kovidgoyal>here's an attached .lrf, with an example of line-centered text. It's an article from the New York Time, taken as an html page, and converted with html2lrf. If you go on the last page, you can see a list of items, centered. If I understood well, what you want (and others on the forum too) is centered text on the reader. Centered text like that pops in nearly all my lrf files, because a lot, if not all of them comes from saved web pages, poorly stripped of their screen-waste content. 
So I suppose that when I get centered text, it's because, even if I cut "with an axe" through the html code, css definition still (badly) applies to what's left... kovidgoyal 08-17-2007, 06:18 PM If you go on the last page, you can see a list of items, centered. That's horizontal centering. The O.P. was asking about vertical centering. irishjew 08-17-2007, 06:20 PM the <a href> tag is inside a <pre> tag. Delete the <pre> tag and you'll be fine. Thanks a bunch, Kovid. I did that, and it fixed the links, but unfortunately now all my line breaks and tabs are gone. Do you know what might cause that? kovidgoyal 08-17-2007, 06:25 PM upgrade to the latest version and put the <pre> tags back and you should have both links and line breaks. irishjew 08-17-2007, 07:45 PM upgrade to the latest version and put the <pre> tags back and you should have both links and line breaks. I upgraded, and the links are there, but they don't point to the right places. Instead, they all go back to the beginning of the file. kovidgoyal 08-18-2007, 01:59 AM I upgraded, and the links are there, but they don't point to the right places. Instead, they all go back to the beginning of the file. Hmm that shouldn't be happening please send me the file. Pode 08-18-2007, 06:01 AM kovidgoyal>OK, it seems I misunderstood. Excuse-me for any false hope. JSWolf 08-18-2007, 09:05 AM So any idea when the next version is going to be released? The one that fixes html2lrf. kovidgoyal 08-18-2007, 11:12 AM Later today. angelyne 08-18-2007, 12:58 PM I'm having a small problem. I have a sentence that is coded like this : <p class="first">G<em>ram called at five in the morning. She never remembered</em> which looks like this Gram called at five in the morning. She never remembered the time difference. I don't like the fact that the first letter is not italics. It looks funny. I also wanted the first letter to be a drop cap. 
So I changed the code to this : <p class="first"><em><big class='libprs500_dropcaps'>G</big>ram called at five in the morning. She never remembered</em> The result looks fine in a browser, but doesn't convert well. The result is the first letter is in italics and drop-capped, but the rest of the sentence looses the italics. It looks like this G ram called at five in the morning. kovidgoyal 08-18-2007, 01:03 PM I tried that. For me, the the entire statement is italicized, as would be expected from the HTML <em><big class='libprs500_dropcaps'>G</big>ram called at five in the morning. She never remembered</em> angelyne 08-18-2007, 10:20 PM I made it work. Thank you so much for your help. And thank you making this great tool. Without dedicated people like you, we would have never been able to break out of the Sony Connect prison ... tsgreer 08-18-2007, 10:31 PM Holy crap! We are at version 3.99!! Does that mean the 4.0 GUI version is almost here?! :) kovidgoyal 08-18-2007, 10:34 PM Yeah there's basically only one thing left i have to implement. angelyne 08-19-2007, 02:12 PM May I bug you with yet another question. Is there a way you could define a style that would take the first letter of a paragraph and make it a drop cap. I've seen a couple of similar style that does this but it's not compatible with your converter here is one: p.drop { text-indent: 0em; margin: 0em; } p.drop:first-letter { font-size : 165%; font-weight : bold; width : .50em; } I've been using your <p><big class='libprs500_dropcaps'></big> but I don't know how to turn that into a style. And one last question ... what would be the command to make text smaller (that is accepted by html2lrf). kovidgoyal 08-19-2007, 02:14 PM May I bug you with yet another question. Is there a way you could define a style that would take the first letter of a paragraph and make it a drop cap. 
I've seen a couple of similar style that does this but it's not compatible with your converter here is one: p.drop { text-indent: 0em; margin: 0em; } p.drop:first-letter { font-size : 165%; font-weight : bold; width : .50em; } Can you open a feature request at libprs500.kovidgoyal.net and if I get the time I'll do it. angelyne 08-19-2007, 04:01 PM Have done so. Thank you. It's ticket 167 JSWolf 08-20-2007, 12:02 AM Yeah there's basically only one thing left i have to implement. Make that TWO things that need to be implemented. I just created ticket #168 because I have a LIT file that HTML2LRF hangs on big time. I've attached the LIT file also. And I did try version 0.3.101. kovidgoyal 08-20-2007, 12:26 AM It doesn't hang, it just takes a long time...about 20mins on my 3GHZ machine. That's because newer version of html2lrf optimize memory usage at the expense of running time, and that lit file has some seriously messy HTML. JSWolf 08-20-2007, 01:12 AM hhhmm... Ok, I'll try again and this time leave it running. Thanks! LaughingVulcan 08-22-2007, 07:05 PM Released v0.3.97 with support for the padding CCS attribute. This makes LaughingVulcan's hanging indents possible. See the new demo in the first post. It'll reach the servers in ~30min. Thank you, thank you, thank you! It works VERY nicely!! :) kovidgoyal 08-22-2007, 07:12 PM Thank you, thank you, thank you! It works VERY nicely!! :) Anything for a fellow poetry enthusiast :-) volwrath 08-23-2007, 02:50 PM Anything for a fellow poetry enthusiast :-) Quick question. Did you fix the problem where you have to install the USB drivers to get libprs500 to work? kovidgoyal 08-23-2007, 02:58 PM I think I did, though I haven't tested it. volwrath 08-23-2007, 03:26 PM I think I did, though I haven't tested it. Hehe great! I will test it for you tonite :) beartard 08-23-2007, 04:33 PM I haven't used html2lrf in a while. 
I'm trying to help my friends at almudi.org to convert the breviary in Latin to BBeB format since they graciously loaned me the source files. I have a folder of html files (all linked together) to convert and I'm getting the following error back from the latest version as of this post: [INFO] convert_from.py:187: Processing completorium.html Parsing HTML... [INFO] convert_from.py:205: Converting to BBeB... Traceback (most recent call last): File "/usr/bin/html2lrf", line 8, in <module> load_entry_point('libprs500==0.3.103', 'console_scripts', 'html2lrf')() File "build/bdist.linux-i686/egg/libprs500/ebooks/lrf/html/convert_from.py", line 1576, in main File "build/bdist.linux-i686/egg/libprs500/ebooks/lrf/html/convert_from.py", line 1493, in process_file File "build/bdist.linux-i686/egg/libprs500/ebooks/lrf/html/convert_from.py", line 177, in __init__ File "build/bdist.linux-i686/egg/libprs500/ebooks/lrf/html/convert_from.py", line 219, in start_on_file File "build/bdist.linux-i686/egg/libprs500/ebooks/lrf/html/convert_from.py", line 322, in parse_file File "build/bdist.linux-i686/egg/libprs500/ebooks/lrf/html/convert_from.py", line 528, in process_children File "build/bdist.linux-i686/egg/libprs500/ebooks/lrf/html/convert_from.py", line 1376, in parse_tag File "build/bdist.linux-i686/egg/libprs500/ebooks/lrf/html/convert_from.py", line 528, in process_children File "build/bdist.linux-i686/egg/libprs500/ebooks/lrf/html/convert_from.py", line 1376, in parse_tag File "build/bdist.linux-i686/egg/libprs500/ebooks/lrf/html/convert_from.py", line 528, in process_children File "build/bdist.linux-i686/egg/libprs500/ebooks/lrf/html/convert_from.py", line 1209, in parse_tag UnboundLocalError: local variable 'pcss' referenced before assignment This is from the command (where completorium.html is the index file): html2lrf -t "Liturgia Horarum: ad Completorium" -a "Ecclesia Catholica" --publisher="almudi.org" --header --headerformat=%t --link-levels=1 --disable-chapter-detection 
--verbose ./completorium.html kovidgoyal 08-23-2007, 05:18 PM Sigh another typo. I'll release a new version soon. beartard 08-23-2007, 05:26 PM Sigh another typo. I'll release a new version soon. As long as it's *your* typo, I'm not complaining a bit :wink: volwrath 08-23-2007, 09:06 PM I think I did, though I haven't tested it. Works well! Thanks for a great piece of software! beowulf573 08-29-2007, 10:47 AM A couple quick questions: 1) Is it possible to do footnotes using html2lrf? What's the best way to handle them? 2) Has anyone tried to come up with a style to mimic a title page? What's the best way to do vertical spacing that looks good at different font sizes. thanks, Eddie beowulf573 08-31-2007, 08:30 AM Well, I got links working in a footnote type style, but I have a formatting problem. I can make an internal hyperlink that isn't superscript, or superscript that isn't an internal hyperlink. But I can't make a text block that's both. I've tried both the <sup> tag and the css style: text-align: supercript; Any suggestions? kovidgoyal 08-31-2007, 03:31 PM yeah html2lrf doesn't support links in sup/sub for technical reasons. beowulf573 08-31-2007, 04:03 PM Ah, ok, thanks. I used lrf2lrs to try and figure out what was going on and figured it was just a limitation of the format. beartard 09-08-2007, 12:02 PM I have an interesting issue with html2lrf. I can't seem to put a meta-tag in the lrf file that has exclamation points at the end. For example when I use -t "This book sucks!" I get a response that bash event: !" not found. Am I doing something wrong? kovidgoyal 09-08-2007, 12:11 PM Use -t 'title!' i.e. single quotes beartard 09-08-2007, 09:34 PM I knew I was being stupid somehow ;-) silkcom 09-08-2007, 11:18 PM Is there a way to go from PDF to html in such a way that I can use the HTML converter that won't thrash the formatting and images? 
Generally with PDF converting I see that either the whole thing is made into images (large file size), or the formatting is completely gone. I'm very interested in finding a way to keep "code" fonts and images inside the code, but get larger font sizes than PDF's give on the 6 inch screen. I like to read programming books (which always have images and code examples). kovidgoyal 09-08-2007, 11:28 PM Nope there is no way to do that. PDF is a partially rasterized format. As far as I know it isn't possible to convert PDFs with high fidelity. Rasterization is the only way to go for complex PDFs. angelyne 09-15-2007, 01:09 PM I find that inserting blank lines in the form of <br> are ignored by the converter. What code, if any, could I use to actually create a blank line. lj69 09-15-2007, 01:16 PM can anyone can tell me how to do this, i try drag and drop the file from libpsr500 to desktop but got the message error can not copy from disk or soure or something pls any one write tut with pics, step by step about this , i'm a newbie with this kovidgoyal 09-15-2007, 01:21 PM I find that inserting blank lines in the form of <br> are ignored by the converter. What code, if any, could I use to actually create a blank line. Interesting, that shouldn't happen can you post a code snippet that demonstrates this. You can use empty paragraphs <p></p> EDIT: Also note that a single <br> causes a line break, not a blank line. For a blank line you need <br><br> angelyne 09-15-2007, 07:38 PM I did a few test and and got mixed results. However, the reason I wasn't achieving the result I wanted is that blank spaces will be ignored before the text but will be used after the txt start. Here is the code <body> <p> </p> <p> </p> <p> </p> <p>Title</p> <p>Author</p> <p> </p> <p> </p> <p> </p> <p> </p> <p>text (with blank lines)</p> </body> Giving something like this <top of page> Title Author Text angelyne 09-15-2007, 07:46 PM Here is another question. 
I noticed that in a book I bought from Sony Connect : (Anansi Boys), the table of contents (in the table of contents section) displays 4 items. Copyright, TOC, About, Begin. However the TOC that is part of the text itself (which are simply html links) is quite elaborate, with individual chapters listed. I find that I like this. Is there a way we can code HTML so that we can pick and choose which elements html2lrf will include in the Table of Contents section of the book? JSWolf 09-16-2007, 08:03 PM Here is another question. I noticed that in a book I bought from Sony Connect : (Anansi Boys), the table of contents (in the table of contents section) displays 4 items. Copyright, TOC, About, Begin. However the TOC that is part of the text itself (which are simply html links) is quite elaborate, with individual chapters listed. I find that I like this. Is there a way we can code HTML so that we can pick and choose which elements html2lrf will include in the Table of Contents section of the book? I know you can do it in Book Designer. But I've never seen or read about a way to do this with html2lrf kovidgoyal 09-16-2007, 08:10 PM I know you can do it in Book Designer. But I've never seen or read about a way to do this with html2lrf Split the HTML file into a toc file and a main file. Put the links you want to go into the TOC in the toc file and the rest in the main file. Run html2lrf on the toc file. JSWolf 09-16-2007, 08:15 PM Split the HTML file into a toc file and a main file. Put the links you want to go into the TOC in the toc file and the rest in the main file. Run html2lrf on the toc file. Which ToC will show up in the Sony ToC (aka the menu) and which will only show up in the book? kovidgoyal 09-16-2007, 08:19 PM toc -> TOC main -> file kovidgoyal 09-17-2007, 11:26 AM I did a few tests and got mixed results. However, the reason I wasn't achieving the result I wanted is that blank spaces will be ignored before the text but will be used after the text starts.
Here is the code <body> <p> </p> <p> </p> <p> </p> <p>Title</p> <p>Author</p> <p> </p> <p> </p> <p> </p> <p> </p> <p>text (with blank lines)</p> </body> Giving something like this <top of page> Title Author Text This will be fixed sometime after 0.4.0. silkcom 09-19-2007, 04:08 PM So I just got a piece of software called Solid Converter PDF (sorry it's a windows product, and it's proprietary). However, its PDF to MS Word conversion is simply amazing. It brings over everything (and I mean everything). The only problem that I've seen, or the only thing that isn't exact is that the bullets used are squares in the PDF and dashes in the word doc. (That's good enough). This means that it is possible to do a PDF to HTML that is solid, keeps all the formatting, and then can be formatted down to LRF to look "just like" the original. kovidgoyal 09-19-2007, 04:18 PM Is the word doc reflowable? export it to html and see what happens when you resize the window it is viewed in. silkcom 09-19-2007, 04:39 PM I can highlight text and all that. I converted it to RTF, aside from losing some text background colors (which are somewhat important but I can live without them), the conversion was definitely amazing, and totally readable. Doc's html converter isn't the best, it lost a bunch of the formatting, but the doc to rtf worked great. Fallen angel 09-23-2007, 11:59 AM I'm not sure if this is the right place to say that, but I cannot have the libprs500 to work. I'm having an error message. That's what is written in the log file: Traceback (most recent call last): File "main.py", line 696, in <module> File "main.py", line 688, in main File "main.py", line 132, in __init__ File "libprs500\gui2\library.pyo", line 367, in set_database File "libprs500\gui2\library.pyo", line 102, in set_database UnicodeEncodeError: 'ascii' codec can't encode characters in position 26-32: ordinal not in range(128) kovidgoyal 09-23-2007, 12:21 PM Will be fixed in the next version.
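For readers who hit the same UnicodeEncodeError: the traceback does not show which string caused it, so the following is only a guess at the general class of problem, not a description of the actual libprs500 bug or its fix. One common trigger for "'ascii' codec can't encode characters" is a file path containing non-ASCII characters (for example in a Windows user name) being encoded as ASCII. A minimal sketch, using a made-up path and a hypothetical helper `encode_path`:

```python
def encode_path(path, encoding="utf-8"):
    """Encode a text path explicitly instead of relying on an ASCII default."""
    return path.encode(encoding)


# Hypothetical database location with an accented character in the user name.
db_path = "C:\\Documents and Settings\\Andr\u00e9\\library.db"

try:
    db_path.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError as err:
    # Same kind of failure as in the log above.
    ascii_ok = False
    print("ASCII encoding failed:", err.reason)

# An explicit encoding that can represent the character succeeds.
encoded = encode_path(db_path)
```

If this is indeed the cause, moving the library to a folder whose full path is plain ASCII might work around it until a fixed release is out, but that is untested speculation.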
balok 09-24-2007, 09:21 AM Hi kovidgoyal, Even with --bottom-margin=0 (which is the default, incidentally), there still seems to be quite a gap at the bottom, which doesn't happen for left/top/right. Is this normal, or am I missing something? Btw, thanks for this great program. kovidgoyal 09-24-2007, 10:44 AM That's a safety feature to catch any "overflow" from formatting errors. balok 09-24-2007, 11:31 AM That's a safety feature to catch any "overflow" from formatting errors. Is there a way to turn off this feature? kovidgoyal 09-24-2007, 12:18 PM In the next release there will be a prs500-unsafe profile. balok 09-24-2007, 07:55 PM In the next release there will be a prs500-unsafe profile. Thanks. I assume the safety feature also accounts for the few pixels left on the side margins, and the small space on the top margin also? Will the prs500-unsafe profile cover these too? kovidgoyal 09-24-2007, 08:10 PM No the profile doesn't set the top and side margins. There are non-zero defaults for them, setting them to zero is the best that can be done. balok 09-29-2007, 07:18 AM Hi Kovid, I haven't noticed any difference when using the prs500-unsafe profile. I've made two lrf files, one with the prs500 profile and the other with the unsafe profile, both are exactly the same. Fallen angel 09-30-2007, 05:18 AM The new version (0.4.5) gives me the same error. kovidgoyal 09-30-2007, 12:33 PM Oops looks like I forgot to fix that error. Sorry, you'll have to wait for the next version. kovidgoyal 09-30-2007, 12:47 PM Hi Kovid, I haven't noticed any difference when using the prs500-unsafe profile. I've made two lrf files, one with the prs500 profile and the other with the unsafe profile, both are exactly the same. Hmm 'fraid I dont know what else to do that seems to be the best that can be done. Fallen angel 10-02-2007, 06:30 AM I don't know what's wrong with my pc, but v.0.4.7 doesn't work either. 
:smack:

kovidgoyal
10-02-2007, 11:43 AM
Since I did make some changes, can you repost the error message?

Fallen angel
10-03-2007, 07:40 AM
Traceback (most recent call last):
  File "main.py", line 723, in <module>
  File "main.py", line 716, in main
  File "main.py", line 137, in __init__
  File "libprs500\gui2\library.pyo", line 383, in set_database
  File "libprs500\gui2\library.pyo", line 118, in set_database
  File "libprs500\library\database.pyo", line 624, in __init__
  File "libprs500\library\database.pyo", line 39, in _connect
UnicodeEncodeError: 'ascii' codec can't encode characters in position 26-32: ordinal not in range(128)

aaronvegh
10-03-2007, 09:44 AM
Hi there, I'm attaching a copy of Apple's Cocoa Fundamentals development guide, a book in HTML format (a PDF is available as well, but I'd prefer to get this one working on my PRS500). I've run other files through html2lrf without problems, but this set is causing all kinds of problems!
html2lrf /Developer/ADC\ Reference\ Library/documentation/Cocoa/Conceptual/CocoaFundamentalIntroduction/chapter_1_section_1.html
Processing chapter_1_section_1.html
Parsing HTML...
Converting to BBeB...
Processing index.html
Parsing HTML...
Converting to BBeB...
An error occurred while processing a table: list index out of range. Ignoring table markup.
Processing index-date.html
Parsing HTML...
Converting to BBeB...
An error occurred while processing a table: list index out of range. Ignoring table markup.
An error occurred while processing a table: Table has cell that is too large. Ignoring table markup.
An error occurred while processing a table: Table has cell that is too large. Ignoring table markup.
Processing index-date0.html
Parsing HTML...
Converting to BBeB...
An error occurred while processing a table: list index out of range. Ignoring table markup.
^CTraceback (most recent call last):
  File "/Applications/libprs500.app/Contents/Resources/html2lrf.py", line 9, in <module>
    main()
  File "libprs500/ebooks/lrf/html/convert_from.pyo", line 1590, in main
  File "libprs500/ebooks/lrf/html/convert_from.pyo", line 1507, in process_file
  File "libprs500/ebooks/lrf/html/convert_from.pyo", line 179, in __init__
  File "libprs500/ebooks/lrf/html/convert_from.pyo", line 231, in start_on_file
  File "libprs500/ebooks/lrf/html/convert_from.pyo", line 469, in process_links
  File "libprs500/ebooks/lrf/html/convert_from.pyo", line 231, in start_on_file
  File "libprs500/ebooks/lrf/html/convert_from.pyo", line 469, in process_links
  File "libprs500/ebooks/lrf/html/convert_from.pyo", line 231, in start_on_file
  File "libprs500/ebooks/lrf/html/convert_from.pyo", line 469, in process_links
  File "libprs500/ebooks/lrf/html/convert_from.pyo", line 229, in start_on_file
  File "libprs500/ebooks/lrf/html/convert_from.pyo", line 332, in parse_file
  File "libprs500/ebooks/lrf/html/convert_from.pyo", line 538, in process_children
  File "libprs500/ebooks/lrf/html/convert_from.pyo", line 1390, in parse_tag
  File "libprs500/ebooks/lrf/html/convert_from.pyo", line 538, in process_children
  File "libprs500/ebooks/lrf/html/convert_from.pyo", line 1390, in parse_tag
  File "libprs500/ebooks/lrf/html/convert_from.pyo", line 538, in process_children
  File "libprs500/ebooks/lrf/html/convert_from.pyo", line 1386, in parse_tag
  File "libprs500/ebooks/lrf/html/convert_from.pyo", line 538, in process_children
  File "libprs500/ebooks/lrf/html/convert_from.pyo", line 1371, in parse_tag
  File "libprs500/ebooks/lrf/html/convert_from.pyo", line 538, in process_children
  File "libprs500/ebooks/lrf/html/convert_from.pyo", line 1376, in parse_tag
  File "libprs500/ebooks/lrf/html/convert_from.pyo", line 538, in process_children
  File "libprs500/ebooks/lrf/html/convert_from.pyo", line 1380, in parse_tag
  File "libprs500/ebooks/lrf/html/convert_from.pyo", line 1401, in process_table
  File "libprs500/ebooks/lrf/html/table.pyo", line 371, in blocks
  File "libprs500/ebooks/lrf/html/table.pyo", line 326, in get_widths
  File "libprs500/ebooks/lrf/html/table.pyo", line 267, in preferred_width
  File "libprs500/ebooks/lrf/html/table.pyo", line 212, in preferred_width
  File "libprs500/ebooks/lrf/html/table.pyo", line 209, in text_block_preferred_width
  File "libprs500/ebooks/lrf/html/table.pyo", line 163, in text_block_size
KeyboardInterrupt
The process seems to hang after spitting out a few error messages, and then throws these errors after I do an interrupt (Ctrl-C). Any suggestions on how to proceed? Thanks, Aaron.

kovidgoyal
10-03-2007, 11:37 AM
Traceback (most recent call last):
  File "main.py", line 723, in <module>
  File "main.py", line 716, in main
  File "main.py", line 137, in __init__
  File "libprs500\gui2\library.pyo", line 383, in set_database
  File "libprs500\gui2\library.pyo", line 118, in set_database
  File "libprs500\library\database.pyo", line 624, in __init__
  File "libprs500\library\database.pyo", line 39, in _connect
UnicodeEncodeError: 'ascii' codec can't encode characters in position 26-32: ordinal not in range(128)

Hmm ok, I'm trying another fix in the next version.

kovidgoyal
10-03-2007, 11:42 AM
Hi there, I'm attaching a copy of Apple's Cocoa Fundamentals development guide, a book in HTML format (a PDF is available as well, but I'd prefer to get this one working on my PRS500). I've run other files through html2lrf without problems, but this set is causing all kinds of problems! The process seems to hang after spitting out a few error messages, and then throws these errors after I do an interrupt (Ctrl-C). Any suggestions on how to proceed? Thanks, Aaron.

The file it seems to be hanging on, index-date0.html, was not in the attached zip.

aaronvegh
10-03-2007, 03:15 PM
The file it seems to be hanging on, index-date0.html, was not in the attached zip.

Yeah, I noticed that! I don't know where that file is coming from.
I started with the original index.html, and I guess your tool goes through parsing for <a> tags? There's no index-date00.html referenced in the file. That's why I included the complete file set, hoping you could replicate the problem.

kovidgoyal
10-03-2007, 03:32 PM
Unzip that zip file into some non-standard location (like your home directory) and then run html2lrf on toc.html. That worked for me.

aaronvegh
10-03-2007, 04:22 PM
Oh, that's it! Awesome! Great tool, thanks for your help!

Fallen angel
10-04-2007, 07:13 AM
Hmm ok, I'm trying another fix in the next version.

Thank you very much!

maggotb0y
10-04-2007, 01:35 PM
Hey, I've been using Book Designer for a while to create Sony Reader content, and I think it's a nice program, but it looks like I'll get better results with html2lrf for inline graphics and tables, etc. I'd like to be able to convert my HTML0 files output by BD using libprs500, and I'm having a few problems. I've looked around and was a bit surprised that this doesn't seem to have come up on the forum. Here are my questions.

I can't figure out a chapter-regex that will pick up BD style chapter tags (I've included an example below). Can anyone help me with that?
<SPAN id=title><DIV align=center><B><FONT color=#001950>PROLOGUE</FONT></B></DIV> </SPAN>

Another issue is trying to get page breaks to work the way I want. BD replaces the <HR> tag with page breaks, and I use that to control pagination (BD by default also does chapter page breaks, but I turn that off so that I can have more control over the output). With libprs500, I can use page-break-before-tag=HR and that controls the page break fine, but it displays the horizontal rule, which I don't want. Is there any workaround that would create the manual page break, but not include the visible line?

Finally (for now at least), BD creates empty lines as <DIV align=justify> </DIV> and html2lrf seems not to respect that empty line: the line below is pushed up.
Is there any way I can force this to show as an empty line?

kovidgoyal
10-04-2007, 01:45 PM
Hmm, none of those issues have easy solutions. What I will do is make a preprocessing option for BD that will automatically replace the problematic HTML with HTML that html2lrf processes. Can you send me a couple of example HTML0 files? Also, you may try saving the HTML0 files as HTML in BD and then running html2lrf over them.

maggotb0y
10-04-2007, 03:10 PM
Hmm, none of those issues have easy solutions. What I will do is make a preprocessing option for BD that will automatically replace the problematic HTML with HTML that html2lrf processes. Can you send me a couple of example HTML0 files? Also, you may try saving the HTML0 files as HTML in BD and then running html2lrf over them.

A pre-processing engine for BD would probably make quite a bit of sense. The only difference between the "html0" file and the "save as HTML" options in BD is the name of the extension <g>. I've attached a zip file with an HTML file generated from BD that will give you the basics. I'm sure it is pretty self-explanatory, but if you have any questions, let me know. If you need some bigger files for a more complete test, let me know and I will see if I have anything public domain with some good formatting to post.

kovidgoyal
10-04-2007, 07:30 PM
Hmm, tell me if the following mapping is correct
<span id=title> --> <h1>
<span id=subtitle> --> <h2>
What about lower levels of headings? subsubtitle, etc? That way you can match chapters on either h1 or h2.
<hr> --> <span style="page-break-after:always" />
I will probably modify html2lrf's handling of the div tag to take care of the blank lines.

Incidentally, the way BD uses the id attribute is puzzling. According to the HTML spec, ids should be unique. Is there some reason vvv chose to use the id attribute rather than the class attribute, which is a much more natural fit?
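
As an illustration only, the mapping proposed above could be sketched with a few regular expression substitutions. This is not libprs500's actual preprocessor: the names BD_RULES and preprocess_book_designer are made up for the example, and it assumes the title and subtitle spans never contain nested <span> tags (the non-greedy match stops at the first </span>).

```python
import re

# Hypothetical rules mirroring the mapping discussed in the thread.
# They are NOT taken from libprs500; each entry is (pattern, replacement).
BD_RULES = [
    # <span id=title>...</span>  -->  <h1>...</h1>
    (re.compile(r'<span\s+id="?title"?\s*>(.*?)</span>', re.I | re.S),
     r'<h1>\1</h1>'),
    # <span id=subtitle>...</span>  -->  <h2>...</h2>
    (re.compile(r'<span\s+id="?subtitle"?\s*>(.*?)</span>', re.I | re.S),
     r'<h2>\1</h2>'),
    # <hr>  -->  an invisible element that only forces a page break
    (re.compile(r'<hr\b[^>]*>', re.I),
     '<span style="page-break-after:always" />'),
]


def preprocess_book_designer(html):
    """Apply each substitution in turn and return the rewritten HTML."""
    for pattern, replacement in BD_RULES:
        html = pattern.sub(replacement, html)
    return html


sample = ('<SPAN id=title><DIV align=center><B>PROLOGUE</B></DIV> </SPAN>'
          '<HR><SPAN id=subtitle>One</SPAN>')
print(preprocess_book_designer(sample))
```

With headings rewritten this way, a chapter regex only has to look for h1 (or h2), and the <hr> replacement gives the page break without drawing a visible rule, which is what was asked for earlier in the thread.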

maggotb0y
10-04-2007, 09:22 PM
Hmm, tell me if the following mapping is correct
<span id=title> --> <h1>
<span id=subtitle> --> <h2>
What about lower levels of headings? subsubtitle, etc? That way you can match chapters on either h1 or h2.

Looks almost right. Book Designer always creates a chapter for <span id=title> and never creates one for <span id=subtitle>. I use this to create a title (e.g. "A funny thing happened today") but not for the subtitle ("chapter one") so that I have control over what appears in the table of contents. I doubt you would have to do much special handling for span id=subtitle; they will probably take care of themselves just fine.

Incidentally, the way BD uses the id attribute is puzzling. According to the HTML spec, ids should be unique. Is there some reason vvv chose to use the id attribute rather than the class attribute, which is a much more natural fit?

Not really sure why BD is the way it is <g>. It's a nice enough tool for creating ebooks, but it has its quirks, and the pace of development and feature set of libprs500 seem to far outstrip BD (which is why I would like to make the switch). Thanks a lot for your work on this, I'm sure you'll earn quite a few converts. Let me know when you have something ready for testing. I'm excited to start running my collection through this!

JSWolf
10-04-2007, 10:17 PM
If anyone wants to donate to get kovidgoyal a new 505 so he can port libprs500 to it, then please have a look at http://www.mobileread.com/forums/showthread.php?t=14496 and give give give.

glenn69
10-04-2007, 11:16 PM
I've been trying for a couple of weeks to get a website page to lrf format using libprs500 in Linux (Ubuntu). The page is simply a text handbook, so I thought it would be simple. Well, it hasn't really worked for me yet. I've tried saving the html, converting to pdf, then pdf2lrf, no go. Tried html2lrf, again no good.
Could somebody show me how to convert this page:
http://www.gentoo.org/doc/en/handbook/handbook-x86.xml?style=printable&full=1#book_part1
Then let me know how you did it. Thanks, One Confused Reader

kovidgoyal
10-05-2007, 07:17 AM
When you say no good, what do you mean exactly? The simplest way to convert an online website into an LRF file is
web2lrf --url "http://www.gentoo.org/doc/en/handboo...l=1#book_part1"
EDIT: In your case the correct command line is
web2lrf --url "http://www.gentoo.org/doc/en/handbook/handbook-x86.xml?style=printable&full=1#book_part1" -r 0

glenn69
10-05-2007, 04:10 PM
I'm sorry for the lack of detail with "no good," but I've tried so many times that I've forgotten the exact problems. I ran the command exactly as you suggested
web2lrf --url "http://www.gentoo.org/doc/en/handbook/handbook-x86.xml?style=printable&full=1#book_part1" -r 0
When I try to view it on the Sony Reader, upon opening the file, it either hangs on "Formatting..." or it reboots the unit. So, I tried to use lrfviewer to view the created lrf.
Below are my errors upon viewing:
glenn@glenn-desktop:~$ lrfviewer /home/glenn/GentooDocNoTable.lrf
glenn@glenn-desktop:~$ Layout time: 8.56640505791 seconds
Traceback (most recent call last):
  File "build/bdist.linux-i686/egg/libprs500/gui2/lrf_renderer/main.py", line 180, in go_to_page
  File "build/bdist.linux-i686/egg/libprs500/gui2/lrf_renderer/document.py", line 509, in show_page
TypeError: argument 0 of signal QGraphicsScene.page_changed(PyQt_PyObject) has an invalid type
Traceback (most recent call last):
  File "build/bdist.linux-i686/egg/libprs500/gui2/lrf_renderer/main.py", line 180, in go_to_page
  File "build/bdist.linux-i686/egg/libprs500/gui2/lrf_renderer/document.py", line 509, in show_page
TypeError: argument 0 of signal QGraphicsScene.page_changed(PyQt_PyObject) has an invalid type
Traceback (most recent call last):
  File "build/bdist.linux-i686/egg/libprs500/gui2/lrf_renderer/main.py", line 180, in go_to_page
  File "build/bdist.linux-i686/egg/libprs500/gui2/lrf_renderer/document.py", line 509, in show_page
TypeError: argument 0 of signal QGraphicsScene.page_changed(PyQt_PyObject) has an invalid type
Thanks for the help and all your efforts on this great project.

kovidgoyal
10-05-2007, 07:32 PM
This is because the PyQt on your distribution is not correctly installed. The LRF file should work fine if you copy it to the SONY reader or view it using the Connect software.

kovidgoyal
10-05-2007, 10:32 PM
Looks almost right. Book Designer always creates a chapter for <span id=title> and never creates one for <span id=subtitle>. I use this to create a title (e.g. "A funny thing happened today") but not for the subtitle ("chapter one") so that I have control over what appears in the table of contents. I doubt you would have to do much special handling for span id=subtitle; they will probably take care of themselves just fine. Not really sure why BD is the way it is <g>.
It's a nice enough tool for creating ebooks, but it has its quirks, and the pace of development and feature set of libprs500 seem to far outstrip BD (which is why I would like to make the switch). Thanks a lot for your work on this, I'm sure you'll earn quite a few converts. Let me know when you have something ready for testing. I'm excited to start running my collection through this!

Added initial support for pre-processing via the --book-designer option in svn.

glenn69
10-05-2007, 11:03 PM
According to Synaptic I have the following PyQt packages installed:
Python-qt3
Python-qt4
Python-qt4-dev
Python-qt4-gl
Python-qt4-sql
Is there something else I need?

kovidgoyal
10-05-2007, 11:12 PM
It's a version problem. You should either install SIP and PyQt by hand or wait for gutsy.

JSWolf
10-05-2007, 11:35 PM
Kewl! So it'll process uncompressed Book Designer files. Nice one! Just have to wait for the Windows version to be released.

danx
10-06-2007, 02:17 AM
Kovid, You're superman--what you did is amazing. I had problems upgrading Python to 2.5. I have 2.4 and I don't want to upgrade the Linux distribution now. It was a pain to upgrade and I posted a comment here on the other software besides Python that is needed by html2lrf. See libprs500 bug #117 for details (https://libprs500.kovidgoyal.net/ticket/117).

One question--I believe Sony LRF supports images, but unless I'm doing something wrong, html2lrf doesn't seem to include images. If my assumptions are true, are there any plans to include images in the generated LRF format? I viewed my generated LRF file and it had no images. I tried both relative and full paths for the image file. For example:
Relative: <img src="images/foo.jpg" />
Full path: <img src="http://www.foobar.com/images/foo.jpg" />

I donated to the Kovid PRS 505 fund--great work! (http://kovidgoyal.chipin.com/kovidgoyal-to-port-libprs500-to-the-prs505)

maggotb0y
10-06-2007, 07:16 AM
Added initial support for pre-processing via the --book-designer option in svn.

Okay, you've got me downloading your VMWare image now so I can try to play around with this. I'm sure I'm jumping in over my head, but I can't help it. Thanks!

kovidgoyal
10-06-2007, 08:35 AM
Kovid, You're superman--what you did is amazing. I had problems upgrading Python to 2.5. I have 2.4 and I don't want to upgrade the Linux distribution now. It was a pain to upgrade and I posted a comment here on the other software besides Python that is needed by html2lrf. See libprs500 bug #117 for details (https://libprs500.kovidgoyal.net/ticket/117).

One question--I believe Sony LRF supports images, but unless I'm doing something wrong, html2lrf doesn't seem to include images. If my assumptions are true, are there any plans to include images in the generated LRF format? I viewed my generated LRF file and it had no images. I tried both relative and full paths for the image file. For example:
Relative: <img src="images/foo.jpg" />
Full path: <img src="http://www.foobar.com/images/foo.jpg" />

I donated to the Kovid PRS 505 fund--great work! (http://kovidgoyal.chipin.com/kovidgoyal-to-port-libprs500-to-the-prs505)

html2lrf does support images. See for example the demo lrf file attached in the first post of this thread. Probably there is something missing in your installation of the Python Imaging Library. Try running html2lrf with the --verbose flag. Thanks for the donation, it's much appreciated.

kovidgoyal
10-06-2007, 08:35 AM
Okay, you've got me downloading your VMWare image now so I can try to play around with this. I'm sure I'm jumping in over my head, but I can't help it. Thanks!

Not a problem. If you have trouble using the vmware image let me know.

danx
10-06-2007, 03:58 PM
html2lrf does support images. See for example the demo lrf file attached in the first post of this thread. Probably there is something missing in your installation of the Python Imaging Library. Try running html2lrf with the --verbose flag.

I think I know the problem--the images are inside tables (for positioning and captions) and it appears the table contents are ignored because a cell is "too large" (an image):
html2lrf --verbose -o matthes.lrf -t "Francois Matthes and the Marks of Time" -a "Francois E. Matthes" matthes.html
Processing matthes.html
Parsing HTML...
Converting to BBeB...
An error occurred while processing a table: Table has cell that is too large. Ignoring table markup.
Output written to /local/apache/htdocs/yosemite/library/matthes/matthes.lrf
html2lrf --version
libprs500 0.4.7
Here's a typical table with an image:
<table border="0" align="right"><tr>
<td align="center">
<a href="images/dust_jacket.jpg">
<img src="images/thumbnail/dust_jacket.jpg" alt="François Matthes and the Marks of Time, dust jacket" width="576" height="411" border="0" /></a>
<br />
<a href="images/dust_jacket.jpg">
<span class="small"><i>Dust jacket</i></span></a>
</td>
</tr></table>

kovidgoyal
10-06-2007, 04:13 PM
Just delete the <a href="dust_jacket.jpeg"> line. There's no way for html2lrf to know what size the image referenced in an <a> element should be.

chatlumo
10-06-2007, 10:15 PM
About images (not included): this doesn't work with the GUI under Mac OS X, but it works if I use the command line. Do you know why? Thanks.

kovidgoyal
10-06-2007, 10:34 PM
You have to zip up the HTML file and all its images and then add it to the database. Only then will conversion from the GUI include the images.

glenn69
10-06-2007, 11:57 PM
It's a version problem. You should either install SIP and PyQt by hand or wait for gutsy.

OK, I now have Python 2.5 and the html referenced by
web2lrf --url "http://www.gentoo.org/doc/en/handbook/handbook-x86.xml?style=printable&full=1#book_part1" -r 0
seems to create an lrf without the errors I used to get. I have copied it to the reader, but when I choose the book to read the reader stays on the "Formatting..."
screen for about 5 minutes, then when I read the book, page changes take about 12 seconds each. Are these simply side effects of the translation from HTML or should I be looking into html2lrf options? Thanks again

P.S. I did donate for your prs505, but didn't get the cool announcement line in my post. I hope you got it. Let me know if it didn't go through.

kovidgoyal
10-07-2007, 12:01 AM
You should probably try some of the chapter options, to force more frequent page breaks.

tidixon
10-07-2007, 05:10 PM
Hey, I am new to this forum and to all these tools. I installed libprs500 and tried to use web2lrf to get the Economist. It took about 30 minutes and produced a 23MB lrf that could not be read on the reader. This was on Linux. I tried it again on Windows with the same result. I can post error logs and stuff, but I think maybe they just haven't finished writing the profile? It is not listed in the graphical interface but is in the command line. Thanks, Tim.

kovidgoyal
10-07-2007, 05:33 PM
Yeah, the Economist profile isn't done.

maggotb0y
10-07-2007, 09:18 PM
Not a problem. If you have trouble using the vmware image let me know.

Okay, I have trouble <g>. I got the VM image up and running and networked, which took a little bit of fooling around as I am not terribly familiar with Gentoo. Now I try to update from SVN, and when running python setup.py develop as root I get this error:
Trying to setup udev rules...
Setting up bash completion... failed
Traceback (most recent call last):
  File "/home/kovid/work/libprs500/src/libprs500/linux.py", line 79 in setup_completion
    from libprs500.gui2.lrf_renderer.main import option_parser as lrfviewerop
  File "/home/kovid/work/libprs500/src/libprs500/gui2/lrf_renderer/main.py", line 27 in <module>
    from libprs500.gui.lrf_renderer.main_ui import Ui_mainWindow
ImportError: No module named main_ui
Setting up desktop integration
QPixmap::scaled: Pixmap is a null pixmap
/bin/sh: xdg-icon-resource: command not found
You do not have the Portland Desktop Utilities installed, skipping installation of desktop integration
Is this something I can safely ignore? Is the VMware image out of alignment with the current requirements? Thanks for your help, Christopher

kovidgoyal
10-07-2007, 09:23 PM
You need to run make
make -C src/libprs500/gui2/

maggotb0y
10-08-2007, 12:15 PM
You need to run make
make -C src/libprs500/gui2/

As always, you know your stuff. I built the new SVN under the VM image and tested converting some documents, and they're not quite right (quite possibly I don't know the appropriate switches; the only one I used is --book-designer). Here's what I've noticed on the first round:
<SPAN ID=title> should become a chapter title, but the chapter isn't being generated
images aren't included
<SPAN id=subtitle> should be in the document, but it isn't
So I've created another test document for you (this time with images) and included the document and the LRF that Book Designer generates for your comparison.

kovidgoyal
10-08-2007, 02:04 PM
Bug in the BD regexp fixed in svn

maggotb0y
10-08-2007, 09:46 PM
Kovid- The formatting is getting better now, here are some new observations (and one question)
1) images in the lrf generated from the test book I posted aren't centered
2) the pagebreak (<hr>) before the first chapter in the sample posted doesn't get picked up correctly by html2lrf
3) images in some other books aren't getting picked up right.
In another document this image:
<DIV align=center><IMG hspace=0 src="chapter1.jpg" align=baseline border=0></DIV>
isn't picked up
4) Would you prefer I enter these types of comments as tickets on your trac page? And do you prefer 1 ticket per comment or one "round-up" ticket?
Thanks again as always.

kovidgoyal
10-08-2007, 09:58 PM
The image centering is simply because in html2lrf images are usually inline, so they aren't centred. I could add some code to the preprocessor to generate centered images. Yeah, tickets help me keep track of things.

maggotb0y
10-08-2007, 10:22 PM
Jeez, you're fast. Okay, from now on I will enter comments as tickets.

One thing I forgot to ask. Is it possible to have the first paragraph of every chapter use drop caps? I have looked around a bit, but I don't see that documented (seems like it is basically supposed to be in the attributes of the paragraph in the HTML?). It would be a nice feature when converting from another format to be able to specify that as an option.

Also, in what file or group of files does the BD pre-processor lie? I've not used Python before, but I am a developer and I'm fairly comfortable with regexes, so I may be able to offer more useful suggestions if I know where to look.

Also, how does drop caps handle an open quote or an open smart quote before the first letter?

Finally (for now), is there a guide to getting a Python environment up and running under Windows that would be appropriate for compiling libprs500? If not, I'll do my best to get things running and write a guide for others to follow.