LRF output


kovidgoyal
07-10-2007, 02:20 PM
Released 0.3.67 with support for definition lists and a fix for the handling of zip files.

bkilian
07-10-2007, 04:53 PM
Dude, you are one of the most responsive devs I've ever seen, and I've seen a lot of devs :)

As to the Python regular expression, one I've found that matches only paragraphs containing nothing but an <a id...></a> seems to work on the Baen books I've tried it on.
<p>\s*<a id.*?>\s*</a>\s*</p>

kovidgoyal
07-10-2007, 05:17 PM
That's coz I use libprs500 a lot myself and I want it to be as bug free as possible :-) I essentially use all you guys as free bug hunters as bug-hunting is something I'm extremely lazy about. And those two fixes were about 10 lines of code.

But aren't those id elements referred to by some links in the rest of the file?

bkilian
07-10-2007, 06:07 PM
Not when it's the only element inside a <p>. The regex grabs every <p> that _only_ has a single <a> in it and only if the <a> starts with "id" and has no text (only whitespace between <a> and </a>). I can't think of, nor did I see, any useful use of that particular combination. We can always make it even more picky by requiring the <a> to have no "href", but I don't think it's necessary.

Edit: Oh, you're asking if the paragraph indicators (which is what these are) are used by anything else? No. They're used in the "web" reading version to update the silly "index" box. (Check http://www.webscription.net/10.1125/Baen/0671318470/0671318470.htm and move your mouse down the page. You can type a number into the box and it'll jump to that paragraph.) The html in the LIT doesn't have the javascript to enable this.

Note that the pure html versions don't have <p> surrounding the <a> elements, so they don't render. It's really only an issue with the files they include in their LIT versions (I suspect the OEB DTD requires the surrounding <p>).

kovidgoyal
07-10-2007, 06:14 PM
I meant aren't there <a href> elements that refer to that id? So that removing the id would make those links not work. Though I suppose I could just remove the <p> and keep the <a>.

bkilian
07-10-2007, 06:59 PM
No, I don't believe there are any links that refer to the specific paragraphs. Just removing the <p> would be a perfectly reasonable compromise too :)
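A minimal Python sketch of the compromise discussed above: drop the wrapping <p> but keep the empty <a id> anchor. The function name and sample markup are made up for illustration, and `[^>]*` is used in place of `.*?` so that a match cannot run past the end of the opening <a> tag. This is not the code that went into SVN.

```python
import re

# A paragraph whose only content is an empty <a id=...></a> anchor.
# [^>]* (rather than .*?) keeps the match inside the opening <a> tag.
EMPTY_ANCHOR_PARA = re.compile(r'<p>\s*(<a id[^>]*>\s*</a>)\s*</p>')

def unwrap_empty_anchor_paragraphs(html):
    """Remove the surrounding <p>...</p> but keep the <a id> itself."""
    return EMPTY_ANCHOR_PARA.sub(r'\1', html)

sample = '<p><a id="p1"></a></p>\n<p>Real text with <a id="n1">a note</a>.</p>'
print(unwrap_empty_anchor_paragraphs(sample))
```

Paragraphs that contain any text besides the empty anchor are left alone, so ordinary links and notes should survive.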

kovidgoyal
07-10-2007, 07:17 PM
OK added it to SVN.

jgs
07-12-2007, 09:04 PM
After a very positive experience with libprs500 and Kovid's patient support, I am seriously considering moving totally over to version .4 in lieu of Connect.

Questions:

1. Any guess when .4 will be available?
2. Will "Genre" be one of your metadata elements?

Great job

Jim

JSWolf
07-12-2007, 09:08 PM
If you don't use Connect, you cannot purchase books at the Connect store. If that is not an issue, go for it. Of course, if you have multiple computers, you can install Libprs500 on one and Connect on the other as long as both computers run 32-bit Windows.

kovidgoyal
07-12-2007, 09:18 PM
1 month < 0.4.0 < 3 months

You should use the Tags facility in libprs500 to specify genre metadata. The search bar searches over tags as well.

jgs
07-12-2007, 11:48 PM
I've got four systems going on five; thanks for the advice.

Jim

bnocturnal
07-13-2007, 01:19 PM
Hello,

First off, Thanks for the great program!

I seem to have run into a bug... not sure if it is by design, though..

I am converting an ebook (Thinking in Java, from http://mindvew.net), and have run into an issue with html blockquotes.

This book uses the block quotes for code samples, and while html2lrf does "block" the text, it doesn't seem to convert any of the html....

what should be this:

monitor.expect(new String[] {
"1: n1.i: 9, n2.i: 47",
"2: n1.i: 47, n2.i: 47",
"3: n1.i: 27, n2.i: 27"
});

comes out like this:

monitor.expect(<font color="#0000ff">new</font> String[] {
<font color="#004488">"1: n1.i: 9, n2.i: 47"</font>,
<font color="#004488">"2: n1.i: 47, n2.i: 47"</font>,
<font color="#004488">"3: n1.i: 27, n2.i: 27"</font>
});


I suppose I could come up with some regex to remove the "font" tags...

Dave
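As a stopgap, the regex idea mentioned above could look roughly like this in Python. It is only a sketch of pre-processing the source HTML before conversion (the function name is made up), and it throws away the syntax colouring along with the tags.

```python
import re

# Opening and closing <font> tags, with or without attributes.
FONT_TAG = re.compile(r'</?font\b[^>]*>', re.IGNORECASE)

def strip_font_tags(html):
    """Delete <font ...> and </font> tags, keeping the text between them."""
    return FONT_TAG.sub('', html)

line = 'monitor.expect(<font color="#0000ff">new</font> String[] {'
print(strip_font_tags(line))
```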

kovidgoyal
07-13-2007, 02:22 PM
Post a simple html file that reproduces the problem. Just copy paste one of the blockquote sections along with any css sections/files.

bnocturnal
07-13-2007, 02:34 PM
Here you go...

I included a section with several blockquotes, and the css, and the .lrf output.

Thanks,

Dave

kovidgoyal
07-13-2007, 03:11 PM
Hmm see what you mean...two line fix will be in next release.

bnocturnal
07-13-2007, 03:57 PM
Great! Thanks!

Dave

yonkz
07-14-2007, 03:14 PM
Hi,
I'm using the command line version of html2lrf, and I'm trying to use the --chapter-regex option to build the TOC, but it doesn't behave as I expect it to.

Other than that I'm loving what I've seen so far.

Perhaps I am doing something wrong.

Here is the command I am using:
C:\html2lrf>html2lrf -a "Author" -t "Title" --chapter-regex="<I>(Chapter [1-9]+[,].*)</I>" --font-delta=-2 "c:\ebooks\MyBook.htm"

The regex I'm using here works in other situations, but I'm not familiar with the implementation here, so perhaps the syntax should be different.

The Chapter headings in the html doc look like this:
<I>Chapter 1, Title of Chapter 1</I>


Any help would be appreciated.

kovidgoyal
07-14-2007, 03:47 PM
Released v0.3.72:
1) Support for nested lists
2) Bug fixes

kovidgoyal
07-14-2007, 03:48 PM
You shouldn't have the <I> tags in the regex; it matches on the contents of the tag. In fact, the default regex should work for you.

yonkz
07-14-2007, 04:11 PM
:)Wow, fast response - kovidgoyal!

OK, I reread the quick little blurb about the chapter regex from the help, and came to a realization of what the problem was.

The chapter regex only looks for matches inside a <H1> or other H tag, so since these were not inside H tags, they weren't found.

What I had to do was use another program (just whipped up a quick C# app) to wrap the matches in H1 tags and save a copy of the file with the new tags, then re-ran the cmd line utility, and all was well.



Do you think it would be possible to optionally remove the requirement for the chapter regex match to look for the H tags, and just use the supplied regular expression to find the chapter headings?

Thanks
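The wrapping step described above was done with a C# app; a rough Python equivalent might look like the sketch below. The pattern is an assumption based on the sample heading in this thread, and it uses `\d+` because `[1-9]+` would miss numbers such as "Chapter 10".

```python
import re

# An italic line of the form <I>Chapter N, Some Title</I>.
CHAPTER_LINE = re.compile(r'<I>Chapter \d+,.*?</I>', re.IGNORECASE)

def wrap_chapters_in_h1(html):
    """Wrap each matching chapter line in <h1>...</h1>, keeping the <I> tags."""
    return CHAPTER_LINE.sub(r'<h1>\g<0></h1>', html)

print(wrap_chapters_in_h1('<I>Chapter 1, Title of Chapter 1</I>'))
```

Other italic text is left untouched because it does not start with "Chapter" followed by a number and a comma.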

yonkz
07-14-2007, 04:15 PM
Too quick, there.

It did detect the chapters according to the cmd line output, but I guess it doesn't actually build a TOC with that info. When I open the output lrf - the TOC doesn't have the chapters in it.

Is there a way to build the TOC and have it show up in the lrf?

kovidgoyal
07-14-2007, 04:52 PM
All links in the top level html file are put into the TOC.

Chuck Eglinton
07-15-2007, 02:33 PM
HTML2LRF is an excellent program but sometimes chokes on

Unicode Character 'RIGHT SINGLE QUOTATION MARK' (U+2019) (also known as the apostrophe)

I receive the error on some HTML files created by MSWord 2000.

Also....

First chapter in Table of Contents (TOC) doesn't appear in TOC list on Reader or in Connect software.

When saving a document as HTML with Microsoft Word 2000, HTML2LRF correctly *reports* all the chapters being found in the table of contents (that is, "Detected Chapter..." shows all chapters correctly when compiling... all the text defined as "Header1")

However, even though HTML2LRF displays all the chapters being found, the very first chapter is not displayed in the Table of Contents (TOC) when opening the .LRF file in the SONY CONNECT Desktop software or the reader.

I'm able to work around it - HTML2LRF is still a great program

FangornUK
07-15-2007, 02:39 PM
MS Word 2000 creates dreadful HTML. Clean it up with "tidy --wrap 0 --word-2000 yes" from the program at http://tidy.sourceforge.net/ and see what happens then.

kovidgoyal
07-15-2007, 03:40 PM
The only thing that html2lrf does when it detects a chapter heading is insert a page break before it. It does not insert detected chapters into a TOC. This is because it uses links i.e. <a href> tags to build the TOC. Every (almost) <a href> element gets put into the TOC.

Send me an example html file that causes the unicode problem.
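Since the TOC is built from <a href> links rather than from detected chapters, one possible workaround is to generate a block of links yourself. The sketch below gives each <h1> a named anchor and builds matching links. The anchor names and function name are invented, and the thread does not confirm how html2lrf treats in-file "#fragment" links, so treat this as an untested idea.

```python
import re

HEADING = re.compile(r'<h1[^>]*>(.*?)</h1>', re.IGNORECASE | re.DOTALL)
TAGS = re.compile(r'<[^>]+>')

def add_link_toc(html):
    """Return (toc_html, new_html): one link per <h1>, one anchor per <h1>."""
    entries = []

    def tag_heading(match):
        anchor = 'chap%d' % (len(entries) + 1)
        # Strip inline markup such as <b> to get a plain-text title.
        title = TAGS.sub('', match.group(1)).strip()
        entries.append((anchor, title))
        return '<a name="%s"></a>%s' % (anchor, match.group(0))

    body = HEADING.sub(tag_heading, html)
    toc = '\n'.join('<p><a href="#%s">%s</a></p>' % entry for entry in entries)
    return toc, body

toc, body = add_link_toc('<h1>One</h1><p>x</p><h1><b>Two</b></h1>')
print(toc)
```

The returned link block would then be pasted near the top of the top-level HTML file before conversion.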

JSWolf
07-15-2007, 09:35 PM
I tried embedding the font Arial Narrow and I got Bold and Italics instead of normal where I should have. When it called for italics, I did get the correct italics version.

kovidgoyal
07-15-2007, 09:44 PM
Yeah, with most fonts, for some reason I can't figure out, the LRF renderer chooses the bold as the normal even though the fonts are correctly embedded.

JSWolf
07-15-2007, 10:07 PM
I removed the bold italic font and ended up with the italic version.

kovidgoyal
07-15-2007, 10:13 PM
You mean you removed it from the windows\fonts directory?

JSWolf
07-16-2007, 06:26 AM
Yeah, I copied the font out of the Windows font directory so it could not be used. After I gave up on Arial Narrow, I put the font back.

edbro
07-16-2007, 12:02 PM
Is there a way for the default to save to the same directory as the input file? Vista does not allow for the files to be saved to "program files". It will save any such files to a virtual store which is buried in multiple subdirectories and hard to find. I would love for it to save automatically to the same directory as the source file. Yes, I know I can input an output directory from the command line but, I'd love for this to be a default.

JSWolf
07-16-2007, 12:07 PM
You do not need the source in the same directory as the program. So you can put the source in some other directory and the default of saving in the same directory will work fine.

edbro
07-16-2007, 12:12 PM
Not from my experience. I have Libprs500 in "C:\program files". I have the source in "D:\work". The command html2lrf D:\work\demo.html saves the output to "C:\users\ed\appdata\local\virtualstore\program files\libprs500\demo.lrf"

JSWolf
07-16-2007, 01:31 PM
1. Start | Run | cmd
2. d:
3. cd \work\demo
4. html2lrf demo.html

That is all you need to do and demo.lrf will be in the same directory as demo.html.

JSWolf
07-16-2007, 01:32 PM
Would it be possible to have an option to control the paragraph indent? In some HTML files the indent is fine as is and in some it is too small. I would like, if possible, a default of 4 spaces for an indent, if that's OK.

edbro
07-16-2007, 01:36 PM
Thank you, I was assuming that I had to be in the libprs directory to access the executable.

bkilian
07-16-2007, 06:48 PM
Oh, I noticed a problem with the code you added to remove the extra <p> in Baen books, I tried to work out why it was happening, and in looking at your source I think I worked it out.
In your BAEN processing stuff, it looks like you have code that strips out the <a id="pXXX"> lines right before you try to strip out the whole <p> including them. I assume the second one never works since the first one has already modified the lines.

kovidgoyal
07-16-2007, 07:18 PM
Good catch, I've re-ordered them.

kovidgoyal
07-16-2007, 11:24 PM
@JSWolf
The CSS option text-indent controls the paragraph indent.

bkilian
07-17-2007, 02:58 PM
Is there some method for forcing page breaks other than using a regex? My most recent baen purchase is doing weird things at the page breaks. (Especially since the CSS specifically tells it to page-break-before all H1 tags)
One such case is this:
<h1 align="center">
<a name="Chap_2">
</a>
<b>MAY, YEAR OF GOD 890</b>
</h1>
<h2 align="center">
<b>I</b>
</h2>
In that case, the page break happens between the H1 and the H2, which is pretty weird. Is there some tag we can use to force a page break? Or is there some way we can tell it that <h1> always signifies a new chapter, irrespective of its contents?

kovidgoyal
07-17-2007, 04:14 PM
Look at the CHAPTER OPTIONS help text in the html2lrf help.

Bokeh
07-20-2007, 12:18 PM
Hello, I am a bit of a newb at this. I was using the "old" html2LRF program for a long time this morning, converting html files from gaslight and it was working great- until it suddenly decided to stop working with no error messages or explanations.

So I found this thread and downloaded this newer program. I tested it on a small html file from gaslight, and it seemed to work, until I tried to bring the new .lrf file into the "CONNECT Reader" program's library. Any time I tried to import the file, the Connect reader program would exit without any warning. I tried some more html files with the same result.

Any ideas? Am I doing something wrong?

edit: just used it on a gutenberg zipped html file and it worked perfectly... so maybe there is a problem with the gaslight texts?

Here is one of the gaslight files i tried to convert and couldn't read properly:

link (http://gaslight.mtroyal.ab.ca/gaslight/strngild.htm)

kovidgoyal
07-20-2007, 01:03 PM
That's because the entire text is inside a single table cell. The old HTML2LRF simply ignored table markup; the new one processes it, incorrectly in this case. I'll upload a fixed version soon.

Bokeh
07-20-2007, 01:36 PM
great, thanks! And thanks in general for making such an awesome program!

kovidgoyal
07-20-2007, 03:26 PM
Released v0.3.78 with --ignore-tables option.

@Bokeh
Use the commandline

html2lrf --ignore-tables gaslight.htm

bkilian
07-22-2007, 03:42 AM
hehe, back again...
I'm trying to indent an entire paragraph (not just the first line), something like what <blockquote> does. I've tried margin-left and margin-right, and padding-left and padding-right, both do the right thing in my browser, but html2lrf ignores both of them. Is there something I should be using for this?

kovidgoyal
07-22-2007, 04:40 AM
html2lrf doesn't support block level indentation other than through <blockquote> as I didn't really see any need for it.

classicspam
08-01-2007, 09:59 PM
I know it may seem to be a waste of time, however would it be feasible to add an option in the future to put a new line (blank) in between paragraphs?

kovidgoyal
08-01-2007, 10:05 PM
open a ticket and I'll get around to it.

JSWolf
08-01-2007, 11:19 PM
Please make sure it's gotten around to after version 10.9.

kovidgoyal
08-01-2007, 11:32 PM
You have my word.

RWood
08-01-2007, 11:35 PM
Jon, cut him some slack, maybe version 22.07.11. Blank lines after paragraphs are not that easy to generate. :D

JSWolf
08-02-2007, 01:20 AM
Oh wait, that's a MobiPocket format feature. LRF doesn't do extra blank lines after paragraphs unless it's part of the book.

astra
08-02-2007, 04:29 PM
Do I understand it right that I can use the libprs500 Windows installer, which has a GUI, instead of the Connect software?

I am going on holiday in a few days and I am taking my laptop with me. It does NOT have the Connect software atm, but I might want to charge my reader via USB, so I need software that will recognise the reader. If I install libprs500 and it allows me to charge the reader, I would like to install it. Good opportunity to explore the program without messing with the Connect software.

kovidgoyal
08-02-2007, 05:49 PM
Yeah it will allow you to charge your reader.

astra
08-02-2007, 05:55 PM
Thanks.

I will install it tomorrow evening :)

kovidgoyal
08-02-2007, 11:13 PM
v 0.3.83 is on its way to the servers:
1) --blank-after-para
2) bug fixes
3) internal re-organization that may have introduced some regressions
4) increased default para indent

flyneo
08-05-2007, 08:18 PM
Hi,

I used this tool to convert a big HTML file (30M with images). However, I got the following error after a while:

Processing 0-201-63354-X_Joined.htm
Parsing HTML... done
Converting to BBeB...Unhandled/malformed CSS key: text-align left

Traceback (most recent call last):
File "convert_from.py", line 1453, in <module>
File "convert_from.py", line 1387, in main
File "convert_from.py", line 1288, in process_file
File "convert_from.py", line 377, in __init__
File "convert_from.py", line 461, in parse_file
File "convert_from.py", line 672, in process_children
File "convert_from.py", line 1196, in parse_tag
File "convert_from.py", line 672, in process_children
File "convert_from.py", line 1196, in parse_tag
File "convert_from.py", line 672, in process_children
File "convert_from.py", line 1194, in parse_tag
File "convert_from.py", line 1205, in process_table
File "libprs500\ebooks\lrf\html\table.pyo", line 359, in blocks
IndexError: list assignment index out of range

Could you help me with that? Thanks!

kovidgoyal
08-05-2007, 09:25 PM
Attach the html file or send it to me.

flyneo
08-06-2007, 12:34 AM
Thanks for the quick reply. Please see the attachment for the file that caused the problem. This is the main file I passed to html2lrf. There are some additional files which I can send to you if needed.

Basically, what I was doing was converting a tech book in chm format into an lrf file. The chm book has more than 1000 pages and hundreds of images and tables. I first decompiled it using 'hh' into a bunch of html files and images. Then I joined these html files into a few files using BookDesigner and passed the joined html file to the html2lrf converter. Before doing this, I also tried passing the original html files directly to html2lrf without joining, but the program ran for about an hour and then exited with errors.

Again thank you for the help!

kovidgoyal
08-06-2007, 12:48 AM
This file seems to use tables for formatting, including nested tables. html2lrf supports only simple tables. Try using the --ignore-tables option.

flyneo
08-06-2007, 01:22 AM
It works great now. Thanks a lot!

Valdhor
08-08-2007, 12:32 PM
If you have CSS declared for <pre> tags, you will get errors.

HTML File:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">
<HTML>
<HEAD>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Bug Testing</title>
<link rel="stylesheet" href="bugtest.css" type="text/css" />
</head>
<body>
<pre>Some Stuff</pre>
</body>
</html>


CSS File (bugtest.css):

pre {
border: solid 1px #444;
background-color: #e0e0e0;
padding: 0.1in;
margin: 0.2in;
clear: right;
}


This will produce the following errors:

Unhandled/malformed CSS key: border solid 1px #444
Unhandled/malformed CSS key: clear right
Unhandled/malformed CSS key: margin 0.2in
Unhandled/malformed CSS key: background-color #e0e0e0
Unhandled/malformed CSS key: text-align left

Valdhor
08-08-2007, 12:43 PM
I keep getting this error:

WARNING: An error occurred while processing a table: list assignment index out of range
Ignoring table markup


Unfortunately, this is a 3.2MB file that has been generated from multiple (Read hundreds) HTML files into one big file. What would be useful is something to the effect...

WARNING: An error occurred while processing a table: list assignment index out of range
at line xxx near sometextthatwasreadnearwheretheerroroccurred
Ignoring table markup


so that I can figure out where the problem is and fix it.

kovidgoyal
08-08-2007, 12:55 PM
The CSS messages are not errors; they just indicate that those CSS properties have been ignored.

The tables error message is most likely generated because of a complex and/or nested table. I'll output the first hundred characters of the table tag in the next version, but line numbers are not going to be possible because of the way the html parser is designed.

Valdhor
08-08-2007, 01:42 PM
Thank you.

That will make it much easier to find where the error is.

I have been looking for days trying to find a complex or nested table but with over 4700 lines and lots of small tables it's gonna take a while :tired:

Valdhor
08-08-2007, 01:43 PM
sorry, that should have been 47000 lines.
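For hunting through a file that size, a small script can list likely trouble spots with line numbers. The sketch below uses Python's standard html.parser module, independently of html2lrf's own parser, to report nested tables and cells with rowspan or colspan. The class and function names are made up, and which constructs actually trip html2lrf is a guess based on this thread.

```python
from html.parser import HTMLParser

class TableAuditor(HTMLParser):
    """Collect (line, description) pairs for nested tables and spanning cells."""

    def __init__(self):
        super().__init__()
        self.depth = 0       # how many <table> elements are currently open
        self.findings = []

    def handle_starttag(self, tag, attrs):
        line, _ = self.getpos()  # position of the tag being handled
        if tag == 'table':
            self.depth += 1
            if self.depth > 1:
                self.findings.append((line, 'nested table'))
        elif tag in ('td', 'th'):
            for name, value in attrs:
                if name in ('rowspan', 'colspan') and value not in (None, '1'):
                    self.findings.append((line, '%s=%s' % (name, value)))

    def handle_endtag(self, tag):
        if tag == 'table' and self.depth > 0:
            self.depth -= 1

def audit_tables(html_text):
    parser = TableAuditor()
    parser.feed(html_text)
    parser.close()
    return parser.findings

sample = ('<table>\n'
          '<tr><td rowspan="2">a</td></tr>\n'
          '<tr><td><table><tr><td>b</td></tr></table></td></tr>\n'
          '</table>')
for line, what in audit_tables(sample):
    print(line, what)
```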

kovidgoyal
08-08-2007, 01:53 PM
Why don't you run html2lrf on the individual files? That way you can narrow down the search a lot.

Valdhor
08-08-2007, 03:16 PM
Well, that would have worked :o but I figured out where the problem was - rowspan="2".

kovidgoyal
08-08-2007, 03:49 PM
Hmm, was that in a relatively simple table? Because if it was, it's a bug in html2lrf and I'd like you to send me the html file so I can fix it.

Valdhor
08-09-2007, 10:05 AM
Basically, there are a whole bunch of these type constructs which are small tips...


<div class="note"><table border="0" summary="Note">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="images/note.gif"></td>
<th align="left">Note</th>
</tr>
<tr><td align="left" valign="top"><p>
When specifying multiple parents for a Role, keep in mind that the last parent
listed is the first one searched for rules applicable to an authorization query.
</p></td></tr>
</table></div>


If you remove the rowspan="2" then html2lrf works fine.

I will email you the entire file separately.

kovidgoyal
08-09-2007, 11:10 AM
Thanks. It wasn't rowspan=2 in general; it's this table in particular (the note table you posted) that's causing the problem.

DNel
08-09-2007, 09:24 PM
I just downloaded the libprs500.exe windows installer. In the installation I chose not to install the drivers as I don't have my eReader yet and don't want to lose the ability to use the Connect. After installing, I clicked on the libprs500 icon to launch the program and got the following error message in the associated log

Traceback (most recent call last):
File "main.py", line 25, in <module>
File "libprs500\devices\prs500\driver.pyo", line 56, in <module>
File "libprs500\devices\libusb.pyo", line 32, in <module>
File "libprs500\__init__.pyo", line 50, in load_library
File "ctypes\__init__.pyo", line 423, in LoadLibrary
File "ctypes\__init__.pyo", line 340, in __init__
WindowsError: [Error 126] The specified module could not be found

Any ideas? I uninstalled and then did a reinstall and got the same error. Is it because I unchecked the driver?

Thanks
DNel

kovidgoyal
08-09-2007, 09:46 PM
Yeah, it's a bug. I'll fix it in the next release. You can install the drivers and then uninstall libprs500 and the Connect software won't be affected.

DNel
08-10-2007, 02:10 PM
Thanks, I was able to run libprs500 after I reinstalled with the drivers. However, being completely illiterate when it comes to using it, I see that there are icons for add books to library, delete books, and edit meta information. After adding the HTML file to the library, I don't see how to convert it to LRF. There is also no place to enter command lines. One of the previous posts mentioned the command line program was in the libprs500 folder. I only see libprs500 (the GUI), uninstall, and a link to your web page. How do I convert it now to LRF format?

Thanks
DNel

Sorry for being so computer illiterate:tired:

kovidgoyal
08-10-2007, 02:17 PM
Ah yes the conversion tools have not yet been integrated with the GUI. At the moment what you need to do is the following:
1) Open a command prompt (Start->run->cmd.exe)
2) change to the folder with the html file (cd "c:\myfolder")
3) html2lrf "myfile.html"

As I get the time I will integrate the conversion tools into the GUI, but that's going to take a while.

DNel
08-10-2007, 03:46 PM
Thanks,
I tried it on a sample web page and it worked great.
DNel

JSWolf
08-11-2007, 06:53 PM
I can't find any way to generate left justified text with html2lrf. Everything ends up fully justified even if I specify align="left" or style="text-align:left".

Please, please, please add a way to generate left justified text!

Thanks

P.S. copied since it does belong here.

JSWolf
08-11-2007, 06:55 PM
You cannot have left justified text unless you put in the hard returns. It's just the way LRF for the Sony Reader works. So get to hitting return.

Excalibur
08-11-2007, 08:08 PM
Hey, just wanted to say thanks for a decent set of conversion utilities.

I did, however, want to point out that you should be reading the .opf file instead of the .html file that is generated with ConvertLit. I noticed that you have probably hardcoded the .htm extension into the converter, bad practice. The .opf has all the file names necessary for the content, covers, etc. and you should instead focus on that while still not hard coding file extensions.

The reason I suggest this? Some of the LIT files I have have text instead of HTML as the body of the book and your tool craps out when it doesn't find an HTML file. The OPF, which is always parsed and output by ConvertLit consistently uses the id of "content" for the main body. You could grab that file name instead of relying on an HTML file being generated. It'd make the tool a bit more stable. :)
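A minimal sketch of that lookup in Python, assuming the OPF is well-formed XML with a manifest of item elements that carry id and href attributes. The sample OPF text and the function name are invented for illustration, and this is not how lit2lrf is actually written.

```python
import xml.etree.ElementTree as ET

def find_content_href(opf_text, item_id='content'):
    """Return the href of the manifest item with the given id, or None."""
    root = ET.fromstring(opf_text)
    for elem in root.iter():
        # Ignore any XML namespace prefix on the tag name.
        local_name = elem.tag.rsplit('}', 1)[-1]
        if local_name == 'item' and elem.get('id') == item_id:
            return elem.get('href')
    return None

opf = ('<package><manifest>'
       '<item id="content" href="book.txt" media-type="text/plain"/>'
       '<item id="cover" href="cover.jpg" media-type="image/jpeg"/>'
       '</manifest></package>')
print(find_content_href(opf))
```

Looking the file up by id rather than by extension means a plain-text body would be found just as easily as an .htm one.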

Just my opinion.

Do you know where the format specs for LIT and LRF can be found? I would like to take a look at how the files are set up...

Also, a feature request:
Could you set it up so that we could make CSS style changes from the commandline as well? For instance, setting text-indent for <p> tags. This would make the books look oh so much nicer with some indentation going on :)

kovidgoyal
08-11-2007, 08:19 PM
Yeah in an ideal world it would. Since I've never come across a lit file with txt in it, it isn't a priority. If you need that feature feel free to submit a patch on the bug report https://libprs500.kovidgoyal.net/ticket/126

Incidentally, lit2lrf does read the opf file, for the metadata.

Excalibur
08-11-2007, 08:22 PM
Cool, cool. I believe you should be using the OPF file for just about everything since it is, in essence, the .CUE file of the .OPF's .BIN.

I'll add that bug later tonight from work :) I see you monitor your threads closely, good on ya! It's great to see a dev so on-the-ball :) Keep up the great work, btw!

kovidgoyal
08-11-2007, 10:40 PM
@JSWolf
Somebody needs to bug SONY to fix the LRF renderer on the reader to not right-justify text that is left-aligned. I suspect they did that because the engineers developing the renderer were Japanese and didn't realize that right justification is bad for English text.

Excalibur
08-12-2007, 01:30 AM
Hey, put in the .txt glitch (I'd say it isn't a full-blown bug since it can be edited by hand and html2lrf run on it, but that defeats the purpose of lit2lrf, doesn't it? :) and also added the CSS from the commandline as an enhancement. I added possible solutions as well, to help you out.

Maybe I could learn some python or convert it to java or something... *ponder* who knows, either way, hope that helped you out :)

And I'd actually love to see the proprietary formats go away and just use HTML. Perhaps a stripped down Gecko HTML engine in place of the LRF engine that's made to just display HTML & CSS without all the added junk? Maybe keep the plugin enhancement system so that you could add display of LIT or other proprietary formats (having been converted to HTML with the plugin...). Now THAT would be something to see :)

kovidgoyal
08-12-2007, 02:52 AM
Google the epub format. It's an open, HTML-based format that will eventually be supported on the reader.

beartard
08-12-2007, 12:01 PM
Wait. Did you say the reader is getting additional format support? :pray:

igorsk
08-12-2007, 12:48 PM
http://www.adobe.com/aboutadobe/pressroom/pressreleases/200706/061907DigitalEditions.html
"In addition, with versions for mobile platforms and reading devices also planned, Sony has committed to embed Adobe Digital Editions technology into its portable reader product line."
However, I wouldn't expect it before we see Linux version of Digital Editions, and we don't know if the support will make it into PRS-500 and not just the next generation.

bluelight
08-13-2007, 11:03 AM
i seriously, seriously have no idea how to use this converting gui.

Is there a guide for dummies anywhere? Because everybody seems to know what they are doing :(

kovidgoyal
08-13-2007, 11:10 AM
It's not a GUI. It's a command-line app (at least for the moment).
Start -> Run -> cmd.exe

cd c:\path\to\your\folder
html2lrf yourbook.html

bluelight
08-13-2007, 11:22 AM
ok.. then what? i did that and it didn't convert yet :(

kovidgoyal
08-13-2007, 11:45 AM
That's it. The lrf file will be in that directory.

bluelight
08-13-2007, 12:02 PM
didn't show up. am i supposed to tweak it after all that scrolled options?

HarryT
08-13-2007, 12:05 PM
Are you running it from the folder containing your HTML file?

bluelight
08-13-2007, 12:07 PM
yup :blink:

kovidgoyal
08-13-2007, 12:13 PM
if the options are showing that means you did not specify the file name correctly. If the filename has spaces in it you need to enclose it with quotes
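If you would rather script the conversion than type it, passing the arguments as a list avoids the quoting problem with spaces altogether. A small Python sketch, assuming html2lrf is on your PATH. The helper names are made up, and the two flags are the ones announced earlier in this thread.

```python
import subprocess

def html2lrf_command(html_path, ignore_tables=False, blank_after_para=False):
    """Build the argument list; each list item is passed as one argument."""
    cmd = ['html2lrf']
    if ignore_tables:
        cmd.append('--ignore-tables')
    if blank_after_para:
        cmd.append('--blank-after-para')
    cmd.append(html_path)  # spaces in the name need no extra quoting here
    return cmd

def convert(html_path, **options):
    """Run html2lrf and raise CalledProcessError if it exits with an error."""
    return subprocess.run(html2lrf_command(html_path, **options), check=True)

print(html2lrf_command('My Book.html', ignore_tables=True))
```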

bluelight
08-13-2007, 01:02 PM
omg it worked. thank you so much XD

_Sin
08-15-2007, 12:42 PM
Hi.

This tool is extremely useful - if it didn't exist, I'd have to write it myself!

However, I've found a slight problem: it seems not to want to load images if the file extension isn't set correctly on them. I'm converting Mobipocket files (not being in the US I have to either blag credit for the Connect store or buy my books elsewhere...) and the images don't come with proper names. Right now I just spit out extensionless files matching the IDs in the file, but although they view fine in IE, converting them to LRF loses the images. If I manually fix up the filenames and links to include the image extension, it works.

I could write some smarter post-processing stuff at this end, but is there a simple fix for html2lrf that would get around this?
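
The post-processing mentioned above could look roughly like the following Python sketch. It guesses the image type from the first few bytes of each file and renames the file to match. Both function names are hypothetical, only four common formats are checked, and the links in the HTML would still need updating to the new names.

```python
import os


def guess_image_extension(data):
    """Guess a file extension from the leading bytes of an image."""
    if data.startswith(b"\xff\xd8\xff"):
        return ".jpg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return ".png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return ".gif"
    if data.startswith(b"BM"):
        return ".bmp"
    return None  # unknown format, leave the file alone


def add_missing_extension(path):
    """Rename an extensionless image file in place; return the final path."""
    if os.path.splitext(path)[1]:
        return path  # already has an extension
    with open(path, "rb") as handle:
        ext = guess_image_extension(handle.read(16))
    if ext is None:
        return path
    os.rename(path, path + ext)
    return path + ext
```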

kovidgoyal
08-15-2007, 12:52 PM
Are the images embedded using <img> tags or <a> tags?

_Sin
08-15-2007, 01:01 PM
Are the images embedded using <img> tags or <a> tags?

So far I've only seen <img> tags, they don't seem to be used for linking very often.

kovidgoyal
08-15-2007, 01:07 PM
In that case I don't see why they should be omitted. Send me some example files with the image files as well.

_Sin
08-15-2007, 01:11 PM
In that case I don't see why they should be omitted. Send me some example files with the image files as well.

Ok, I'll need to try to construct an example as the ones I'm actually using are obviously copyright...

JSWolf
08-15-2007, 01:36 PM
I'm converting Mobipocket files
Are you converting MobiPocket files with DRM? If so, I'd love to know how.

_Sin
08-15-2007, 01:52 PM
Are you converting MobiPocket files with DRM? If so, I'd love to know how.

Yup. The DRM isn't too tricky to decode. However I'm probably on slightly dodgy territory if I say too much about that - I'm only doing it for my own use on files I own, and I have no plans to distribute the tools I've written to strip the DRM out.

_Sin
08-15-2007, 02:39 PM
In that case I don't see why they should be omitted. Send me some example files with the image files as well.

Ok, here's a simple example of the problem - I've included the generated LRF file as well as the html source and image files...

kovidgoyal
08-15-2007, 02:55 PM
Fixed in svn.

angelyne
08-15-2007, 10:22 PM
I really enjoy this tool, I've been using it more and more. I've even started to learn a little html so I could edit the books more to my liking before converting them into BBeB.

There is one thing I can't seem to do however, and that's due to my poor skill with html. I'd like to be able to center text vertically on a page. Can this be done easily? I searched but only ended up more confused. I figured that by asking here, I'd get an answer that's appropriate to the BBeB format. Sorry to ask what is probably a really newbie-ish question.

kovidgoyal
08-15-2007, 10:34 PM
Glad to hear it. No, you cannot center text vertically on a page. As far as I know, the LRF format doesn't have any support for doing this in a reflowable fashion. A workaround you could use is inserting a few blank lines above the text and forcing a page break before and after.

Pode
08-16-2007, 06:12 AM
I'm surprised by the two last comments.
I read a lot of webpages on the Sony Reader with the help of your tool, html2lrf.
Some of the pages I save on my hard drive for processing by html2lrf aren't properly edited (since I usually strip everything that's not useful from the page with a little bookmarklet, the CSS and page style are messed up), and I end up several times with paragraphs that are centered vertically on the page (justified on the left and right, but the first and last lines are centered vertically).

JSWolf
08-16-2007, 09:37 AM
How can one make hanging indents with html2lrf?

kovidgoyal
08-16-2007, 10:47 AM
I'm surprised by the two last comments.
I read a lot of webpages on the Sony Reader with the help of your tool, html2lrf.
Some of the pages I save on my hard drive for processing by html2lrf aren't properly edited (since I usually strip everything that's not useful from the page with a little bookmarklet, the CSS and page style are messed up), and I end up several times with paragraphs that are centered vertically on the page (justified on the left and right, but the first and last lines are centered vertically).

Interesting, can you send me an example?

kovidgoyal
08-16-2007, 10:47 AM
How can one make hanging indents with html2lrf?

What's a hanging indent?

EDIT: If it means what I think it means, this will do the trick, provided each logical line is at most one physical line long:

<p style="text-indent:0pt">First line</p>
<p style="text-indent:30pt">Second line</p>
<p style="text-indent:30pt">Third line</p>

JSWolf
08-16-2007, 09:49 PM
A hanging indent is a paragraph where the first line sticks out farther than the rest of the lines in the paragraph. Your sample won't work properly as it disables proper word wrap. It makes separate lines instead of a proper paragraph.

Take the above paragraph and move the rest of the text so it starts under the first g in hanging and you have it. Also, it should be a proper paragraph and not separate lines, so that when you resize the font, it stays a hanging indent.

kovidgoyal
08-16-2007, 09:54 PM
sticks out to the left or the right?

JSWolf
08-16-2007, 09:58 PM
sticks out to the left or the right?
To the left.

Here is a link that defines a hanging indent...

http://www.webopedia.com/TERM/H/hanging_indent.html

kovidgoyal
08-16-2007, 10:13 PM
Yeah, to have this ability html2lrf would need to support the margin CSS property, which I'm insufficiently motivated to support.

JSWolf
08-16-2007, 10:16 PM
So basically then there will not be any proper way to format some poetry or scripts. That's too bad.

kovidgoyal
08-16-2007, 10:18 PM
I don't see why poetry requires hanging indents. Indeed, I've never read any poetry that has hanging indents.

JSWolf
08-16-2007, 10:22 PM
I don't see why poetry requires hanging indents. Indeed, I've never read any poetry that has hanging indents.
LaughingVulcan said (in another thread) that he's trying to format some poetry with hanging indents.

kovidgoyal
08-16-2007, 10:23 PM
Well, he can just leave out the hanging indents; the poetry won't suffer too much.

LaughingVulcan
08-16-2007, 10:24 PM
Mmmm.... that's what I was afraid of in my thread. While not an absolute for poetry formatting, it is the way that I *strongly* prefer to read it.

There is a (poor) example of hanging indent in html here (http://goer.org/HTML/intermediate/align_and_indent/).

For those not reading the link, I'll share the following example from my other thread:

("fakey" margins of reader below)

*---*---*---*---*---*---*
The quick brown fox jumps over the lazy dog.
And the sneeze in the breeze upsets the bees.

I would want to have become:

*---*---*---*---*---*---*
The quick brown fox jumps
over the lazy dog.
And the sneeze in the
breeze upsets the
bees.

Instead of

*---*---*---*---*---*---*
The quick brown fox jumps
over the lazy dog.
And the sneeze in the
breeze upsets the bees.


As I mentioned there, this can be achieved with the (laborious) process of inserting line breaks manually, but the breaks won't work right when the size is upped to Medium and then Large. (In fact, it gets virtually unreadable on upping the size.)


Thanks JS for passing it over to this thread... ;)

And thank you, Kovid, for HTML2LRF and the rest of the apps. It still rocks! :D

kovidgoyal
08-16-2007, 10:32 PM
Try inserting blank lines between sentences to improve readability.

JSWolf
08-16-2007, 10:34 PM
Try inserting blank lines between sentences to improve readability.
I think that'll just make it a mess to read. I would not want to read it like that.

kovidgoyal
08-16-2007, 11:13 PM
Version 0.3.96 is on its way to the servers:
1) Completely refactored to optimize memory usage. Hopefully this hasn't introduced too many new bugs.
2) Added support for <sup> and <sub>
3) Fixed handling of text-indent (should make the indent correct in lit files)
4) Various minor bug fixes.

irishjew
08-16-2007, 11:46 PM
I first just want to say that Kovid is God. Secondly, I apologize if this has already been touched on, but I have a strange problem, and I'm a bit of a noob. I found one html file that had internal links for each chapter, and I tried to copy that code into a book of short stories I'm converting to lrf in order to make a table of contents. The links work just fine in html, but when I convert the file to lrf, they are nowhere to be found. Does anyone have an idea of what might cause that?

kovidgoyal
08-16-2007, 11:55 PM
Run it with the --verbose switch

irishjew
08-17-2007, 12:19 AM
Here's what I get. Looks pretty straightforward.

C:\Program Files\libprs500>html2lrf --verbose "C:\Program Files\libprs500\a.html"
[INFO] convert_from.py:391: Processing a.html
Parsing HTML...
[INFO] convert_from.py:405: Converting to BBeB...
[INFO] convert_from.py:1374: Output written to C:\Program Files\libprs500\a.lrf

kovidgoyal
08-17-2007, 12:25 AM
send me the file.

irishjew
08-17-2007, 12:48 AM
Well, in the interest of not violating copyright laws, I'm attaching a test html file that I have the same problem with. If you open the html file, you'll see the link there working just fine, but the link doesn't appear in the lrf file.

kovidgoyal
08-17-2007, 12:57 AM
its in a <pre> tag.

irishjew
08-17-2007, 01:09 AM
What do you mean? Sorry, I'm a noob.

I appreciate all this, by the way.

kovidgoyal
08-17-2007, 01:21 AM
the <a href> tag is inside a <pre> tag. Delete the <pre> tag and you'll be fine.

Excalibur
08-17-2007, 02:11 AM
That's an incorrect way of thinking about it, however. The <pre> tag is meant to hold other things within it like a DIV tag. Now, if you've supported the CSS property "white-space" that allows for text to be preformatted, then that's cool: white-space: pre;

<span><div><pre><p> and a few other tags are meant to hold links and other things within them.

If you support padding and text-indent, you might be able to try this:

<html>
<head>
<style>
p
{
text-indent:-30px;
padding-left:30px;
}
</style>
</head>
<body>
<p>Curabitur gravida imperdiet nunc. Vestibulum elementum, velit id porttitor viverra, mi dolor suscipit eros, id consequat justo magna a ante. Maecenas varius eleifend nunc. Cras sed tortor. Phasellus dignissim, erat sit amet ullamcorper vehicula, lorem mi faucibus sapien, eget imperdiet ligula odio nec est. Vivamus venenatis, velit in interdum blandit, ligula mi pulvinar leo, at mollis purus magna fermentum mauris. Phasellus et purus. Suspendisse potenti. Aenean egestas consectetuer enim. Morbi elit justo, scelerisque lobortis, ornare ac, tincidunt eget, orci. Mauris faucibus ornare sem. Praesent nisi arcu, malesuada non, nonummy sit amet, sollicitudin ac, elit. Mauris nec libero id lectus porta tincidunt. Morbi purus est, gravida eget, cursus ut, ultricies quis, sapien.</p>
</body>
</html>

Which produces a very nice hanging indent (of 30px anyway).

kovidgoyal
08-17-2007, 02:24 AM
Yeah it's only <pre> that doesn't support links and that's because of a design decision I made when first writing html2lrf. Not supporting white-space made the code considerably simpler. In most html files it's more important to get the whitespace right than to support links in a <pre>.

As for the hanging indent, unfortunately SONY's LRF renderer doesn't support negative indents, AFAICT it treats them as zero.

EDIT: I was wrong, LRF can actually do hanging indents.

Excalibur
08-17-2007, 02:36 AM
White space in HTML is ignored UNLESS you use <pre>, which supports links. Supporting links with or without whitespace shouldn't be any different. Now, if that white space is designated as non-wrapping whitespace, such as when you use the &nbsp; entity, that *might* be an issue, though the reader should automagically wrap it using its own algorithms.

What's the difference between keeping track of the white space and not with your algorithms?

kovidgoyal
08-17-2007, 02:52 AM
The problem is that in ordinary HTML multiple consecutive whitespace elements are collapsed into a single whitespace element, while this is not true in LRF. So I have little checks that do this manually at multiple points in the code. I suppose I could hunt down every instance and wrap it in an if..else, but that's a pain, and would require extensive debugging to make sure it doesn't break other behavior.

kovidgoyal
08-17-2007, 03:37 AM
Released v0.3.97 with support for the padding CSS attribute. This makes LaughingVulcan's hanging indents possible. See the new demo in the first post.

It'll reach the servers in ~30min.

Excalibur
08-17-2007, 08:29 AM
Can you use regular expressions in Python? That would eliminate the need for hunting down all the whitespace or wrapping an if-then around stuff. You'd just have to check to see if you were in a preformatted space such as with the white-space:pre css declaration or the <pre> tag. If you were, you ignored the collapsing of white space. If not, just change white space within tags to one (div, p, etc -- the block elements) using a regular expression search & replace...

Just an idea.

Good going with the addition of hanging indent! :)

kovidgoyal
08-17-2007, 09:18 AM
It isn't that simple, what about the situation

<p>one <span> two</span>

Excalibur
08-17-2007, 09:45 AM
What about the situation? That would produce two spaces, that's how it renders in Firefox and in IE. Though if you had:


<p>one <span> two</span>

which is 2 spaces between the <span> and two, it would reduce to 1 space.

HTML creates a tree of its elements (Which I'm sure you're aware of).

<p>
"one"
<span>
" two"
</span>

Since almost all browsers reduce extra whitespace down to 1 within any given tag, except those that are designated as white-space:pre or within <pre> tags, then it would be safe for you to eliminate all extra whitespace that exists within individual tags. Though, if that whitespace happens to already be 1 space in size then there's no reason to eliminate it.

That's how the browsers do it. If you wish to faithfully reproduce the same look as in the browsers, then do what they do. Instances where you want a single space between one and two above are going to drive you nuts. And I'd say, unless there is style information tied to a tag that you ignore the tag.

Since, in this case, unless there is a specific style attributed to the <span> tag, I'd simply ignore it since it's not doing anything.

kovidgoyal
08-17-2007, 10:10 AM
No that renders with one space. And obviously assume there is some style information associated with the tag, since that's the case I have to worry about.

Excalibur
08-17-2007, 10:20 AM
Well, I tried it in Firefox and it renders with two spaces. IE is wrong when it comes to rendering HTML properly. Anyway, I tried it in IE and it does incorrectly render with 1 space.

So, which do you go with? Standards compliant Firefox or non-standards compliant IE. Either way, you could just as easily remove any whitespace at the beginning of a non-pre tag and that will fix said issue. If the tag has CSS that says it is white-space:pre; then you can check that before removing the space at the beginning of a tag and determine if you need to remove it or not.

hmm I forgot what started this conversation anyway... heh

kovidgoyal
08-17-2007, 10:27 AM
Firefox and Konqueror on Linux render it with one space. I suspect Firefox on Windows is broken. Certainly rendering it with one space is logically correct. No, you can't blindly remove leading space; consider the following situation:


<p>One,<span> two


Just need to remember the last string added, shouldn't be too hard to do.
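
To make the "remember the last string added" idea concrete, here is a small Python sketch. It is not html2lrf's actual code, just one way the rule discussed above could work: each fragment is the text of one inline node in document order, runs of whitespace collapse to a single space, and a leading space is dropped only when the previously emitted text already ended in one. The function name collapse_fragments is made up for illustration, and <pre> or white-space:pre content is assumed to be handled elsewhere.

```python
import re

# HTML-style collapsible whitespace; the non-breaking space is deliberately excluded.
_WS = re.compile(r"[ \t\n\r\f]+")


def collapse_fragments(fragments):
    """Collapse whitespace across consecutive text fragments of one block."""
    out = []
    last_ended_with_space = True  # also drops whitespace at the very start
    for text in fragments:
        text = _WS.sub(" ", text)
        if last_ended_with_space and text.startswith(" "):
            text = text[1:]
        if text:
            out.append(text)
            last_ended_with_space = text.endswith(" ")
    return out
```

With this rule, the fragments "one " and " two" join up as "one two", while "One," and " two" keep their single space and give "One, two".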

Excalibur
08-17-2007, 10:51 AM
Well, it "looks" like one space but it is 2 spaces in source... I guess it does collapse it in the browser window then...

True enough.

kovidgoyal
08-17-2007, 04:43 PM
Released v0.3.98 with support for white-space. This required a complete rewrite of html2lrf's whitespace handling mechanism, so keep an eye out for whitespace related problems. The new code is actually simpler than the old, but not tested nearly as much ;-)

Also fixed the lit2lrf, any2lrf bugs on windows.

Pode
08-17-2007, 06:11 PM
kovidgoyal>here's an attached .lrf, with an example of line-centered text.
It's an article from the New York Times, taken as an html page, and converted with html2lrf.
If you go on the last page, you can see a list of items, centered.
If I understood well, what you want (and others on the forum too) is centered text on the reader.

Centered text like that pops up in nearly all my lrf files, because a lot of them, if not all, come from saved web pages, poorly stripped of their screen-waste content. So I suppose that when I get centered text, it's because, even if I cut "with an axe" through the html code, the css definition still (badly) applies to what's left...

kovidgoyal
08-17-2007, 06:18 PM
If you go on the last page, you can see a list of items, centered.

That's horizontal centering. The O.P. was asking about vertical centering.

irishjew
08-17-2007, 06:20 PM
the <a href> tag is inside a <pre> tag. Delete the <pre> tag and you'll be fine.

Thanks a bunch, Kovid. I did that, and it fixed the links, but unfortunately now all my line breaks and tabs are gone. Do you know what might cause that?

kovidgoyal
08-17-2007, 06:25 PM
upgrade to the latest version and put the <pre> tags back and you should have both links and line breaks.

irishjew
08-17-2007, 07:45 PM
upgrade to the latest version and put the <pre> tags back and you should have both links and line breaks.

I upgraded, and the links are there, but they don't point to the right places. Instead, they all go back to the beginning of the file.

kovidgoyal
08-18-2007, 01:59 AM
I upgraded, and the links are there, but they don't point to the right places. Instead, they all go back to the beginning of the file.

Hmm, that shouldn't be happening. Please send me the file.

Pode
08-18-2007, 06:01 AM
kovidgoyal>OK, it seems I misunderstood. Excuse-me for any false hope.

JSWolf
08-18-2007, 09:05 AM
So any idea when the next version is going to be released? The one that fixes html2lrf.

kovidgoyal
08-18-2007, 11:12 AM
Later today.

angelyne
08-18-2007, 12:58 PM
I'm having a small problem.

I have a sentence that is coded like this : <p class="first">G<em>ram called at five in the morning. She never remembered</em> which looks like this

Gram called at five in the morning. She never remembered the time difference.

I don't like the fact that the first letter is not italics. It looks funny. I also wanted the first letter to be a drop cap. So I changed the code to this :

<p class="first"><em><big class='libprs500_dropcaps'>G</big>ram called at five in the morning. She never remembered</em>

The result looks fine in a browser, but doesn't convert well. The result is that the first letter is in italics and drop-capped, but the rest of the sentence loses the italics.

It looks like this

G ram called at five in the morning.

kovidgoyal
08-18-2007, 01:03 PM
I tried that. For me, the entire statement is italicized, as would be expected from the HTML

<em><big class='libprs500_dropcaps'>G</big>ram called at five in the morning. She never remembered</em>

angelyne
08-18-2007, 10:20 PM
I made it work. Thank you so much for your help. And thank you making this great tool. Without dedicated people like you, we would have never been able to break out of the Sony Connect prison ...

tsgreer
08-18-2007, 10:31 PM
Holy crap! We are at version 3.99!! Does that mean the 4.0 GUI version is almost here?! :)

kovidgoyal
08-18-2007, 10:34 PM
Yeah there's basically only one thing left i have to implement.

angelyne
08-19-2007, 02:12 PM
May I bug you with yet another question.

Is there a way you could define a style that would take the first letter of a paragraph and make it a drop cap.

I've seen a couple of similar styles that do this, but they're not compatible with your converter

here is one:

p.drop { text-indent: 0em; margin: 0em; }
p.drop:first-letter { font-size : 165%; font-weight : bold; width : .50em; }

I've been using your <p><big class='libprs500_dropcaps'></big> but I don't know how to turn that into a style.

And one last question ... what would be the command to make text smaller (that is accepted by html2lrf).

kovidgoyal
08-19-2007, 02:14 PM
May I bug you with yet another question.

Is there a way you could define a style that would take the first letter of a paragraph and make it a drop cap.

I've seen a couple of similar styles that do this, but they're not compatible with your converter

here is one:

p.drop { text-indent: 0em; margin: 0em; }
p.drop:first-letter { font-size : 165%; font-weight : bold; width : .50em; }

Can you open a feature request at libprs500.kovidgoyal.net and if I get the time I'll do it.

angelyne
08-19-2007, 04:01 PM
Have done so. Thank you. It's ticket 167

JSWolf
08-20-2007, 12:02 AM
Yeah there's basically only one thing left i have to implement.
Make that TWO things that need to be implemented.

I just created ticket #168 because I have a LIT file that HTML2LRF hangs on big time. I've attached the LIT file also. And I did try version 0.3.101.

kovidgoyal
08-20-2007, 12:26 AM
It doesn't hang, it just takes a long time... about 20 minutes on my 3 GHz machine. That's because newer versions of html2lrf optimize memory usage at the expense of running time, and that lit file has some seriously messy HTML.

JSWolf
08-20-2007, 01:12 AM
hhhmm... Ok, I'll try again and this time leave it running. Thanks!

LaughingVulcan
08-22-2007, 07:05 PM
Released v0.3.97 with support for the padding CSS attribute. This makes LaughingVulcan's hanging indents possible. See the new demo in the first post.

It'll reach the servers in ~30min.


Thank you, thank you, thank you! It works VERY nicely!! :)

kovidgoyal
08-22-2007, 07:12 PM
Thank you, thank you, thank you! It works VERY nicely!! :)

Anything for a fellow poetry enthusiast :-)

volwrath
08-23-2007, 02:50 PM
Anything for a fellow poetry enthusiast :-)

Quick question. Did you fix the problem where you have to install the USB drivers to get libprs500 to work?

kovidgoyal
08-23-2007, 02:58 PM
I think I did, though I haven't tested it.

volwrath
08-23-2007, 03:26 PM
I think I did, though I haven't tested it.

Hehe great! I will test it for you tonite :)

beartard
08-23-2007, 04:33 PM
I haven't used html2lrf in a while. I'm trying to help my friends at almudi.org to convert the breviary in Latin to BBeB format since they graciously loaned me the source files. I have a folder of html files (all linked together) to convert and I'm getting the following error back from the latest version as of this post:

[INFO] convert_from.py:187: Processing completorium.html
Parsing HTML...
[INFO] convert_from.py:205: Converting to BBeB...
Traceback (most recent call last):
File "/usr/bin/html2lrf", line 8, in <module>
load_entry_point('libprs500==0.3.103', 'console_scripts', 'html2lrf')()
File "build/bdist.linux-i686/egg/libprs500/ebooks/lrf/html/convert_from.py", line 1576, in main
File "build/bdist.linux-i686/egg/libprs500/ebooks/lrf/html/convert_from.py", line 1493, in process_file
File "build/bdist.linux-i686/egg/libprs500/ebooks/lrf/html/convert_from.py", line 177, in __init__
File "build/bdist.linux-i686/egg/libprs500/ebooks/lrf/html/convert_from.py", line 219, in start_on_file
File "build/bdist.linux-i686/egg/libprs500/ebooks/lrf/html/convert_from.py", line 322, in parse_file
File "build/bdist.linux-i686/egg/libprs500/ebooks/lrf/html/convert_from.py", line 528, in process_children
File "build/bdist.linux-i686/egg/libprs500/ebooks/lrf/html/convert_from.py", line 1376, in parse_tag
File "build/bdist.linux-i686/egg/libprs500/ebooks/lrf/html/convert_from.py", line 528, in process_children
File "build/bdist.linux-i686/egg/libprs500/ebooks/lrf/html/convert_from.py", line 1376, in parse_tag
File "build/bdist.linux-i686/egg/libprs500/ebooks/lrf/html/convert_from.py", line 528, in process_children
File "build/bdist.linux-i686/egg/libprs500/ebooks/lrf/html/convert_from.py", line 1209, in parse_tag
UnboundLocalError: local variable 'pcss' referenced before assignment


This is from the command (where completorium.html is the index file):

html2lrf -t "Liturgia Horarum: ad Completorium" -a "Ecclesia Catholica" --publisher="almudi.org" --header --headerformat=%t --link-levels=1 --disable-chapter-detection --verbose ./completorium.html

kovidgoyal
08-23-2007, 05:18 PM
Sigh, another typo. I'll release a new version soon.
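
For readers puzzled by the last line of that traceback, here is a minimal Python illustration of how an UnboundLocalError typically arises. This is not the libprs500 source. The function names and the style dictionary are invented, and only the variable name pcss is borrowed from the error message.

```python
def style_for(tag_has_style):
    """Broken on purpose: pcss is only bound on one branch."""
    if tag_has_style:
        pcss = {"text-indent": "0pt"}
    # When tag_has_style is False, pcss was never assigned, so the
    # next line raises UnboundLocalError.
    return pcss


def style_for_fixed(tag_has_style):
    """Same logic with pcss bound before the branch."""
    pcss = {}
    if tag_has_style:
        pcss = {"text-indent": "0pt"}
    return pcss
```

A misspelled name on the assignment side produces the same symptom, which would fit the "typo" diagnosis, though the actual cause in 0.3.103 is not shown in this thread.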

beartard
08-23-2007, 05:26 PM
Sigh another typo. I'll release a new version soon.

As long as it's *your* typo, I'm not complaining a bit :wink:

volwrath
08-23-2007, 09:06 PM
I think I did, though I haven't tested it.

Works well! Thanks for a great piece of software!

beowulf573
08-29-2007, 10:47 AM
A couple quick questions:

1) Is it possible to do footnotes using html2lrf? What's the best way to handle them?

2) Has anyone tried to come up with a style to mimic a title page? What's the best way to do vertical spacing that looks good at different font sizes.

thanks,
Eddie

beowulf573
08-31-2007, 08:30 AM
Well, I got links working in a footnote type style, but I have a formatting problem.

I can make an internal hyperlink that isn't superscript, or superscript that isn't an internal hyperlink. But I can't make a text block that's both. I've tried both the <sup> tag and the css style: text-align: supercript;

Any suggestions?

kovidgoyal
08-31-2007, 03:31 PM
yeah html2lrf doesn't support links in sup/sub for technical reasons.

beowulf573
08-31-2007, 04:03 PM
Ah, ok, thanks. I used lrf2lrs to try and figure out what was going on and figured it was just a limitation of the format.

beartard
09-08-2007, 12:02 PM
I have an interesting issue with html2lrf. I can't seem to put a meta-tag in the lrf file that has exclamation points at the end. For example when I use
-t "This book sucks!"
I get a response that
bash: !": event not found
Am I doing something wrong?

kovidgoyal
09-08-2007, 12:11 PM
Use -t 'title!'

i.e. single quotes
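
If the command line is being assembled by a script, Python's standard shlex.quote applies the same single-quote rule automatically. A minimal sketch follows; the helper name title_option is made up, and only the -t option comes from this thread.

```python
import shlex


def title_option(title):
    """Return a shell-safe "-t ..." fragment for a POSIX shell such as bash."""
    # shlex.quote wraps anything containing unsafe characters in single
    # quotes, which stops bash from treating "!" as history expansion.
    return "-t " + shlex.quote(title)


print(title_option("This book sucks!"))  # prints: -t 'This book sucks!'
```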

beartard
09-08-2007, 09:34 PM
I knew I was being stupid somehow ;-)

silkcom
09-08-2007, 11:18 PM
Is there a way to go from PDF to html in such a way that I can use the HTML converter that won't thrash the formatting and images? Generally with PDF converting I see that either the whole thing is made into images (large file size), or the formatting is completely gone. I'm very interested in finding a way to keep "code" fonts and images inside the code, but get larger font sizes than PDF's give on the 6 inch screen.

I like to read programming books (which always have images and code examples).

kovidgoyal
09-08-2007, 11:28 PM
Nope there is no way to do that. PDF is a partially rasterized format. As far as I know it isn't possible to convert PDFs with high fidelity. Rasterization is the only way to go for complex PDFs.

angelyne
09-15-2007, 01:09 PM
I find that inserting blank lines in the form of <br> are ignored by the converter. What code, if any, could I use to actually create a blank line.

lj69
09-15-2007, 01:16 PM
Can anyone tell me how to do this? I tried to drag and drop the file from libprs500 to the desktop, but got an error message along the lines of "cannot copy from disk or source".
Could anyone please write a tutorial with pictures, step by step? I'm a newbie with this.

kovidgoyal
09-15-2007, 01:21 PM
I find that inserting blank lines in the form of <br> are ignored by the converter. What code, if any, could I use to actually create a blank line.

Interesting, that shouldn't happen. Can you post a code snippet that demonstrates this? You can use empty paragraphs: <p></p>

EDIT: Also note that a single <br> causes a line break, not a blank line. For a blank line you need <br><br>

angelyne
09-15-2007, 07:38 PM
I did a few tests and got mixed results. However, the reason I wasn't achieving the result I wanted is that
blank spaces are ignored before the text but are used after the text starts. Here is the code

<body>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>Title</p>
<p>Author</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>text (with blank lines)</p>
</body>

Giving something like this
<top of page>
Title
Author



Text

angelyne
09-15-2007, 07:46 PM
Here is another question. I noticed that in a book I bought from Sony Connect : (Anansi Boys), the table of content (in the table of contents section) displays 4 items. Copyright, TOC, About, Begin.

However the TOC that is part of the text itself (which are simply html links) is quite elaborate, with individual chapters listed.

I find that I like this.

Is there a way we can code HTML so that we can pick and choose which elements html2lrf will include in the Table of Contents section of the book?

JSWolf
09-16-2007, 08:03 PM
Here is another question. I noticed that in a book I bought from Sony Connect : (Anansi Boys), the table of content (in the table of contents section) displays 4 items. Copyright, TOC, About, Begin.

However the TOC that is part of the text itself (which are simply html links) is quite elaborate, with individual chapters listed.

I find that I like this.

Is there a way we can code HTML so that we can pick and choose which elements html2lrf will include in the Table of Contents section of the book?
I know you can do it in Book Designer. But I've never seen or read about a way to do this with html2lrf

kovidgoyal
09-16-2007, 08:10 PM
I know you can do it in Book Designer. But I've never seen or read about a way to do this with html2lrf

Split the HTML file into a toc file and a main file. Put the links you want to go into the TOC in the toc file and the rest in the main file. Run html2lrf on the toc file.
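
A rough Python sketch of generating such a toc file. The file name main.html, the anchor ids and the function name build_toc_html are all assumptions for illustration; the anchors must already exist as ids in the main file.

```python
from html import escape


def build_toc_html(entries, main_file="main.html"):
    """Build a small HTML page of links into the main file.

    entries is a list of (title, anchor_id) pairs, one per TOC item.
    """
    lines = ["<html>", "<body>", "<h1>Table of Contents</h1>"]
    for title, anchor in entries:
        href = "%s#%s" % (main_file, anchor)
        lines.append('<p><a href="%s">%s</a></p>'
                     % (escape(href, quote=True), escape(title)))
    lines += ["</body>", "</html>"]
    return "\n".join(lines)
```

Writing the result to, say, toc.html next to the main file and running html2lrf on toc.html would follow the recipe above.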

JSWolf
09-16-2007, 08:15 PM
Split the HTML file into a toc file and a main file. Put the links you want to go into the TOC in the toc file and the rest in the main file. Run html2lrf on the toc file.
Which ToC will show up in the Sony ToC (aka the menu) and which will only show up in the book?

kovidgoyal
09-16-2007, 08:19 PM
toc file -> Sony TOC (the menu)
main file -> the book itself

kovidgoyal
09-17-2007, 11:26 AM
I did a few tests and got mixed results. However, the reason I wasn't achieving the result I wanted is that
blank spaces are ignored before the text but are used after the text starts. Here is the code

<body>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>Title</p>
<p>Author</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>&nbsp;</p>
<p>text (with blank lines)</p>
</body>

Giving something like this
<top of page>
Title
Author



Text

This will be fixed sometime after 0.4.0.

silkcom
09-19-2007, 04:08 PM
So I just got a piece of software called Solid Converter PDF (sorry, it's a Windows product, and it's proprietary). However, its PDF to MS Word conversion is simply amazing. It brings over everything (and I mean everything). The only problem I've seen, the only thing that isn't exact, is that the bullets are squares in the PDF and dashes in the Word doc. (That's good enough.)

This means that it is possible to do a PDF to HTML that is solid, keeps all the formatting, and then can be formatted down to LRF to look "just like" the original.

kovidgoyal
09-19-2007, 04:18 PM
Is the Word doc reflowable? Export it to HTML and see what happens when you resize the window it is viewed in.

silkcom
09-19-2007, 04:39 PM
I can highlight text and all that. I converted it to RTF; aside from losing some text background colors (which are somewhat important, but I can live without them), the conversion was definitely amazing, and totally readable.

Word's HTML converter isn't the best; it lost a bunch of the formatting, but the doc to RTF conversion worked great.

Fallen angel
09-23-2007, 11:59 AM
I'm not sure if this is the right place to say this, but I cannot get libprs500 to work. I'm getting an error message. This is what is written in the log file:
Traceback (most recent call last):
File "main.py", line 696, in <module>
File "main.py", line 688, in main
File "main.py", line 132, in __init__
File "libprs500\gui2\library.pyo", line 367, in set_database
File "libprs500\gui2\library.pyo", line 102, in set_database
UnicodeEncodeError: 'ascii' codec can't encode characters in position 26-32: ordinal not in range(128)

kovidgoyal
09-23-2007, 12:21 PM
Will be fixed in the next version.

balok
09-24-2007, 09:21 AM
Hi kovidgoyal,

Even with --bottom-margin=0 (which is the default, incidentally), there still seems to be quite a gap at the bottom, which doesn't happen for left/top/right. Is this normal, or am I missing something?

Btw, thanks for this great program.

kovidgoyal
09-24-2007, 10:44 AM
That's a safety feature to catch any "overflow" from formatting errors.

balok
09-24-2007, 11:31 AM
That's a safety feature to catch any "overflow" from formatting errors.

Is there a way to turn off this feature?

kovidgoyal
09-24-2007, 12:18 PM
In the next release there will be a prs500-unsafe profile.

balok
09-24-2007, 07:55 PM
In the next release there will be a prs500-unsafe profile.

Thanks. I assume the safety feature also accounts for the few pixels left on the side margins, and the small space on the top margin also? Will the prs500-unsafe profile cover these too?

kovidgoyal
09-24-2007, 08:10 PM
No, the profile doesn't set the top and side margins. There are non-zero defaults for them; setting them to zero is the best that can be done.

balok
09-29-2007, 07:18 AM
Hi Kovid,

I haven't noticed any difference when using the prs500-unsafe profile. I've made two lrf files, one with the prs500 profile and the other with the unsafe profile, both are exactly the same.

Fallen angel
09-30-2007, 05:18 AM
The new version (0.4.5) gives me the same error.

kovidgoyal
09-30-2007, 12:33 PM
Oops, looks like I forgot to fix that error. Sorry, you'll have to wait for the next version.

kovidgoyal
09-30-2007, 12:47 PM
Hi Kovid,

I haven't noticed any difference when using the prs500-unsafe profile. I've made two lrf files, one with the prs500 profile and the other with the unsafe profile, both are exactly the same.

Hmm, 'fraid I don't know what else to do; that seems to be the best that can be done.

Fallen angel
10-02-2007, 06:30 AM
I don't know what's wrong with my pc, but v.0.4.7 doesn't work either. :smack:

kovidgoyal
10-02-2007, 11:43 AM
Since I did make some changes, can you repost the error message?

Fallen angel
10-03-2007, 07:40 AM
Traceback (most recent call last):
File "main.py", line 723, in <module>
File "main.py", line 716, in main
File "main.py", line 137, in __init__
File "libprs500\gui2\library.pyo", line 383, in set_database
File "libprs500\gui2\library.pyo", line 118, in set_database
File "libprs500\library\database.pyo", line 624, in __init__
File "libprs500\library\database.pyo", line 39, in _connect
UnicodeEncodeError: 'ascii' codec can't encode characters in position 26-32: ordinal not in range(128)
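
A note on the traceback above: the failing frame is _connect in the database module, and the message says that some characters of a string could not be encoded as ASCII. One plausible reading, which is an assumption and is not confirmed anywhere in this thread, is that the path to the library database contains non-ASCII characters (for example an accented user name). The sketch below only reproduces that class of error in plain Python; the path and the helper name are hypothetical.

```python
# Hypothetical path; only the idea (non-ASCII characters in a path) matters.
db_path = "C:\\Documents and Settings\\Éloïse\\library.db"


def encode_path(path, encoding):
    """Return path as bytes, or None if the codec cannot represent it."""
    try:
        return path.encode(encoding)
    except UnicodeEncodeError as err:
        # err.start and err.end delimit the offending run of characters.
        print("cannot encode with %s: characters %d-%d"
              % (encoding, err.start, err.end - 1))
        return None


ascii_result = encode_path(db_path, "ascii")  # fails: the accented letters are outside ASCII
utf8_result = encode_path(db_path, "utf-8")   # succeeds
```

If that is indeed the cause, using a codec that covers the characters, or keeping the path as a Unicode string throughout, would avoid the error.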

aaronvegh
10-03-2007, 09:44 AM
Hi there,
I'm attaching a copy of Apple's Cocoa Fundamentals development guide, a book in HTML format (a PDF is available as well, but I'd prefer to get this one working on my PRS500). I've run other files through html2lrf without problems, but this set is causing all kinds of problems!

html2lrf /Developer/ADC\ Reference\ Library/documentation/Cocoa/Conceptual/CocoaFundamentalIntroduction/chapter_1_section_1.html
Processing chapter_1_section_1.html
Parsing HTML...
Converting to BBeB...
Processing index.html
Parsing HTML...
Converting to BBeB...
An error occurred while processing a table: list index out of range. Ignoring table markup.
Processing index-date.html
Parsing HTML...
Converting to BBeB...
An error occurred while processing a table: list index out of range. Ignoring table markup.
An error occurred while processing a table: Table has cell that is too large. Ignoring table markup.
An error occurred while processing a table: Table has cell that is too large. Ignoring table markup.
Processing index-date0.html
Parsing HTML...
Converting to BBeB...
An error occurred while processing a table: list index out of range. Ignoring table markup.
^CTraceback (most recent call last):
File "/Applications/libprs500.app/Contents/Resources/html2lrf.py", line 9, in <module>
main()
File "libprs500/ebooks/lrf/html/convert_from.pyo", line 1590, in main
File "libprs500/ebooks/lrf/html/convert_from.pyo", line 1507, in process_file
File "libprs500/ebooks/lrf/html/convert_from.pyo", line 179, in __init__
File "libprs500/ebooks/lrf/html/convert_from.pyo", line 231, in start_on_file
File "libprs500/ebooks/lrf/html/convert_from.pyo", line 469, in process_links
File "libprs500/ebooks/lrf/html/convert_from.pyo", line 231, in start_on_file
File "libprs500/ebooks/lrf/html/convert_from.pyo", line 469, in process_links
File "libprs500/ebooks/lrf/html/convert_from.pyo", line 231, in start_on_file
File "libprs500/ebooks/lrf/html/convert_from.pyo", line 469, in process_links
File "libprs500/ebooks/lrf/html/convert_from.pyo", line 229, in start_on_file
File "libprs500/ebooks/lrf/html/convert_from.pyo", line 332, in parse_file
File "libprs500/ebooks/lrf/html/convert_from.pyo", line 538, in process_children
File "libprs500/ebooks/lrf/html/convert_from.pyo", line 1390, in parse_tag
File "libprs500/ebooks/lrf/html/convert_from.pyo", line 538, in process_children
File "libprs500/ebooks/lrf/html/convert_from.pyo", line 1390, in parse_tag
File "libprs500/ebooks/lrf/html/convert_from.pyo", line 538, in process_children
File "libprs500/ebooks/lrf/html/convert_from.pyo", line 1386, in parse_tag
File "libprs500/ebooks/lrf/html/convert_from.pyo", line 538, in process_children
File "libprs500/ebooks/lrf/html/convert_from.pyo", line 1371, in parse_tag
File "libprs500/ebooks/lrf/html/convert_from.pyo", line 538, in process_children
File "libprs500/ebooks/lrf/html/convert_from.pyo", line 1376, in parse_tag
File "libprs500/ebooks/lrf/html/convert_from.pyo", line 538, in process_children
File "libprs500/ebooks/lrf/html/convert_from.pyo", line 1380, in parse_tag
File "libprs500/ebooks/lrf/html/convert_from.pyo", line 1401, in process_table
File "libprs500/ebooks/lrf/html/table.pyo", line 371, in blocks
File "libprs500/ebooks/lrf/html/table.pyo", line 326, in get_widths
File "libprs500/ebooks/lrf/html/table.pyo", line 267, in preferred_width
File "libprs500/ebooks/lrf/html/table.pyo", line 212, in preferred_width
File "libprs500/ebooks/lrf/html/table.pyo", line 209, in text_block_preferred_width
File "libprs500/ebooks/lrf/html/table.pyo", line 163, in text_block_size
KeyboardInterrupt

The process seems to hang after spitting out a few error messages, and then throws these errors after I do an interrupt (Ctrl-C). Any suggestions on how to proceed?

Thanks,
Aaron.

kovidgoyal
10-03-2007, 11:37 AM
Traceback (most recent call last):
File "main.py", line 723, in <module>
File "main.py", line 716, in main
File "main.py", line 137, in __init__
File "libprs500\gui2\library.pyo", line 383, in set_database
File "libprs500\gui2\library.pyo", line 118, in set_database
File "libprs500\library\database.pyo", line 624, in __init__
File "libprs500\library\database.pyo", line 39, in _connect
UnicodeEncodeError: 'ascii' codec can't encode characters in position 26-32: ordinal not in range(128)

Hmm ok I'm trying another fix in the next version.

kovidgoyal
10-03-2007, 11:42 AM
Hi there,
I'm attaching a copy of Apple's Cocoa Fundamentals development guide, a book in HTML format (a PDF is available as well, but I'd prefer to get this one working on my PRS500). I've run other files through html2lrf without problems, but this set is causing all kinds of problems!

The process seems to hang after spitting out a few error messages, and then throws these errors after I do an interrupt (Ctrl-C). Any suggestions on how to proceed?

Thanks,
Aaron.

The file it seems to be hanging on, index-date0.html, was not in the attached zip.

aaronvegh
10-03-2007, 03:15 PM
The file it seems to be hanging on, index-date0.html, was not in the attached zip.

Yeah, I noticed that! I don't know where that file is coming from. I started with the original index.html, and I guess your tool goes through parsing for <a> tags? There's no index-date0.html referenced in the file. That's why I included the complete file set, hoping you could replicate the problem.

kovidgoyal
10-03-2007, 03:32 PM
Unzip that zip file into some non-standard location (like your home directory) and then run html2lrf on toc.html. That worked for me.

aaronvegh
10-03-2007, 04:22 PM
Oh, that's it! Awesome! Great tool, thanks for your help!

Fallen angel
10-04-2007, 07:13 AM
Hmm ok I'm trying another fix in the next version.

Thank you very much!

maggotb0y
10-04-2007, 01:35 PM
Hey, I've been using Book Designer for a while to create Sony Reader Content, and I think it's a nice program, but it looks like I'll get better results with html2lrf for inline graphics and tables, etc. I'd like to be able to convert my HTML0 files output by BD using libprs500, and I'm having a few problems. I've looked around and was a bit surprised that this doesn't seem to have come up on the forum.

Here are my questions.

I can't figure out a chapter-regex that will pick up BD style chapter tags (I've included an example below). Can anyone help me with that?

<SPAN id=title><DIV align=center><B><FONT color=#001950>PROLOGUE</FONT></B></DIV>
</SPAN>

Another issue is trying to get page breaks to work the way I want. BD replaces the <HR> tag with page breaks, and I use that to control pagination (BD by default also does chapter page breaks, but I turn that off so that I can have more control over the output). With libprs500, I can use page-break-before-tag=HR and that controls the page break fine, but it displays the Horizontal Rule, which I don't want. Is there any workaround that would create the manual page break, but not include the visible line?

Finally (for now at least), BD creates empty lines as
<DIV align=justify>&nbsp;&nbsp;&nbsp;&nbsp;</DIV>
and html2lrf seems not to respect that empty line: the line below is pushed up. Is there any way I can force this to show as an empty line?
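
For the first question, here is a minimal sketch of a Python regular expression that picks out Book Designer title spans like the PROLOGUE example above. It is only an illustration: the names BD_TITLE, TAG and chapter_titles are made up here, and whether such a pattern can be handed to html2lrf's chapter-regex option as-is depends on what text that option is matched against, which this thread does not say.

```python
import re

# Matches Book Designer's title markup: <SPAN id=title> ... </SPAN>.
# Attribute quoting and letter case vary, so both are tolerated.
BD_TITLE = re.compile(
    r'<span\s+id\s*=\s*["\']?title["\']?\s*>(.*?)</span>',
    re.IGNORECASE | re.DOTALL,
)
# Any single tag, used to strip markup from the captured title.
TAG = re.compile(r'<[^>]+>')


def chapter_titles(html):
    """Return the plain text of each BD title span, in document order."""
    return [TAG.sub('', m.group(1)).strip() for m in BD_TITLE.finditer(html)]


sample = '''<SPAN id=title><DIV align=center><B><FONT color=#001950>PROLOGUE</FONT></B></DIV>
</SPAN>
<SPAN id=subtitle><DIV align=center>not a chapter</DIV></SPAN>'''

titles = chapter_titles(sample)
```

Because the pattern requires the id value to start with "title", spans with id=subtitle are left alone. A title that itself contains a nested span would end the match early, so this is not a general HTML parser.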

kovidgoyal
10-04-2007, 01:45 PM
Hmm, none of those issues have easy solutions. What I will do is make a preprocessing option for BD that will automatically replace the problematic HTML with HTML that html2lrf processes. Can you send me a couple of example HTML0 files?

Also you may try saving the HTML0 files as HTML in BD and then running html2lrf over them.

maggotb0y
10-04-2007, 03:10 PM
Hmm, none of those issues have easy solutions. What I will do is make a preprocessing option for BD that will automatically replace the problematic HTML with HTML that html2lrf processes. Can you send me a couple of example HTML0 files?

Also you may try saving the HTML0 files as HTML in BD and then running html2lrf over them.

A pre-processing engine for BD would probably make quite a bit of sense. The only difference between the "html0" file and the "save as HTML" options in BD is the name of the extension <g>.

I've attached a zip file with an HTML file generated from BD that will give you the basics. I'm sure it is pretty self explanatory, but if you have any questions, let me know. If you need some bigger files for a more complete test, let me know and I will see if I have anything public domain with some good formatting to post.

kovidgoyal
10-04-2007, 07:30 PM
Hmm tell me if the following mapping is correct
<span id=title> --> <h1>
<span id=subtitle> --> <h2>
What about lower levels of headings? subsubtitle, etc?
That way you can match chapters on either h1 or h2.

<hr> --> <span style="page-break-after:always" />

I will probably modify html2lrf's handling of the div tag to take care of the blank lines.
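
The mapping above could be sketched in Python roughly as follows. This is an illustration under stated assumptions, not the preprocessor that ended up in libprs500: the function names are hypothetical, and the regular expressions assume BD spans are never nested inside one another.

```python
import re

FLAGS = re.IGNORECASE | re.DOTALL


def bd_span_to_heading(html, span_id, heading):
    """Rewrite <span id=SPAN_ID>...</span> as <HEADING>...</HEADING>."""
    pattern = re.compile(
        r'<span\s+id\s*=\s*["\']?%s["\']?\s*>(.*?)</span>' % re.escape(span_id),
        FLAGS,
    )
    # A function replacement avoids backslash surprises in the captured text.
    return pattern.sub(
        lambda m: '<%s>%s</%s>' % (heading, m.group(1).strip(), heading),
        html,
    )


def preprocess_bd(html):
    """Apply the title, subtitle and <hr> mappings discussed above."""
    html = bd_span_to_heading(html, 'title', 'h1')
    html = bd_span_to_heading(html, 'subtitle', 'h2')
    # <hr> becomes a page break with no visible rule.
    html = re.sub(r'<hr\b[^>]*>',
                  '<span style="page-break-after:always" />',
                  html, flags=re.IGNORECASE)
    return html
```

With markup rewritten this way, chapters can be matched on h1 or h2 as suggested above.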

Incidentally, the way BD uses the id attribute is puzzling. According to the HTML spec ids should be unique, is there some reason vvv chose to use the id attribute rather than the class attribute, which is a much more natural fit?

maggotb0y
10-04-2007, 09:22 PM
Hmm tell me if the following mapping is correct
<span id=title> --> <h1>
<span id=subtitle> --> <h2>
What about lower levels of headings? subsubtitle, etc?
That way you can match chapters on either h1 or h2.
Looks almost right. Book Designer always creates a chapter for <span id=title> and never creates one for <span id=subtitle>. I use this to create a chapter entry for the title (e.g. "A funny thing happened today") but not for the subtitle ("chapter one") so that I have control over what appears in the table of contents. I doubt you would have to do much special handling for span id=subtitle; they will probably take care of themselves just fine.

Incidentally, the way BD uses the id attribute is puzzling. According to the HTML spec ids should be unique, is there some reason vvv chose to use the id attribute rather than the class attribute, which is a much more natural fit?
Not really sure why BD is the way it is <g>. It's a nice enough tool for creating ebooks, but it has its quirks, and the pace of development and feature set of libprs500 seem to far outstrip BD's (which is why I would like to make the switch).

Thanks a lot for your work on this, I'm sure you'll earn quite a few converts. Let me know when you have something ready for testing- I'm excited to start running my collection through this!

JSWolf
10-04-2007, 10:17 PM
If anyone wants to donate to get kovidgoyal a new 505 so he can port libprs500 to it, then please have a look at http://www.mobileread.com/forums/showthread.php?t=14496 and give, give, give.

glenn69
10-04-2007, 11:16 PM
I've been trying for a couple weeks to get a website page to lrf format using libprs500 in Linux (ubuntu). The page is simply a text handbook, so I thought it would be simple.

Well, it hasn't really worked for me yet. I've tried saving the HTML, converting it to PDF and then running pdf2lrf: no go. Tried html2lrf, again no good.

Could somebody show me how to convert this page:
http://www.gentoo.org/doc/en/handbook/handbook-x86.xml?style=printable&full=1#book_part1

Then let me know how you did it.

Thanks,

One Confused Reader

kovidgoyal
10-05-2007, 07:17 AM
When you say no good, what do you mean exactly? The simplest way to convert an online website into an LRF file is

web2lrf --url "http://www.gentoo.org/doc/en/handboo...l=1#book_part1"

EDIT: In your case the correct commandline is web2lrf --url "http://www.gentoo.org/doc/en/handbook/handbook-x86.xml?style=printable&full=1#book_part1" -r 0
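
A side note on why the URL is in quotes: it contains & and #, which a shell treats specially. The sketch below builds the same command in Python, using only the --url and -r options shown above; passing the arguments as a list sidesteps shell quoting, and shlex.quote shows what a paste-safe command line looks like. Running it requires libprs500 to be installed, so the actual call is left as a comment.

```python
import shlex

url = ("http://www.gentoo.org/doc/en/handbook/handbook-x86.xml"
       "?style=printable&full=1#book_part1")

# Arguments as a list: no shell is involved, so '&' and '#' need no quoting.
argv = ["web2lrf", "--url", url, "-r", "0"]

# The same command as one string that is safe to paste into a POSIX shell.
command_line = " ".join(shlex.quote(arg) for arg in argv)
print(command_line)

# To actually run it (requires libprs500 on the PATH):
#   import subprocess
#   subprocess.call(argv)
```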

glenn69
10-05-2007, 04:10 PM
I'm sorry for the lack of detail with "no good," but I've tried so many times that I've forgotten exact problems.

I ran the command exactly as you suggested
web2lrf --url "http://www.gentoo.org/doc/en/handbook/handbook-x86.xml?style=printable&full=1#book_part1" -r 0

When I try to view in Sony reader, upon opening the file, it either hangs on "Formatting..." or it reboots the unit.

So, I tried to use lrfviewer to view the created LRF. Below are the errors I get on viewing:

glenn@glenn-desktop:~$ lrfviewer /home/glenn/GentooDocNoTable.lrf
glenn@glenn-desktop:~$ Layout time: 8.56640505791 seconds
Traceback (most recent call last):
File "build/bdist.linux-i686/egg/libprs500/gui2/lrf_renderer/main.py", line 180, in go_to_page
File "build/bdist.linux-i686/egg/libprs500/gui2/lrf_renderer/document.py", line 509, in show_page
TypeError: argument 0 of signal QGraphicsScene.page_changed(PyQt_PyObject) has an invalid type

Traceback (most recent call last):
File "build/bdist.linux-i686/egg/libprs500/gui2/lrf_renderer/main.py", line 180, in go_to_page
File "build/bdist.linux-i686/egg/libprs500/gui2/lrf_renderer/document.py", line 509, in show_page
TypeError: argument 0 of signal QGraphicsScene.page_changed(PyQt_PyObject) has an invalid type

Traceback (most recent call last):
File "build/bdist.linux-i686/egg/libprs500/gui2/lrf_renderer/main.py", line 180, in go_to_page
File "build/bdist.linux-i686/egg/libprs500/gui2/lrf_renderer/document.py", line 509, in show_page
TypeError: argument 0 of signal QGraphicsScene.page_changed(PyQt_PyObject) has an invalid type


Thanks for help and all your efforts on this great project.

kovidgoyal
10-05-2007, 07:32 PM
This is because the PyQt on your distribution is not correctly installed. The LRF file should work fine if you copy it to the SONY reader or view it using the Connect software.

kovidgoyal
10-05-2007, 10:32 PM
Looks almost right. Book Designer always creates a chapter for <span id=title> and never creates one for <span id=subtitle>. I use this to create a chapter entry for the title (e.g. "A funny thing happened today") but not for the subtitle ("chapter one") so that I have control over what appears in the table of contents. I doubt you would have to do much special handling for span id=subtitle; they will probably take care of themselves just fine.


Not really sure why BD is the way it is <g>. It's a nice enough tool for creating ebooks, but it has its quirks, and the pace of development and feature set of libprs500 seem to far outstrip BD's (which is why I would like to make the switch).

Thanks a lot for your work on this, I'm sure you'll earn quite a few converts. Let me know when you have something ready for testing- I'm excited to start running my collection through this!

Added initial support for pre-processing via the --book-designer option in svn.

glenn69
10-05-2007, 11:03 PM
According to Synaptic I have the following pyqt's installed:

Python-qt3
Python-qt4
Python-qt4-dev
Python-qt4-gl
Python-qt4-sql

Is there something else I need ?

kovidgoyal
10-05-2007, 11:12 PM
It's a version problem. You should either install SIP and PyQt by hand or wait for gutsy.

JSWolf
10-05-2007, 11:35 PM
Kewl! So it'll process uncompressed Book Designer files. Nice one! Just have to wait for the Windows version to be released.

danx
10-06-2007, 02:17 AM
Kovid,

You're superman--what you did is amazing.

I had problems upgrading Python to 2.5. I have 2.4 and I don't want to upgrade the Linux distribution now. It was a pain to upgrade and I posted a comment here on the other software besides Python that is needed by html2lrf. See libprs500 bug #117 for details (https://libprs500.kovidgoyal.net/ticket/117).

One question--I believe Sony LRF supports images, but unless I'm doing something wrong, html2lrf doesn't seem to include images. If my assumptions are true, are there any plans to include images in the generated LRF format? I viewed my generated LRF file and it had no images. I tried both relative and full paths for the image file. For example:
Relative: <img src="images/foo.jpg" />
Full path: <img src="http://www.foobar.com/images/foo.jpg" />

I donated to the Kovid PRS 505 fund--great work! (http://kovidgoyal.chipin.com/kovidgoyal-to-port-libprs500-to-the-prs505)

maggotb0y
10-06-2007, 07:16 AM
Added initial support for pre-processing via the --book-designer option in svn.

Okay, you've got me downloading your VMWare image now so I can try to play around with this- I'm sure I'm jumping in over my head, but I can't help it.

Thanks!

kovidgoyal
10-06-2007, 08:35 AM
Kovid,

You're superman--what you did is amazing.

I had problems upgrading Python to 2.5. I have 2.4 and I don't want to upgrade the Linux distribution now. It was a pain to upgrade and I posted a comment here on the other software besides Python that is needed by html2lrf. See libprs500 bug #117 for details (https://libprs500.kovidgoyal.net/ticket/117).

One question--I believe Sony LRF supports images, but unless I'm doing something wrong, html2lrf doesn't seem to include images. If my assumptions are true, are there any plans to include images in the generated LRF format? I viewed my generated LRF file and it had no images. I tried both relative and full paths for the image file. For example:
Relative: <img src="images/foo.jpg" />
Full path: <img src="http://www.foobar.com/images/foo.jpg" />

I donated to the Kovid PRS 505 fund--great work! (http://kovidgoyal.chipin.com/kovidgoyal-to-port-libprs500-to-the-prs505)

html2lrf does support images. See for example the demo lrf file attached in the first post of this thread. Probably there is something missing in your installation of the python imaging library. Try running html2lrf with the --verbose flag.

Thanks for the donation, it's much appreciated.

kovidgoyal
10-06-2007, 08:35 AM
Okay, you've got me downloading your VMWare image now so I can try to play around with this- I'm sure I'm jumping in over my head, but I can't help it.

Thanks!

Not a problem. If you have trouble using the vmware image let me know.

danx
10-06-2007, 03:58 PM
html2lrf does support images. See for example the demo lrf file attached in the first post of this thread. Probably there is something missing in your installation of the python imaging library. Try running html2lrf with the --verbose flag.

I think I know the problem--the images are inside tables (for positioning and captions) and it appears the table contents are ignored because it's "too large" (an image):

html2lrf --verbose -o matthes.lrf -t "Francois Matthes and the Marks of Time" -a "Francois E. Matthes" matthes.html

Processing matthes.html
Parsing HTML...
Converting to BBeB...
An error occurred while processing a table: Table has cell that is too large. Ignoring table markup.
Output written to /local/apache/htdocs/yosemite/library/matthes/matthes.lrf

html2lrf --version
libprs500 0.4.7

Here's a typical table with an image:

<table border="0" align="right"><tr>
<td align="center">
<a href="images/dust_jacket.jpg">
<img src="images/thumbnail/dust_jacket.jpg"
alt="François Matthes and the Marks of Time, dust jacket"
width="576" height="411" border="0" /></a>
<br />
<a href="images/dust_jacket.jpg">
<span class="small"><i>Dust jacket</i></span></a>
</td>
</tr></table>

kovidgoyal
10-06-2007, 04:13 PM
Just delete the <a href="images/dust_jacket.jpg"> line. There's no way for html2lrf to know what size the image referenced in an <a> element should be.
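
If there are many such tables, the same edit can be scripted. The following is a minimal sketch, not part of html2lrf: it strips an <a> element only when it wraps nothing but a single <img>, and leaves every other link alone. The names LINKED_IMG and unwrap_linked_images are made up for this example, and the sample is a shortened copy of the table cell posted above.

```python
import re

# An <a ...> that wraps nothing but a single <img ...>, with optional whitespace.
LINKED_IMG = re.compile(
    r'<a\b[^>]*>\s*(<img\b[^>]*>)\s*</a>',
    re.IGNORECASE | re.DOTALL,
)


def unwrap_linked_images(html):
    """Replace <a><img></a> with just the <img>, leaving other links alone."""
    return LINKED_IMG.sub(lambda m: m.group(1), html)


sample = '''<td align="center">
<a href="images/dust_jacket.jpg">
<img src="images/thumbnail/dust_jacket.jpg"
alt="dust jacket"
width="576" height="411" border="0" /></a>
<br />
<a href="images/dust_jacket.jpg">
<span class="small"><i>Dust jacket</i></span></a>
</td>'''

cleaned = unwrap_linked_images(sample)
```

The caption link in the second half of the cell is kept, because it wraps a <span> rather than an <img>.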

chatlumo
10-06-2007, 10:15 PM
About images (not included): this doesn't work with the GUI under Mac OS X, but it works if I use the command line. Do you know why?

Thanks.

kovidgoyal
10-06-2007, 10:34 PM
You have to zip up the HTML file and all its images and then add it to the database. Only then will conversion from the GUI include the images.
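
The zipping step can be done with any archive tool; as one possibility, here is a small Python sketch that packs an HTML file together with everything in its directory tree. The function name is hypothetical, and the assumption that the archive should keep paths relative to the HTML file (so that links like images/foo.jpg still resolve) is mine, not something stated in this thread.

```python
import os
import zipfile


def zip_html_book(html_path, archive_path):
    """Zip an HTML file plus everything else in its directory tree."""
    root = os.path.dirname(os.path.abspath(html_path))
    archive_abs = os.path.abspath(archive_path)
    with zipfile.ZipFile(archive_abs, "w", zipfile.ZIP_DEFLATED) as zf:
        for dirpath, _dirs, files in os.walk(root):
            for name in files:
                full = os.path.join(dirpath, name)
                if full == archive_abs:
                    continue  # do not add the archive to itself
                # Store paths relative to the HTML file's directory.
                zf.write(full, os.path.relpath(full, root))
    return archive_abs
```

For example, zip_html_book("book/index.html", "book.zip") would produce an archive with index.html at the top level and the images in their original subdirectories.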

glenn69
10-06-2007, 11:57 PM
It's a version problem. You should either install SIP and PyQt by hand or wait for gutsy.

OK, I now have Python 2.5 and the html referenced by web2lrf --url "http://www.gentoo.org/doc/en/handbook/handbook-x86.xml?style=printable&full=1#book_part1" -r 0
seems to create an LRF without the errors I used to get. I have copied it to the reader, but when I choose the book to read, the reader stays on the "Formatting..." screen for about 5 minutes, and then page changes take about 12 seconds each. Are these simply side effects of the translation from HTML, or should I be looking into html2lrf options?

Thanks again

P.S. I did donate for your prs505, but didn't get the cool announcement line in my post. I hope you got it. Let me know if it didn't go through.

kovidgoyal
10-07-2007, 12:01 AM
You should probably try some of the chapter options, to force more frequent page breaks.

tidixon
10-07-2007, 05:10 PM
Hey,
I am new to this forum and to all these tools. I installed libprs500 and tried to use web2lrf to get The Economist. It took about 30 minutes and produced a 23MB LRF that could not be read on the reader. This was on Linux. I tried it again on Windows with the same result. I can post error logs and stuff, but I think maybe the profile just isn't finished yet? It is not listed in the graphical interface but is in the command line.
Thanks,
Tim.

kovidgoyal
10-07-2007, 05:33 PM
Yeah, the Economist profile isn't done.

maggotb0y
10-07-2007, 09:18 PM
Not a problem. If you have trouble using the vmware image let me know.

Okay, I have trouble <g>.

I got the VM image up and running and networked, which took a little bit of fooling around as I am not terribly familiar with gentoo.

Now I try to update from SVN and when running python setup.py develop as root I get this error:


Trying to setup udev rules...
Setting up bash completion... failed
Traceback (most recent call last):
File "/home/kovid/work/libprs500/src/libprs500/linux.py", line 79 in setup_completion
from libprs500.gui2.lrf_renderer.main import option_parser as lrfviewerop
File "/home/kovid/work/libprs500/src/libprs500/gui2/lrf_renderer/main.py", line 27 in <module>
from libprs500.gui.lrf_renderer.main_ui import Ui_mainWindow
ImportError: No module named main_ui
Setting up desktop integration
QPixmap::scaled: Pixmap is a null pixmap
/bin/sh: xdg-icon-resource: command not found
You do not have the Portland Desktop Utilities installed, skipping installation of desktop integration


Is this something I can safely ignore? Is the VMware image out of alignment with the current requirements?

Thanks for your help,
Christopher

kovidgoyal
10-07-2007, 09:23 PM
You need to run make

make -C src/libprs500/gui2/

maggotb0y
10-08-2007, 12:15 PM
You need to run make

make -C src/libprs500/gui2/

As always, you know your stuff. I built the new SVN under the VM image and tested converting some documents, and they're not quite right (quite possibly I don't know the appropriate switches; the only one I used is --book-designer). Here's what I've noticed on the first round:

<SPAN ID=title> should become a chapter title, the chapter isn't being generated
images aren't included
<SPAN id=subtitle> should be in the document, but it isn't

So I've created another test document for you (this time with images) and included, along with the document, the LRF that Book Designer generates for your comparison.

kovidgoyal
10-08-2007, 02:04 PM
Bug in the BD regexp fixed in svn

maggotb0y
10-08-2007, 09:46 PM
Kovid-

The formatting is getting better now, here are some new observations (and one question)

1) images in the LRF generated from the test book I posted aren't centered
2) the pagebreak (<hr>) before the first chapter in the sample posted doesn't get picked up correctly by html2lrf
3) images in some other books aren't getting picked up right. In another document this image:
<DIV align=center><IMG hspace=0 src="chapter1.jpg" align=baseline border=0></DIV>
isn't picked up
4) Would you prefer I enter these types of comments as tickets on your trac page? And do you prefer one ticket per comment or one "round-up" ticket?

Thanks again as always.

kovidgoyal
10-08-2007, 09:58 PM
The image centering is simply because in html2lrf images are usually inline, so they aren't centered. I could add some code to the preprocessor to generate centered images.

Yeah, tickets help me keep track of things.

maggotb0y
10-08-2007, 10:22 PM
Jeez, you're fast.

Okay, from now on I will enter comments as tickets.

One thing I forgot to ask: is it possible to have the first paragraph of every chapter use drop caps? I have looked around a bit, but I don't see that documented (it seems like it is basically supposed to be in the attributes of the paragraph in the HTML?). It would be a nice feature when converting from another format to be able to specify that as an option.

Also, in what file or group of files does the BD pre-processor live? I've not used Python before, but I am a developer and I'm fairly comfortable with regexes, so I may be able to offer more useful suggestions if I know where to look.

Also, how does drop caps handle an open quote or an open smart quote before the first letter?

Finally (for now), is there a guide to getting a Python environment up and running under Windows that would be appropriate for compiling libprs500? If not, I'll do my best to get things running and write a guide for others to follow.