

HTML2LRF


igorsk
10-31-2006, 06:52 PM
Just a little tool I made using DLLs from the Toolbar for LIBRIe. It's basically a command-line version of the toolbar.

Features:

Supports both fetching remote URLs and local files
Adds a TOC with a link to each page
Embeds pictures
Preserves (some of the) formatting


Issues:
Links do not work (yet?)
Error messages (if any) are in Japanese.

Usage:
HTML2LRF [-t "Book title"] [-a "Book Author"] [-o output_filename] <url|path.html> [<url|path.html>...]
Defaults are "Test Book", "Unknown Author", and "test.lrf".

Hints:

Text for TOC entries is taken from the page's <TITLE> tag.
If you want to process local files, make sure you pass a full pathname, or a file:// URL.


Clicky (http://projects.mobileread.com/reader/users/igorsk/HTML2LRF-0.1.ZIP)
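For scripted conversions, one way to follow the local-file hint is to turn whatever path you have into an absolute file:// URL before handing it to HTML2LRF. The following is a minimal Java sketch (Java 7+ for java.nio.file; the class name LocalUrl is hypothetical). Note that toUri() percent-encodes spaces (e.g. %20 in "Documents and Settings"); this thread does not confirm whether HTML2LRF accepts encoded URLs, so passing a plain absolute path may be the safer choice.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper: turns a (possibly relative) path into an absolute file:// URL string.
final class LocalUrl {
    private LocalUrl() {}

    static String toFileUrl(String userPath) {
        // Resolve against the current directory and drop "." / ".." segments.
        Path absolute = Paths.get(userPath).toAbsolutePath().normalize();
        // On Windows this gives something like file:///C:/Documents%20and%20Settings/...
        return absolute.toUri().toString();
    }
}
```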

kovidgoyal
10-31-2006, 07:40 PM
Cool, I'm going to try to get this running under Wine as soon as I get some time.

Slava
10-31-2006, 09:30 PM
Just a little tool I made using DLLs from the Toolbar for LIBRIe. It's basically a command-line version of the toolbar.

You rule, man :)

Quick question: if Russian fonts are installed on the Reader, will Russian characters be displayed in the LRF?

geekraver
11-01-2006, 12:03 AM
Igor, maybe you should mention this in the Content subforum.

scotty1024
11-01-2006, 01:01 AM
I call this application BBeBook. It is far from perfect/complete in its understanding of LRF, but it is written 100% in Java and needs no DLLs from Sony. It is Unicode clean.

It uses a couple public java libraries: xpp3 and jpedal.

It can re-flow HTML and PDF documents into LRF files. It has latent support for rasterizing PDF files into PNGs in the LRF (non-reflow).

It comes with a sample Creative Commons ebook to bind with it: Cory Doctorow's Someone Comes to Town, Someone Leaves Town.

I've GPL'd the tool (not Cory's book), have fun.

kovidgoyal
11-01-2006, 02:28 AM
Looks good scotty, though is there a command line interface? I find GUIs really awkward for these kinds of jobs.

sartori
11-01-2006, 07:16 AM
Scotty,

Not sure if you noticed the tool lrf2lrs on the Librie Yahoo group - you can use it to convert LRFs back to LRS source files. Sony didn't encrypt the sample books they provided on the Reader - they are just plain LRFs. Using lrf2lrs you can extract most of them (some don't work) and take a look at their construction. It may help with understanding some of the LRF format.

I think the ones that don't work have tags that the librie doesn't use.

Based on the headers it looks like Sony is creating their content using BookCreator.

Rob

porkupan
11-01-2006, 12:39 PM
Quick question: if Russian fonts are installed on the Reader, will Russian characters be displayed in the LRF?
The answer to this question is: yes, they will. Tested on FictionBook HTML. I guess Sony's library automatically converts Cyrillic CP1251 into Unicode, which is nice.
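In Java terms, the conversion guessed at here amounts to decoding windows-1251 bytes into a Unicode String. The sketch below only illustrates that idea; it is not what Sony's library does internally, and the class name Cp1251Demo is hypothetical. The "windows-1251" charset ships with standard JDK builds but is not one of the charsets every Java platform is required to provide.

```java
import java.nio.charset.Charset;

// Hypothetical demo: decode legacy Windows-1251 (Cyrillic) bytes into a Java String.
final class Cp1251Demo {
    private static final Charset CP1251 = Charset.forName("windows-1251");

    private Cp1251Demo() {}

    static String decode(byte[] raw) {
        // Each CP1251 byte maps to one Unicode character, e.g. 0xCF -> U+041F.
        return new String(raw, CP1251);
    }
}
```

For example, decoding the bytes CF F0 E8 E2 E5 F2 yields "Привет".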

scotty1024
11-02-2006, 04:37 AM
Looks good scotty, though is there a command line interface? I find GUIs really awkward for these kinds of jobs.

Yes, it has a command line interface.

It uses an XML file to supply the Dublin Core data for the book in command-line mode; an example is included for the sample book. The tool also knows how to resize a front cover image into the thumbnail embedded in the LRF file (one less thing to create yourself).

scotty1024
11-02-2006, 04:46 AM
I'm not sure if you noticed, but I posted earlier versions of this tool on the Yahoo group myself. I've also done work on the LRF reverse-engineering wiki.

I have my own Java LRF dumper as well.

I've spent much time pulling apart LRF files.

My focus with this tool is on producing re-flowable LRF files. Book Creator and Book Designer focus on page-layout-oriented books, which are more awkward to magnify.

Unfortunately the Sony Reader I ordered opening day, and was only recently delivered, died after 2 hours of operation. It has yet to be replaced. I live in a "dead zone" of local availability, as I live far too close to geekraver to be able to get a local replacement. :D

porkupan
11-02-2006, 10:05 AM
Issues:
Links do not work (yet?)
Error messages (if any) are in Japanese.

Igor,

Actually, the links do appear to work. At least in FictionBook's HTML files they work just fine. I guess it may depend on whether the links are "absolute" (full URL) or just "local" (labels). The label links work; the full URLs probably don't. So in theory the file could be pre-parsed to replace the absolute links with randomly generated local ones.
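A simplified variant of this pre-parsing idea, sketched in Java: instead of generating new anchors, it rewrites links that point back into the same document by absolute URL so that only their #label part remains. It assumes double-quoted href attributes and that the target anchors already exist in the file; LinkLocalizer is a hypothetical name, and regex-based rewriting is only reasonable for well-formed, predictable HTML such as generated FictionBook output.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: turn href="<this document>#label" into href="#label".
final class LinkLocalizer {
    // group(1) = part before '#', group(2) = label; double-quoted hrefs only.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"#]*)#([^\"]+)\"");

    private LinkLocalizer() {}

    static String localize(String html, String documentUrl) {
        Matcher m = HREF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String target = m.group(1);
            String replacement = (target.isEmpty() || target.equals(documentUrl))
                    ? "href=\"#" + m.group(2) + "\""   // same document: keep only the label
                    : m.group(0);                      // other documents: leave untouched
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```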

I wonder if Sony provided a way to specify the font size of the book, the genre (right now it's hardcoded to some Japanese word), and the line spacing? The font appears a bit small and the margins a bit too wide; the line spacing is OK, but could be a bit tighter. The book looks nice, though.

Thanks

igorsk
11-02-2006, 10:25 AM
The text styles are specified in the "DesignHorizontal.lrf" file. You could try decompiling it with lrf2lrs, adjusting things you need, and compiling back.
As for the genre, it is indeed hardcoded but you can change it with the EditLRFMeta (http://editlrfmeta.peterknowles.com/) tool.

porkupan
11-02-2006, 09:46 PM
For whatever reason the program crashes on some of the HTML files for me. Seems to depend on the size of the file. For instance, this one (http://fictionbook.ru/author/dostoevskiyi_fedor_mihayilovich/zapiski_iz_mertvogo_doma/dostoevskiyi_zapiski_iz_mertvogo_doma.html) crashes, but smaller ones don't. Not sure where the limit is, and if it really is the file size that kills it.

BTW, my attempts to decompile DesignHorizontal.lrf with lrf2lrs were unsuccessful. The script exits with this error message:
Traceback (most recent call last):
File "H:\boroda\HTML2LRF-0.1\bin\lrf2lrs.py", line 1506, in <module>
sys.exit(main(sys.argv[1:]))
File "H:\boroda\HTML2LRF-0.1\bin\lrf2lrs.py", line 1500, in main
out.write(h.toXml());
File "H:\boroda\HTML2LRF-0.1\bin\lrf2lrs.py", line 1480, in toXml
xml += o.toXml(self.objects)
File "H:\boroda\HTML2LRF-0.1\bin\lrf2lrs.py", line 498, in toXml
f = StringIO.StringIO(self.stream)
AttributeError: LRFHeader instance has no attribute 'stream'

scotty1024
11-02-2006, 10:56 PM
For whatever reason the program crashes on some of the HTML files for me. Seems to depend on the size of the file. For instance, this one (http://fictionbook.ru/author/dostoevskiyi_fedor_mihayilovich/zapiski_iz_mertvogo_doma/dostoevskiyi_zapiski_iz_mertvogo_doma.html) crashes, but smaller ones don't. Not sure where the limit is, and if it really is the file size that kills it.


I could never get the Librie to eat more than 16MB of LRF. Perhaps the library knows this and cuts you off.

FangornUK
11-03-2006, 07:25 AM
Scotty1024, perhaps I'm missing something obvious but how do you use your BBeBook software? I don't know much about Java.

porkupan
11-03-2006, 08:48 AM
I could never get the Librie to eat more than 16MB of LRF. Perhaps the library knows this and cuts you off.
Well, actually these HTML files are not even close to 16MB; they're more like 600KB. Very well-structured HTML, auto-generated from FB2... :(

In fact, the Sony Reader happily takes huge image-based LRFs, and has no problem displaying them. I've tried as large as 37MB, and they worked very well. Compared to the PDF viewer, the LRF viewer in the Reader has been debugged the hell out of, I guess.

igorsk
11-03-2006, 10:39 AM
Here's an updated lrf2lrs that can decompile DesignHorizontal.lrf.
I also updated lrs2lrf so that it sets the specific scramble key (0xFE00) checked by LRFCreator.dll.

FangornUK
11-06-2006, 11:26 AM
igorsk, nice work! It's working well for me with the Gutenberg HTML books, but some of the later ones (e.g. Etext 19694) crash HTML2LRF.
I've looked at DesignHorizontal.lrs (decompiled with lrf2lrs) but haven't got a clue what it means. Is there any way to force page breaks in the LRF from the HTML pages?

igorsk
11-06-2006, 04:58 PM
Is there any way to force page breaks in the LRF from the HTML pages?
Best would be to split it into separate HTML files. This way you'll also get a bonus in the form of TOC entries.
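Splitting can be automated. The Java sketch below (a hypothetical ChapterSplitter class, not FangornUK's splitbook.pl) cuts an HTML body at each <h1> and gives every resulting page a <title> taken from its heading, since HTML2LRF builds its TOC entries from each page's <TITLE>. It assumes every chapter starts with an <h1> and simply drops anything before the first heading.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: one HTML document per <h1> chapter, titled after the heading.
final class ChapterSplitter {
    private static final Pattern H1 =
            Pattern.compile("<h1[^>]*>(.*?)</h1>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    private ChapterSplitter() {}

    static List<String> split(String bodyHtml) {
        List<Integer> starts = new ArrayList<>();
        List<String> titles = new ArrayList<>();
        Matcher m = H1.matcher(bodyHtml);
        while (m.find()) {
            starts.add(m.start());
            // Strip inline tags (e.g. <a name=...>) so the TOC entry is plain text.
            titles.add(m.group(1).replaceAll("<[^>]+>", "").trim());
        }
        List<String> chapters = new ArrayList<>();
        for (int i = 0; i < starts.size(); i++) {
            int end = (i + 1 < starts.size()) ? starts.get(i + 1) : bodyHtml.length();
            chapters.add("<html><head><title>" + titles.get(i) + "</title></head><body>"
                    + bodyHtml.substring(starts.get(i), end) + "</body></html>");
        }
        return chapters;
    }
}
```

Each returned string would then be written to its own file and passed to HTML2LRF in order.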

ptully
11-08-2006, 10:33 AM
igorsk - Fantastic script! It's exactly what I've been searching for since I got the Reader. I've been writing Perl scripts to scrape online mags/blogs all over the place and using your script to convert them, then doing a daily sync to the Reader. Seems to be working great. Thanks a bunch!!!

scotty1024 - been dying to try out your program, but it's been a while since I've dived into Java. It looks like I'm missing some libs (when I executed the program). Do I need to download those 2 libs you mentioned in order to get it to work?

neilm2
11-09-2006, 09:37 PM
Hi Scotty,

I'm curious to try your BBeBook program but I've got zero Java skills. A friend suggests you might have a batch file that compiles or starts the program. Or perhaps a readme file.

Do you?

Thanks!

-Neil

FangornUK
11-10-2006, 09:29 AM
igorsk, I've written some perl wrapper scripts to feed HTML books into your HTML2LRF, mostly for Gutenberg HTML books, the thread is here (http://www.mobileread.com/forums/showthread.php?t=8532)

Is there any way to stop HTML2LRF from generating the front title page?

Also, when running HTML2LRF you have to be in its directory for it to work; otherwise it cannot find the DesignHorizontal.lrf file. I noticed some code in HTML2LRF.cpp to handle this, but it is ignored later on and the program just looks in the current directory.

igorsk
11-10-2006, 10:28 AM
As far as I can see, addition of the title page is hardcoded in LBParser.dll. LRFCreator.dll gives more flexibility but you'd have to do a lot of manual work, like parsing the HTML...
And thanks for noticing the bug, I'll fix it.

diabloNL
11-17-2006, 01:32 PM
Can somebody please tell me exactly how to format the HTML2LRF command if I have:

A book ready at: C:\Documents and Settings\Bobby\Desktop\Cell.html

The program is here: C:\Documents and Settings\Bobby\Desktop\HTML2LRF\bin\

Thanks!


I tried it, but I get a runtime error.

igorsk
11-17-2006, 05:20 PM
Something like this should work:
c:
cd "C:\Documents and Settings\Bobby\Desktop\HTML2LRF\bin\"
HTML2LRF "C:\Documents and Settings\Bobby\Desktop\Cell.html"
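When HTML2LRF is driven from another program, the effect of the cd step can be had by setting the child process's working directory, so that DesignHorizontal.lrf is found. A minimal Java sketch using ProcessBuilder (Java 7+ for inheritIO); the executable name HTML2LRF.exe and the Html2LrfLauncher class are assumptions, and the options are the -t/-a/-o switches from igorsk's usage line.

```java
import java.io.File;
import java.io.IOException;
import java.util.Arrays;

// Hypothetical launcher: run HTML2LRF with its bin directory as the working directory.
final class Html2LrfLauncher {
    private Html2LrfLauncher() {}

    static ProcessBuilder build(File binDir, File inputHtml, String title, String author, File output) {
        return new ProcessBuilder(Arrays.asList(
                new File(binDir, "HTML2LRF.exe").getAbsolutePath(),  // assumed executable name
                "-t", title,
                "-a", author,
                "-o", output.getAbsolutePath(),
                inputHtml.getAbsolutePath()))   // full pathname, as the usage hints recommend
                .directory(binDir)              // same effect as cd-ing into bin first
                .inheritIO();                   // show HTML2LRF's console output
    }

    static int run(ProcessBuilder pb) throws IOException, InterruptedException {
        return pb.start().waitFor();            // exit code of HTML2LRF
    }
}
```

Passing each argument as its own list element also avoids hand-quoting paths that contain spaces, such as "Documents and Settings".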

diabloNL
11-17-2006, 05:50 PM

Thanks igorsk but I still get the runtime error. :( Will try it on another laptop tomorrow.

igorsk
11-17-2006, 06:12 PM
Then I guess it could be something to do with the HTML file... how big is it?

diabloNL
11-17-2006, 06:17 PM
It's 0.8Mb. I tried different files all around the same file size.

FangornUK
11-18-2006, 05:44 AM
It's 0.8Mb. I tried different files all around the same file size.

That's too big for HTML2LRF to handle. Use my splitbook.pl here (http://www.mobileread.com/forums/showthread.php?t=8532) to split it into chapters.

diabloNL
11-18-2006, 05:50 AM
Thanks FangornUK! I will give it a try. :)

EDIT: It works! The only thing is that I would like to adjust the margins and text size, but I can't make sense of what I need to change in DesignHorizontal.lrf/lrs.

Man, if only Sony would enable pictures in RTF. :(

William Moates
11-26-2006, 06:15 AM
scotty1024, I've got some questions about the BBeBook application you posted here earlier. I tried to compile it as a JAR on Mac OS X 10.3.9, and it's been an interesting journey. (I've been doing some research online, but haven't been able to find answers, so I'm tossing this to you.)

After learning how to build a Java Tool in Xcode 1.5, I tossed all your Java files into the project, then told it to build. After much experimentation, I learned it needed the following public Java libraries (which I added as JARs):
xpp3 version 1.1.4 http://www.extreme.indiana.edu/xgws/xsoap/xpp/mxp1/
JPedal version 2.40b15 http://www.jpedal.org
PDFBox version 0.7.1 http://www.pdfbox.org
I got it to build the JAR, but when I try to run it, I get this error:


Exception in thread "main" java.lang.NoClassDefFoundError: org/pdfbox/util/PDFStreamEngine
at java.lang.ClassLoader.defineClass0(Native Method)
at java.lang.ClassLoader.defineClass(ClassLoader.java:539)
at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:123)
at java.net.URLClassLoader.defineClass(URLClassLoader.java:251)
at java.net.URLClassLoader.access$100(URLClassLoader.java:55)
at java.net.URLClassLoader$1.run(URLClassLoader.java:194)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:187)
at java.lang.ClassLoader.loadClass(ClassLoader.java:289)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:274)
at java.lang.ClassLoader.loadClass(ClassLoader.java:235)
at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:302)

Executable “java” has exited with status 1.


In your comments in PageParser.java, you refer to PDFBox 0.7.1, so I felt it was the right version to use. Was it?

I checked the version of Java by going to Terminal and typing "java -version", and it returned this:
java version "1.4.2_09"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.4.2_09-233)
Java HotSpot(TM) Client VM (build 1.4.2-56, mixed mode)
In plainer English, I'm running J2SE 1.4.2.

I built it with different versions of the Java libraries because, when I used the most recent versions of all three, I got these errors:

BBeBook.java:341: cannot resolve symbol : method setPreferredSize (java.awt.Dimension)
BBeBook.java:1383: cannot resolve symbol : method enableHighQualityMode ()
BBeBook.java:1387: cannot resolve symbol : method setThumbnaiImageTransparency (boolean)

The first method, setPreferredSize(), is not available on JFrame in J2SE 1.4.2. It is a method of JComponent, but not of JFrame, which is what you call it on.
The last two methods are in JPedal, but are not available in version 3.00 or 2.80; they have been removed from PdfDecoder's methods. (JPedal no longer distributes any versions older than 2.80 on its website, but I had version 2.40 from your posting in the Yahoo! Librie group.)
When I comment out all three methods, compile the JAR, then run it, I get the same exception thrown as above.

If you need more info, let me know.
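A NoClassDefFoundError at run time, after a successful build, usually means a library was visible to the compiler but is missing from the runtime classpath. One plausible cause (an assumption, not confirmed in this thread): when a program is started with java -jar, the -classpath option is ignored and dependencies have to be listed in the JAR manifest's Class-Path attribute. The small Java sketch below (hypothetical ClasspathCheck class, Java 7+ for multi-catch) tests whether a given class can be loaded from the current classpath.

```java
// Hypothetical diagnostic: can this class be found on the runtime classpath?
final class ClasspathCheck {
    private ClasspathCheck() {}

    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate and load the class, don't run static initializers.
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }
}
```

Run with the same classpath as BBeBook, a check for "org.pdfbox.util.PDFStreamEngine" would show whether PDFBox is actually visible at run time.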

kovidgoyal
11-26-2006, 03:57 PM
Does it run without building it into a jar file?

grebki
12-22-2006, 07:34 PM
FYI, I have discovered (at least on my laptop...) that I must run the SONY Connect Reader before HTML2LRF will work -- if I try to run it without ever having run the Sony s/w after a re-boot, it will constantly crash. I don't know why -- I'm guessing that some system variable(s) has to be set or something??

-G

ptully
12-23-2006, 12:34 PM
Hi - Just got a new Intel-based MacBook Pro. The only thing left to move over to it is HTML2LRF. I know I can get this working in Parallels (Windows on my Mac), but would LOVE to run this natively (it's become one of those necessary programs for me). I just read the latest post saying that you must run the Sony Connect Reader software first or else it will crash.

Has anyone compiled and run HTML2LRF on a Mac (or on Linux, for that matter)? I've written several Perl scripts to scrape some websites/blogs and automatically download them to my SD card when I insert it, and would love to keep that up without having to boot Parallels. Ideas/thoughts (other than that I should stick with Windows ;-)?

-Pat-

kovidgoyal
12-23-2006, 12:47 PM
It should theoretically be possible to get it running with winelib. Unfortunately, I don't have the necessary experience with winelib programming.

unkilbeeg
12-26-2006, 11:58 PM
Hmmm. Frustrating. So far I've had no luck in finding a way to create LRF files.

HTML2LRF doesn't seem to work very well under wine, at least not here. It creates an LRF file that contains a title page, but no text. My test file is from a Baen HTML download. Admittedly Baen's HTML is U.G.L.Y. Even after considerable cleanup, however, the resulting LRF file is exactly the same. Empty except for the title page.
libGL error: open DRM failed (Operation not permitted)
libGL error: reverting to (slow) indirect rendering
libGL error: open DRM failed (Operation not permitted)
libGL error: reverting to (slow) indirect rendering
HTML2LRF v 0.1 (c) 2006 Igor Skochinsky
Writing to test.lrf...
0671878484___0.html
fixme:wininet:InternetSetOptionExW Flags 00000000 ignored
fixme:wininet:InternetSetOptionW Option INTERNET_OPTION_CONNECT_RETRIES: STUB
Finalizing... done.


After some modification I was able to get compilebook.pl to run, but the problem with HTML2LRF meant that I got a file with a title page and 53 blank pages. It may not be accidental that this particular book has 53 html files. :)

BBeBook looks promising, but it's not at all clear how to use it. I echo the request of several others on this thread -- please, someone who is a Java developer, can you tell us what is necessary to make this thing work? I say a Java *developer*, since all the documentation for all things Java seems to assume you are a Java developer (don't get me started on my Tomcat rant....)

kovidgoyal
01-02-2007, 11:07 AM
http://www.mobileread.com/forums/showthread.php?t=8575

scotty1024
01-02-2007, 02:56 PM
Sorry folks, I hadn't seen this thread in awhile and no one PM'd me about the issues...

The zip comes with compiled Java .class files, but they are compiled for Java 5 and will not work on a prior Java version such as 1.4.2.
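Which Java release a .class file needs is recorded in its header: after the 0xCAFEBABE magic number come the minor and major version, and the major version is 48 for Java 1.4 and 49 for Java 5. A minimal Java sketch (hypothetical ClassFileVersion class) that reads it, useful for checking prebuilt classes against an installed 1.4.2 runtime:

```java
import java.nio.ByteBuffer;

// Hypothetical helper: read major_version from the start of a .class file.
final class ClassFileVersion {
    private ClassFileVersion() {}

    static int majorVersion(byte[] classBytes) {
        if (classBytes.length < 8) {
            throw new IllegalArgumentException("too short to be a class file");
        }
        ByteBuffer buf = ByteBuffer.wrap(classBytes);   // big-endian, as in the class file format
        if (buf.getInt() != 0xCAFEBABE) {
            throw new IllegalArgumentException("missing 0xCAFEBABE magic number");
        }
        buf.getShort();                                 // minor_version, ignored here
        return buf.getShort() & 0xFFFF;                 // major_version, read as unsigned
    }

    // J2SE 1.4.x accepts class files up to major version 48.
    static boolean runsOnJava14(byte[] classBytes) {
        return majorVersion(classBytes) <= 48;
    }
}
```

The bytes can come from any compiled class in the zip (e.g. via Files.readAllBytes); a result of 49 or higher means J2SE 1.4.2 will refuse to load it.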

The code works with PDFBox 0.7.1 or 0.7.2. I haven't tried it with 0.7.3.

I'm bundling up the xpp3 and JPedal libraries into a zip and attaching them.

I'll have a look at the Python code and see what kind of command line some folks like. It should be possible to remove the dependency upon Python. :)

scotty1024
01-02-2007, 05:42 PM
Just as a status update, I'm assembling version 0.2 of BBeBook.

Implements more command line options (compatible with the Python shell).
Adds switch to rasterize PDF files (convert them into a "picture book").
Comes in an all-in-one JAR file that can be double-clicked on Windows or Mac OS X to run a GUI for binding books.
Comes with bash/WinXP scripts to compile it and run it from the command line.

scotty1024
01-03-2007, 03:20 PM
Here is version 0.2.

I had to break it into two files so I could upload it.

BBeBook-0.2.jar.tar.bz2 contains the java all-in-one JAR file, which can be double clicked like an .exe file to run the application in GUI mode.

BBeBook-0.2.tar.bz2 contains all of the source, 3rd party files and scripts required to build the JAR file.

Version 0.2 requires Java 5 (j2sdk 1.5.0) or later to run. It will not compile under j2sdk 1.4.2 or earlier.

The code has been slightly refactored and optimized since 0.1, and the command line supports passing a -r or --raster option to render a PDF file into an LRF as pictures. The -t/--title, -a/--author, and --thumbnail command-line switches from the Python shell have been added as well.

I'm still working on targeting the Sony Reader natively; the code still assumes the screen is 800x600.

Enjoy.

kovidgoyal
01-03-2007, 04:31 PM
Processing of command-line options seems a little buggy.
For example:

java -jar /home/kovid/download/BBeBook-0.2.jar Te\ st.html
Error loading LRF file: te st.html message: te st.html (No such file or directory)
java.io.FileNotFoundException: te st.html (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:106)
at HtmlParser.readWholeFile(HtmlParser.java:80)
at HtmlParser.<init>(HtmlParser.java:70)
at BBeBook.parseHTML(BBeBook.java:1616)
at BBeBook.makeBookFromHtml(BBeBook.java:654)
at BBeBook.makeBook(BBeBook.java:600)
at BBeBook.main(BBeBook.java:292)


or

java -jar /home/kovid/download/BBeBook-0.2.jar -t 'test title' test.html
WARNING: Book Label not supplied, setting to 'UNKNOWN'.
Exception in thread "main" java.io.FileNotFoundException: (No such file or directory)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:179)
at java.io.FileOutputStream.<init>(FileOutputStream.java:131)
at BBeBook.write(BBeBook.java:755)
at BBeBook.makeBookFromHtml(BBeBook.java:658)
at BBeBook.makeBook(BBeBook.java:600)
at BBeBook.main(BBeBook.java:292)


or

java -jar /home/kovid/download/BBeBook-0.2.jar test.html
Error loading LRF file: image.gif message: image.gif (No such file or directory)
java.io.FileNotFoundException: image.gif (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:106)
at BBeBook.readWholeFile(BBeBook.java:1091)
at BBeBook.writeBookToFile(BBeBook.java:965)
at BBeBook.makeBookFromHtml(BBeBook.java:657)
at BBeBook.makeBook(BBeBook.java:600)
at BBeBook.main(BBeBook.java:292)


I believe image.gif is the file that's supposed to have the thumbnail?

scotty1024
01-03-2007, 06:18 PM
Spaces in command line arguments are indeed problematical. I'll see if I can find any easy fix for the next version.

image.gif is the default icon. The next version will use a Java resource API to supply it. For now you need to have image.gif lying around in the current directory or explicitly specify one.

kovidgoyal
01-03-2007, 07:08 PM
You could use something like CLI
http://jakarta.apache.org/commons/cli/

Also setting the title on the command line doesn't seem to work

java -jar /home/kovid/download/BBeBook-0.2.jar -t'test title' test.html && lrf-meta test.html.lrf | grep Title
Title: test.html


Ditto for author

java -jar /home/kovid/download/BBeBook-0.2.jar -a'test author' test.html && lrf-meta test.html.lrf | grep Author
Author: Unknown Author

scotty1024
01-03-2007, 09:47 PM
You need a space between the switch and its argument, but not one in the argument. :)

java -jar /home/kovid/download/BBeBook-0.2.jar -a 'testauthor' test.html

kovidgoyal
01-03-2007, 11:24 PM
One final (I hope) problem with command-line options. BBeBook lowercases the input filename when deriving the output filename, e.g.:

java -jar /home/kovid/download/BBeBook-0.2.jar Test.html && ls *.lrf
test.html.lrf


Is this intentional, or will it be fixed in a future version? I'm asking all these questions because I'm planning to integrate BBeBook into libprs500 and I need a stable command-line interface.

Thanks,

Kovid.

EDIT: Here's a modified version of the argument-processing loop in makeBook that fixes the lowercasing:

// Process command line arguments
for (int i = 0; i < args.length; i++) {
    String oarg = args[i].trim();     // argument as typed (case preserved)
    String arg = oarg.toLowerCase();  // lowercased copy, used only for matching
    if ("-t".equals(arg) || "--title".equals(arg)) {
        bookTitle.setLength(0);
        bookTitle.append(args[++i]);  // ++i so the value isn't re-read as an input file
    } else if ("-a".equals(arg) || "--author".equals(arg)) {
        bookAuthor.setLength(0);
        bookAuthor.append(args[++i]);
    } else if ("-r".equals(arg) || "--raster".equals(arg)) {
        rasterize = true;
    } else if ("--thumbnail".equals(arg)) {
        bookIconName.setLength(0);
        bookIconName.append(args[++i]);
    } else {
        // Match extensions case-insensitively, but pass on the original name
        if (arg.endsWith(".pdf")) {
            makeBookFromPdf(oarg, rasterize);
            break;
        } else if (arg.endsWith(".html") || arg.endsWith(".htm")) {
            makeBookFromHtml(oarg);
            break;
        } else if (arg.endsWith(".xml")) {
            makeBookFromXmlConfig(oarg);
            break;
        }
    }
}

kovidgoyal
01-04-2007, 02:30 AM
Another problem I'm afraid

java -jar ./BBeBook.jar Test.xml
Error parsing metadata: internal error in parseEpilog
org.xmlpull.v1.XmlPullParserException: internal error in parseEpilog
at org.xmlpull.mxp1.MXParser.parseEpilog(MXParser.java:1655)
at org.xmlpull.mxp1.MXParser.nextImpl(MXParser.java:1393)
at org.xmlpull.mxp1.MXParser.next(MXParser.java:1093)
at BBeBook.parseConfig(BBeBook.java:729)
at BBeBook.makeBookFromXmlConfig(BBeBook.java:665)
at BBeBook.makeBook(BBeBook.java:604)
at BBeBook.main(BBeBook.java:292)
No <Output> in config file.

cat Test.xml
<File>/home/kovid/documents/ebooks/temp/Test.html</File>
<Output>/home/kovid/documents/ebooks/temp/Test.lrf</Output>
<Icon>/home/kovid/work/prs-500/libprs500/lrf/cover.jpg</Icon>
<Title>Test.html</Title>
<Author>Unknown</Author>
<Label>Test.html</Label>
<BookID>20e4e18dfae3d1b38df3bce24863b74d</BookID>


EDIT: Never mind, it just needed to be wrapped in a document element.
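The parseEpilog error comes from the config being a list of sibling elements: a well-formed XML document has exactly one root (document) element. The Java sketch below (hypothetical ConfigCheck class) wraps such a fragment and reads a value back with the JDK's built-in DOM parser. The root name "Book" is an assumption; the thread does not say which name, if any, BBeBook expects.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper for checking a BBeBook-style config file.
final class ConfigCheck {
    private ConfigCheck() {}

    // Wraps sibling elements such as <File>, <Output>, <Icon> in a single root element.
    static String wrap(String fragment, String rootName) {
        return "<" + rootName + ">\n" + fragment + "\n</" + rootName + ">";
    }

    // Parses the XML and returns the text of the first <tag>, or null if there is none.
    static String textOf(String xml, String tag) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName(tag);
            return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent();
        } catch (Exception e) {
            // e.g. a SAXParseException when there is more than one top-level element
            throw new IllegalArgumentException("not well-formed XML: " + e.getMessage(), e);
        }
    }
}
```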

scotty1024
01-04-2007, 10:52 AM
Is this intentional? Or will it be fixed in a future version. I'm asking all these questions because I'm planning to integrate BBeBook into libprs500 and I need a stable commandline interface.


I'll be replacing the command line processor shortly.

On the Mac/Windows you can access Test.xml as test.xml. :)

kovidgoyal
01-04-2007, 11:02 AM
Thanks, that'll be good. You may want to use the GNU command line conventions.


-c arg
--long-command=arg


http://www.gnu.org/prep/standards/standards.html#Command_002dLine-Interfaces
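As an illustration of GNU-style handling for the switches discussed above (-t/--title, -a/--author, --thumbnail, -r/--raster), the hypothetical GnuStyleArgs parser below accepts both the separate form (-t value, --title value) and the attached form (-tvalue, --title=value), and keeps input filenames in their original case. It is a sketch, not BBeBook's actual parser. Note that the shell strips the quotes, so -t'test title' reaches Java as the single argument "-ttest title".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical GNU-style parser for the BBeBook switches discussed in this thread.
final class GnuStyleArgs {
    final Map<String, String> options = new HashMap<>();  // "title", "author", "thumbnail", ...
    final List<String> inputs = new ArrayList<>();        // input files, case preserved
    boolean raster;

    static GnuStyleArgs parse(String[] args) {
        GnuStyleArgs r = new GnuStyleArgs();
        for (int i = 0; i < args.length; i++) {
            String a = args[i];
            if (a.equals("-r") || a.equals("--raster")) {
                r.raster = true;
                continue;
            }
            String name;
            String value = null;
            if (a.startsWith("--")) {
                int eq = a.indexOf('=');
                name = (eq >= 0) ? a.substring(2, eq) : a.substring(2);
                if (eq >= 0) value = a.substring(eq + 1);           // --title=value
            } else if (a.startsWith("-t") || a.startsWith("-a")) {
                name = (a.charAt(1) == 't') ? "title" : "author";
                if (a.length() > 2) value = a.substring(2);        // -tvalue
            } else {
                r.inputs.add(a);
                continue;
            }
            if (value == null) {                                   // -t value / --title value
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("missing value for " + a);
                }
                value = args[++i];                                 // consume the value argument
            }
            r.options.put(name, value);
        }
        return r;
    }
}
```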

kovidgoyal
01-04-2007, 01:52 PM
Are you planning to add support for non GIF thumbnails?

scotty1024
01-04-2007, 02:20 PM
I've decoded how to do non-GIF thumbnails, but found that GIF generally produced the smallest icon, so the software resizes most formats and converts them to GIF.

If you can't feed it a 1024x768 JPEG and have it convert that into a LRF GIF icon please let me know.

kovidgoyal
01-04-2007, 02:24 PM
Hmm, if I feed it a JFIF (.jpg) file via the <Icon> directive, it sets the thumbnail_type bytes (2 bytes at 0x4e) to 0x14 (i.e. GIF), but the actual thumbnail data has a JFIF header and is identified as a JPEG by ImageMagick.
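A mismatch like this can be detected mechanically by comparing the declared type with the image's magic bytes ("GIF8" for GIF, FF D8 FF for JPEG, 89 "PNG" for PNG). The Java sketch below relies on the description above (a 2-byte thumbnail type at offset 0x4E, with 0x14 meaning GIF); the little-endian byte order and the ThumbnailCheck class are assumptions.

```java
// Hypothetical checker: does the LRF header's thumbnail type agree with the image data?
final class ThumbnailCheck {
    static final int TYPE_GIF = 0x14;   // per the post above

    private ThumbnailCheck() {}

    // Assumes the 16-bit field at 0x4E is little-endian.
    static int declaredType(byte[] lrfHeader) {
        if (lrfHeader.length < 0x50) throw new IllegalArgumentException("header too short");
        return (lrfHeader[0x4E] & 0xFF) | ((lrfHeader[0x4F] & 0xFF) << 8);
    }

    // Identify the image format from its leading magic bytes.
    static String sniff(byte[] image) {
        if (image.length >= 4 && image[0] == 'G' && image[1] == 'I' && image[2] == 'F' && image[3] == '8') {
            return "GIF";
        }
        if (image.length >= 3 && (image[0] & 0xFF) == 0xFF && (image[1] & 0xFF) == 0xD8 && (image[2] & 0xFF) == 0xFF) {
            return "JPEG";
        }
        if (image.length >= 4 && (image[0] & 0xFF) == 0x89 && image[1] == 'P' && image[2] == 'N' && image[3] == 'G') {
            return "PNG";
        }
        return "unknown";
    }

    // True when the header claims GIF but the data isn't, or vice versa.
    static boolean mismatch(byte[] lrfHeader, byte[] thumbnail) {
        boolean saysGif = declaredType(lrfHeader) == TYPE_GIF;
        boolean isGif = sniff(thumbnail).equals("GIF");
        return saysGif != isGif;
    }
}
```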

scotty1024
01-04-2007, 02:28 PM
I appreciate your enthusiasm and your taking the time on all this but yes, I do grok GNU command line processing. My personal work flow uses the ebook.xml path as I have an OEBF convertor as well. You've given me a lot of feedback on what you need, it is all reasonable and I will be addressing it shortly. :deal:

kovidgoyal
01-04-2007, 02:41 PM
Thanks for taking the time to address my concerns. I just thought you may not be used to the GNU conventions, since BBeBook seems to have been developed on a non GNU platform. At any rate there's no rush as with the patch to the command line processor, I can easily use the xml interface in my code.

scotty1024
01-04-2007, 03:28 PM
I've been GNU'ing for 21 years now. :)

kovidgoyal
01-04-2007, 03:45 PM
Cool, 21 certainly beats my 10... Sorry to be such a pest, but here's what seems to be a rather serious bug. The generated LRF file seems to be invalid (it gives an error in the Connect Reader).


java -jar ../BBeBook-0.2.jar leaves.xml

The generated leaves.lrf is different from the original included in the source tarball. The generated one gives an error in the Connect Reader the original does not. The files are attached. Essentially the difference between the files is that in the generated one the toc object id is 0x32 instead of 0x33 and a whole bunch of bytes that were different values in the original are set to 0x32 in the generated version.

If the original leaves.lrf was generated by BBeBook-0.2, I'm at a loss.

melchioe
01-11-2007, 01:25 PM
I'm operating under Windows and I get the "empty except for title page" issue as well, and I concur that it seems to add another blank page per html file.

I've run gutlrf (which calls HTML2LRF to produce the LRF file) with success, but when I use GutenMark to convert text to HTML, the resulting file does not come out well with HTML2LRF. I was convinced I was doing something wrong, but I'm no longer sure. I do know that the GutenMarked files display beautifully in both IE and Firefox. I've run Tidy, I've split the HTML into chapters, and I've banged my head against the wall...

Anyone have any idea of what HTML2LRF is looking for in the html format? I'm thinking there is something missing from both the gutenmark files and the baen files.

-e-

FangornUK
01-11-2007, 03:01 PM
What sort of problem are you having with GutenMark & HTML2LRF? Works fine for me on Gutenberg text files with HTML2LRF and even handles those difficult formatted plays. Here's what I do to convert them:

Gutenmark.exe file.txt >new.html

(Using splitbook.pl from my gutlrf.pl (http://www.mobileread.com/forums/showthread.php?t=8532) scripts)

splitbook.pl -c h1 new.html

HTML2LRF can't handle HTML files above 600k in size so you need to use splitbook.pl to split the files.

P.S. Just a comment on this thread, can the BBeBook discussions go into a different thread? It seems to have taken over this HTML2LRF thread.

ReadWrite
01-15-2007, 11:12 PM
GREAT JOB with the decoding!
Minor bug fix to solve 0x32 problem.
The 0x32 is the id value that is not being updated/incremented.
In BBeBObject.java, line 22; make globalBBeBObject static:

private static short globalBBeBObject = 0x32;// Magic.. Ask Sony why all LRF files start here

Not sure this is the most efficient solution, since statics usually mean sloppy programming, but it is the easiest short-term solution.
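The underlying issue can be reproduced in a few lines of Java. With a per-object (non-static) counter, every new object starts again at 0x32; sharing one counter fixes that. A static field shares it across the whole JVM, so a second book built in the same run (for example from the GUI) would keep counting where the first one stopped. A counter owned by each book is one alternative. Both class names below are hypothetical, and this is a simplified illustration, not BBeBook's code.

```java
// One allocator per book: every object created for that book gets the next id.
final class ObjectIdAllocator {
    static final short FIRST_ID = 0x32;   // starting id, per the BBeBObject.java comment

    private short next = FIRST_ID;

    short allocate() {
        return next++;                    // hand out 0x32, 0x33, 0x34, ...
    }
}

// Simplified model of the bug: the counter is an instance field, so each
// new object starts counting again from 0x32 and gets the same id.
final class PerObjectCounterDemo {
    private short counter = ObjectIdAllocator.FIRST_ID;
    final short id = counter++;
}
```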

Keep up the good work.

kovidgoyal
01-15-2007, 11:49 PM
That's certainly an efficient short-term fix! Thanks, now I can add BBeBook to libprs500. Hopefully scotty1024 will fix it properly once he's done refactoring.

John Wheater
06-25-2007, 06:01 AM
I call this application BBeBook. It is far from perfect/complete in its understanding of LRF, but it is written 100% in Java and needs no DLLs from Sony. It is Unicode clean.


Scotty1K - could you give a chap a clue how to use this package? I'm running Windows XP - is that OK?

kovidgoyal
06-25-2007, 12:37 PM
You should use more modern tools for html2lrf conversion. See for example this thread
http://www.mobileread.com/forums/showthread.php?t=10582

Vienna01
08-25-2007, 01:13 PM
What is the latest version of html2lrf?
Where is the link to d/l it? Only link I saw was version 0.1
I got the GUI to libprs500.
Is there a GUI for html2lrf?

kovidgoyal
08-25-2007, 01:59 PM
There is a GUI, but it's the beta GUI of libprs500. There's a thread for it in the developer's section.

Stingo
08-26-2007, 03:54 PM
There is a GUI, but it's the beta GUI of libprs500. There's a thread for it in the developer's section.

Is libprs500-beta available for OS X? If so, how can I get it to run?

By the way, after using BD for a while I tried html2lrf and I'm sold. You did a great job on the program.

kovidgoyal
08-31-2007, 02:13 PM
Thanks. Just launch it from the command line on OS X. You have to run the GUI once first.

Stingo
09-01-2007, 12:36 PM
When I try that I get -bash: libprs500-beta: command not found

If I go to /Applications/"E-book Tools"/libprs500.app/Contents/MacOS/libprs500-beta
I can find libprs500 and run it. However, there is no libprs500-beta.

kovidgoyal
09-01-2007, 12:54 PM
Strange. Do all the other command-line programs work? html2lrf, etc.?

Stingo
09-02-2007, 08:58 AM
Strange. Do all the other command-line programs work? html2lrf, etc.?

I have two computers. On the laptop it all works fine from the command line. On the desktop none of the command-line programs work. I have tried deleting and reinstalling, and it did no good. Both are running the same version of OS X.

Stingo
09-02-2007, 11:15 AM
I have two computers. On the laptop it all works fine from the command line. On the desktop none of the command-line programs work. I have tried deleting and reinstalling, and it did no good. Both are running the same version of OS X.

Deleted and reinstalled again and now it took.