08-31-2008, 12:33 PM | #1 |
Guru
Posts: 610
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-T3, Kobo Libra H2O
|
HTML2LRF "Preprocessor"
I have written a simple utility which is intended to complement Calibre's HTML2LRF in several ways which I found useful and which Kovidgoyal doesn't want (perhaps rightfully so) included in the utility itself. I focused on three things:
1) Include all HTML2LRF parameters within the HTML file itself. Until now, I had to write a separate batch file for every book I ever intended to convert, because each book uses similar but slightly different parameters.
2) Allow easy maintenance by centralizing common parameters while allowing for special behavior for each book.
3) Allow conversion of multiple books at once.
In the end, I decided to define several special <META> tags. My application parses an HTML file for these tags, builds the correct command line and finally executes HTML2LRF. This allows for easy inclusion of parameters while maintaining complete compatibility with the HTML format.
A simple example: I have a book in HTML format. When I convert this book to LRF, I want to keep information about its title (HTML2LRF already does that) and author (HTML2LRF requires the author to be specified on the command line or in a special metadata file - .OPF, I think). And I might just as well define e.g. headers (as far as I know, this can only be specified on the command line). I can do that by including these tags in the book's header:
<meta name="lrf:--author" content="First Last">
<meta name="lrf:--author-sort" content="Last First">
<meta name="lrf:--header" content="">
...
Since I wanted to keep this maintainable, I defined another special meta tag:
<meta name="@include" content="common.htm">
This includes the meta tags from common.htm. Recursively, too, so that I can e.g. include an author-specific parameter file, which in turn includes a common parameter file.
If you think you'll find it useful, you can download the utility and its source code (in Delphi 5, but it should be easy enough to translate it to your favorite programming language) here: http://www.pepak.net/sonyreader/h2lrf.zip
(The code is not optimized in any way - the purpose was to save myself the effort of writing batch files, not to demonstrate advanced algorithms; if it turned out to be too slow, no big deal - I would simply leave it running overnight) |
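For readers who would rather not read Delphi, here is a minimal Python sketch of the idea described in the post above: collect the `lrf:` meta tags, follow `@include` recursively, and build an HTML2LRF command line. It is not the utility's actual code. The names `MetaCollector`, `collect_options` and `build_command` are made up for illustration, and three behaviours are assumptions: include paths are resolved relative to the including file, included options are emitted at the position of the `@include` tag, and each option is passed as a single `--name=value` argument.

```python
from html.parser import HTMLParser
from pathlib import Path


class MetaCollector(HTMLParser):
    """Collect (name, content) pairs from <meta> tags in document order."""

    def __init__(self):
        super().__init__()
        self.metas = []

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing tags such as <meta ... />.
        if tag == "meta":
            attributes = dict(attrs)
            if attributes.get("name"):
                self.metas.append((attributes["name"], attributes.get("content")))


def collect_options(path, prefix="lrf:", seen=None):
    """Return the options found in `path`, following @include recursively."""
    path = Path(path).resolve()
    seen = set() if seen is None else seen
    if path in seen:  # guard against include cycles (an added safety net)
        return []
    seen.add(path)

    parser = MetaCollector()
    parser.feed(path.read_text(encoding="utf-8", errors="replace"))

    options = []
    for name, content in parser.metas:
        if name == "@include":
            if content:
                # Assumption: includes are relative to the including file.
                options.extend(collect_options(path.parent / content, prefix, seen))
        elif name.startswith(prefix):
            option = name[len(prefix):]  # e.g. "--author"
            # Empty or missing content means a bare flag such as --header.
            options.append(f"{option}={content}" if content else option)
    return options


def build_command(path):
    """Build the argument list for one book; nothing is executed here."""
    return ["html2lrf", *collect_options(path), str(path)]
```

With a book.htm that includes common.htm, `build_command("book.htm")` returns a list that can be handed to `subprocess.run` as-is, so author names containing spaces need no extra quoting.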
08-31-2008, 12:50 PM | #2 |
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Cool, this will be useful for people who don't want to use the GUI.
|
08-31-2008, 12:53 PM | #3 |
Guru
Posts: 610
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-T3, Kobo Libra H2O
|
You can include it in your Calibre archive if you want. Or maybe translate it to Python and have your own version :-)
|
08-31-2008, 12:57 PM | #4 |
Guru
Posts: 780
Karma: 1416
Join Date: Jan 2008
Device: Kobo Clara 2E/HD, Kindle PW
|
Pepak,
Thanks for the utility! Take a look at Free Pascal. It's mostly Delphi 7 compatible and it's cross-platform. You can compile for Windows, Linux or Mac. |
09-01-2008, 03:38 AM | #5 |
Guru
Posts: 610
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-T3, Kobo Libra H2O
|
Modified it to work with FreePascal. Same link as above.
Note: It will not run on non-Win32 platforms, unless FreePascal is far more advanced than I would expect: I am pretty certain the CreateProcess call and the following code will be a bit problematic on other platforms. It should be possible to comment out the whole section, though, and only keep the Writeln('EXEC: ...') - that way the tool will generate a script rather than run the converter directly. |
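The fallback described above (emit the command instead of running it) can be sketched in Python as follows. This is an illustration under assumptions, not a translation of the Delphi source: `script_line` and `convert` are hypothetical names, `shlex.join` needs Python 3.8 or later, and it produces POSIX shell quoting, so the printed lines are meant for a Unix shell script rather than a Windows batch file.

```python
import shlex
import subprocess


def script_line(command):
    """Render an argument list as one safely quoted POSIX shell line."""
    return shlex.join(command)


def convert(command, dry_run=False):
    """Run the converter, or only print the command when dry_run is set.

    Returns the converter's exit code (0 in dry-run mode).
    """
    if dry_run:
        print(script_line(command))
        return 0
    # Passing a list avoids any shell parsing of titles or author names.
    return subprocess.run(command, check=False).returncode
```

Redirecting the dry-run output to a file gives a script that can be reviewed first and run later.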
09-01-2008, 08:09 AM | #6 |
Resident Curmudgeon
Posts: 73,974
Karma: 128903378
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
I suggest looking at Python and rewriting it for Python. It would mean not having to install FreePascal. I (and I suspect a lot of others) have Python installed.
|
09-01-2008, 08:31 AM | #7 | |
Manic Do Fuse
Posts: 2,312
Karma: 3325462
Join Date: Oct 2006
Device: Sony 500, 505, 350, Kindle 3, DXG, nook, Irex DR800SG, iPad
|
Quote:
I suggest looking at Python and rewriting it for Python. I already have Python installed and would really rather not install FreePascal for this one program. But if it were in Python, I'd be happy to give it a go. |
|
09-01-2008, 08:49 AM | #8 |
Guru
Posts: 610
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-T3, Kobo Libra H2O
|
I don't have Python installed and don't know Python. If there were to be a Python rewrite, I would much prefer to have it as a part of HTML2LRF rather than a standalone program.
Anyway, since this will only run under Windows, you can just as well use the included EXE. You don't need the sources to run it. The sources are intended for:
- Developers of HTML2LRF, in case they want to implement something like this in their code.
- Careful users who prefer to check the code and compile it themselves to avoid any nasty surprises. |
09-01-2008, 08:57 AM | #9 | |
Resident Curmudgeon
Posts: 73,974
Karma: 128903378
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
|
|
09-01-2008, 09:11 AM | #10 | |
Guru
Posts: 780
Karma: 1416
Join Date: Jan 2008
Device: Kobo Clara 2E/HD, Kindle PW
|
Quote:
FreePascal is a compiler. Generally, code produced by a compiler does not require that you install its environment... it produces a standalone executable. With some systems (Visual Basic) a runtime library may be required. Python OTOH is an interpreted language. You're comparing apples and oranges. |
|
09-01-2008, 09:25 AM | #11 | |
Resident Curmudgeon
Posts: 73,974
Karma: 128903378
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
Quote:
|
|
09-01-2008, 09:47 AM | #12 |
Guru
Posts: 610
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-T3, Kobo Libra H2O
|
So that's why you wanted me to rewrite it into Python! :-)
Anyway, I am glad there is apparently at least one person who finds my little utility useful. |
12-03-2008, 05:42 AM | #13 |
Guru
Posts: 610
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-T3, Kobo Libra H2O
|
I updated the utility with more functions:
- Better error handling. Now it can either die on the first error or log the error and continue with the next file.
- Full preprocessing mode. Up until now the included files were only scanned for the metadata-generating tags. From this version on you can include just about any HTML code. This can be used to overcome certain shortcomings of HTML2LRF, e.g. its inability to support the @import directive in CSS.
- Output files can be stored in a directory other than the current one, and they can mimic the directory structure of the source files, too.
Same link as above. I suppose the next thing on the agenda should be STDIN support, to allow for any preprocessor (e.g. PHP). |
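Two of the features listed above, the choice between stopping on the first error and logging it, and output files that mirror the source directory layout, can be sketched in Python like this. It only illustrates the behaviour as described in the post. The function names, the `stop_on_error` switch and the `.lrf` suffix handling are assumptions, not the utility's real options.

```python
from pathlib import Path


def mirrored_output(source, source_root, output_root, suffix=".lrf"):
    """Map source_root/a/b/book.htm to output_root/a/b/book.lrf."""
    relative = Path(source).relative_to(source_root)
    return Path(output_root) / relative.with_suffix(suffix)


def convert_all(sources, convert_one, stop_on_error=False):
    """Call convert_one(source) for every source file.

    With stop_on_error the first exception propagates ("die" mode);
    otherwise failures are collected and the batch continues.
    Returns a list of (source, exception) pairs.
    """
    failures = []
    for source in sources:
        try:
            convert_one(source)
        except Exception as error:
            if stop_on_error:
                raise
            failures.append((source, error))
    return failures
```

A caller would create `mirrored_output(...).parent` with `mkdir(parents=True, exist_ok=True)` before converting, then print the returned failures at the end of the batch.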
01-22-2009, 05:23 PM | #14 |
Junior Member
Posts: 5
Karma: 10
Join Date: Jan 2009
Device: prs500, prs350, wexler flexone
|
h2lwrap
I'm in the process of cleaning up a bunch of text ebooks, and html is the easiest and most versatile structured format to generate and store the "masters". I started looking for a util to pass meta-tags on to html2lrf, and found this. Unfortunately, as my work is all on OS X and Linux, I can't use pepak's util.
Below is something I tossed together that handles any corner cases I was able to come up with rather well. It has the added benefit of running anywhere you have bash and sed. (Namely, anywhere you _can't_ use the win32 util. *grin*) Thanks for the idea, pepak! Code:
#!/bin/bash
# h2lwrap harvests meta tags from file $1 in the namespace $prefix
# and runs html2lrf on $1 with those tags as options.
#
# For compatibility with pepak's h2lrf utility, it permits meta
# names in the form "$prefix:--option" as well.
#
# EXAMPLE
#
# <meta name="lrf:author" content="Anthony Burgess" />
# <meta name="lrf:author-sort" content="Burgess, Anthony" />
# <meta name="lrf:title" content="A Clockwork Orange" />
# <meta name="lrf:title-sort" content="Clockwork Orange, A" />
# <meta name="lrf:headerformat" content="%t, %a" />
# <meta name="lrf:page-break-before-tag" content="$" />
# <meta name="lrf:disable-chapter-detection" />
# <meta name="lrf:header" />
#
# todo: "@include" support
#
prefix=lrf
verstring='h2lwrap v0.1a keith beckman 012209'

function getmeta () {
    sed -Ef /dev/fd/7 7<<EOF
s,<meta +name="${prefix}:(--)?([^"]+)" +content="(.+)"( +)?/?>,--\2="\3",
t end
s,<meta +name="${prefix}:(--)?([^"]+)"( +)?/?>,--\2,
t end
d
:end
EOF
}

function echo_usage () {
    cat >&2 <<EOF
Usage: h2lwrap -[nhv] [file]
h2lwrap finds ebook meta-information within html files and passes it
along to html2lrf(1) when converting. See script comments for meta tag
formatting. If file does not exist, stdin is read to file before converting.
  -n|--noconvert  shows the arguments that would've been used
  -v|--version    displays the version string
  -h|--help       displays this help text
EOF
}

while [ -n "$1" ]; do
    case "$1" in
        '-h'|'--help'|'-u'|'--usage')
            echo_usage
            exit 0
            ;;
        '-v'|'--version')
            echo "$verstring" >&2
            exit 0
            ;;
        '-n'|'--noconvert')
            noconvert=1
            ;;
        *)
            input="$1"
    esac
    shift
done

if [ ! -f "$input" ]; then
    cat > "$input"
fi

args=`getmeta < "$input" | sed 's/ /\ /g' | paste -sd ' ' /dev/stdin`

if [ $noconvert ]; then
    echo "$args"
else
    html2lrf "$args" "$input"
fi
|
01-22-2009, 11:13 PM | #15 |
Groupie
Posts: 174
Karma: 12
Join Date: Jan 2009
Device: Kindle 2
|
Really awesome little utility, thanks a lot.
|
|