Old 08-31-2008, 12:33 PM   #1
pepak
Guru
Posts: 610
Karma: 4150
Join Date: Mar 2008
Device: Sony Reader PRS-T3, Kobo Libra H2O
HTML2LRF "Preprocessor"

I have written a simple utility which is intended to complement Calibre's HTML2LRF in several ways which I found useful and which Kovidgoyal doesn't want (perhaps rightfully so) included in the utility itself. I focused on three things:

1) Include all HTML2LRF parameters within the HTML file itself. Until now, I had to write a separate batch file for every book I ever intended to convert, because each book uses similar but slightly different parameters.

2) Allow easy maintenance by centralizing common parameters while allowing for special behavior for each book.

3) Allow conversion of multiple books at once.

In the end, I decided to define several special <META> tags. My application parses an HTML file for these tags, builds the correct command line and finally executes HTML2LRF. This makes it easy to include parameters while keeping the file fully compatible with the HTML format.

A simple example: I have a book in HTML format. When I convert this book to LRF, I want to keep information about its title (HTML2LRF already does that) and author (HTML2LRF requires author specified on command line or in a special metadata file - .OPF, I think). And I might just as well define e.g. headers (as far as I know, this can only be specified on command line). I can do that by including these tags in the book's header:

<meta name="lrf:--author" content="First Last">
<meta name="lrf:--author-sort" content="Last First">
<meta name="lrf:--header" content="">
...
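
For readers who would like to see the idea outside of Delphi, here is a minimal shell sketch of the tag-to-option mapping described above. It is not the code of the utility: the function names meta_to_args and convert_book are made up for illustration, and the sketch assumes one lowercase meta tag per line, double-quoted attributes, and that an empty content attribute means an option without a value.

```shell
# Sketch only (hypothetical helper names, not part of h2lrf).

# Print one html2lrf option per line for every
# <meta name="lrf:--option" content="value"> tag read from stdin.
# Assumption: a tag with content="" stands for a bare flag.
meta_to_args() {
	sed -n -E \
		-e 's,.*<meta +name="lrf:(--[^"]+)" +content="" */?>.*,\1,p' \
		-e 's,.*<meta +name="lrf:(--[^"]+)" +content="([^"]+)" */?>.*,\1=\2,p'
}

# Run html2lrf on one book with the options found in its header.
# The body runs in a subshell so the IFS and globbing changes stay local.
convert_book() (
	book=$1
	IFS='
'
	set -f   # no globbing while the option list is split on newlines
	set -- $(meta_to_args < "$book")
	html2lrf "$@" "$book"
)
```

With the three tags shown above, convert_book book.htm would call html2lrf with --author=First Last, --author-sort=Last First and --header, followed by the file name.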

Since I wanted to keep this maintainable, I defined another special meta tag:

<meta name="@include" content="common.htm">

This includes the meta tags from common.htm, and it works recursively: I can e.g. include an author-specific parameter file, which in turn includes a common parameter file.
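
The recursive include can be sketched in shell roughly as follows. This is an illustration under assumptions, not the utility's actual logic: collect_meta is a made-up name, include paths are assumed to be relative to the including file, tags are assumed to sit one per line with single spaces between attributes, and the depth limit of 10 is an arbitrary guard against files that include each other.

```shell
# Sketch only (hypothetical helper, not part of h2lrf).
# Print every lrf: meta tag of a file, expanding @include tags in place.
# The body runs in a subshell, so each recursion level has its own variables.
collect_meta() (
	file=$1
	depth=${2:-0}
	# Arbitrary safety net against include cycles (an assumption).
	[ "$depth" -le 10 ] || { echo "include nesting too deep: $file" >&2; exit 1; }
	dir=$(dirname "$file")
	while IFS= read -r line || [ -n "$line" ]; do
		case $line in
			*'<meta name="@include" content="'*)
				# Cut the file name out of the content attribute.
				inc=${line#*'content="'}
				inc=${inc%%'"'*}
				collect_meta "$dir/$inc" $((depth + 1)) || exit 1
				;;
			*'<meta name="lrf:'*)
				printf '%s\n' "$line"
				;;
		esac
	done < "$file"
)
```

Because an include is expanded at the point where it appears, tags from common files come out before the book's own tags that follow the include line.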

If you think you'll find it useful, you can download the utility and its source code (in Delphi 5, but it should be easy enough to translate it to your favorite programming language) here:
http://www.pepak.net/sonyreader/h2lrf.zip

(The code is not optimized in any way: the purpose was to save myself the effort of writing batch files, not to demonstrate advanced algorithms. If it turned out to be too slow, no big deal; I would simply leave it running overnight.)
Old 08-31-2008, 12:50 PM   #2
kovidgoyal
creator of calibre
Posts: 43,858
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Cool, this will be useful for people who don't want to use the GUI.
Old 08-31-2008, 12:53 PM   #3
pepak
Guru
You can include it in your Calibre archive if you want. Or maybe translate it to Python and have your own version :-)
Old 08-31-2008, 12:57 PM   #4
JeffElkins
Guru
Posts: 780
Karma: 1416
Join Date: Jan 2008
Device: Kobo Clara 2E/HD, Kindle PW
Pepak,

Thanks for the utility! Take a look at Free Pascal. It's mostly Delphi 7 compatible and it's cross-platform. You can compile for Windows, Linux or Mac.
Old 09-01-2008, 03:38 AM   #5
pepak
Guru
Modified it to work with FreePascal. Same link as above.

Note: It will not run on non-Win32 platforms, unless FreePascal is far more advanced than I would expect: I am pretty certain the CreateProcess call and the following code will be a bit problematic on other platforms. It should be possible to comment out the whole section, though, and only keep the Writeln('EXEC: ...') - that way the tool will generate a script rather than run the converter directly.
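
The "generate a script instead of running the converter" idea can be illustrated in shell like this. It is only a sketch and not a translation of the Delphi code: shell_quote and emit_command are made-up names, and the output format (a plain html2lrf command line rather than an EXEC: line) is an assumption.

```shell
# Sketch only (hypothetical helpers, not part of h2lrf).

# Quote one argument so that a POSIX shell reads it back unchanged.
# Caveat: trailing newlines in the argument are lost.
shell_quote() {
	printf "'%s'" "$(printf '%s' "$1" | sed "s/'/'\\\\''/g")"
}

# Print the html2lrf command line for the given arguments instead of
# executing it, one command per call.
emit_command() {
	printf 'html2lrf'
	for arg in "$@"; do
		printf ' '
		shell_quote "$arg"
	done
	printf '\n'
}
```

Redirecting the output of several emit_command calls into a file gives a script that can be reviewed first and run later on any platform that has html2lrf.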
Old 09-01-2008, 08:09 AM   #6
JSWolf
Resident Curmudgeon
Posts: 73,974
Karma: 128903378
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
I suggest looking at Python and rewriting it for Python. It would mean not having to install FreePascal. I (and, I suspect, a lot of others) have Python installed.
Old 09-01-2008, 08:31 AM   #7
Madam Broshkina
Manic Do Fuse
Posts: 2,312
Karma: 3325462
Join Date: Oct 2006
Device: Sony 500, 505, 350, Kindle 3, DXG, nook, Irex DR800SG, iPad
Quote:
Originally Posted by JSWolf View Post
I suggest looking at Python and rewriting it for Python. A lot of people already have Python installed and would really rather not have to install FreePascal just for this one program. But if it was in Python, we'd happily give it a go.
Since you do not speak for everyone on this board I have fixed your post.

I suggest looking at Python and rewriting it for Python. I already have Python installed and would really rather not install FreePascal for this one program. But if it were in Python, I'd be happy to give it a go.
Old 09-01-2008, 08:49 AM   #8
pepak
Guru
I don't have Python installed and don't know Python. If there were to be a Python rewrite, I would much prefer to have it as a part of HTML2LRF rather than a standalone program.

Since this will only run under Windows anyway, you can just as well use the included EXE. You don't need the sources to run it. The sources are intended for:

- Developers of HTML2LRF in case they wanted to implement something like this in their code.
- Careful users who prefer to check the code and compile it themselves to avoid any nasty surprises.
Old 09-01-2008, 08:57 AM   #9
JSWolf
Resident Curmudgeon
Quote:
Originally Posted by pepak View Post
I don't have Python installed and don't know Python. If there were to be a Python rewrite, I would much prefer to have it as a part of HTML2LRF rather than a standalone program.

Since this will only run under Windows anyway, you can just as well use the included EXE. You don't need the sources to run it. The sources are intended for:

- Developers of HTML2LRF in case they wanted to implement something like this in their code.
- Careful users who prefer to check the code and compile it themselves to avoid any nasty surprises.
Thank you. Yes, the exe will do quite nicely. Especially for eBooks that I keep converting as I'm setting them up.
Old 09-01-2008, 09:11 AM   #10
JeffElkins
Guru
Quote:
Originally Posted by JSWolf View Post
I suggest looking at Python and rewriting it for Python. It would mean not having to install FreePascal. I (and, I suspect, a lot of others) have Python installed.

FreePascal is a compiler. Generally, code produced by a compiler does not require that you install its environment; it produces a standalone executable. With some systems (e.g. Visual Basic) a runtime library may be required. Python, OTOH, is an interpreted language.

You're comparing apples and oranges.
Old 09-01-2008, 09:25 AM   #11
JSWolf
Resident Curmudgeon
Quote:
Originally Posted by JeffElkins View Post
FreePascal is a compiler. Generally, code produced by a compiler does not require that you install its environment; it produces a standalone executable. With some systems (e.g. Visual Basic) a runtime library may be required. Python, OTOH, is an interpreted language.

You're comparing apples and oranges.
I wasn't thinking of FreePascal as a compiler, to be honest. I was thinking of it as an interpreter. I know you can make an exe from most interpreted languages, but again, that's not where my mind was at the time.
Old 09-01-2008, 09:47 AM   #12
pepak
Guru
So that's why you wanted me to rewrite it into Python! :-)

Anyway, I am glad there is apparently at least one person who finds my little utility useful.
Old 12-03-2008, 05:42 AM   #13
pepak
Guru
I updated the utility with more functions:

- Better error handling. Now it can simply die with the first error or log the error and continue with the next file.

- Full preprocessing mode. Up until now the included files were only scanned for the metadata-generating tags. As of this version you can include just about any HTML code. This can be used to overcome certain shortcomings of HTML2LRF, e.g. its inability to support the @import directive in CSS.

- Output files can be stored in a directory other than the current one, and they can mimic the directory structure of the source files, too.
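
As an illustration of the last point, mirroring the source tree under a separate output directory comes down to a small path computation. The sketch below is in shell rather than Delphi and is not taken from the utility: mirror_path is a made-up name, and it assumes the source file lies below the given source root.

```shell
# Sketch only (hypothetical helper, not part of h2lrf).
# Map a source file below src_root to an .lrf path below out_root,
# keeping the subdirectory structure.
mirror_path() {
	src_root=${1%/} out_root=${2%/} file=$3
	rel=${file#"$src_root"/}   # path relative to the source root
	printf '%s/%s.lrf\n' "$out_root" "${rel%.*}"
}
```

For example, mirror_path books out books/scifi/dune.htm prints out/scifi/dune.lrf; the target directory would still have to be created (e.g. with mkdir -p) before the converter writes there.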

Same link as above.

I suppose the next thing on the agenda should be STDIN support, to allow for any preprocessor (e.g. PHP).
Old 01-22-2009, 05:23 PM   #14
sehrgut
Junior Member
Posts: 5
Karma: 10
Join Date: Jan 2009
Device: prs500, prs350, wexler flexone
h2lwrap

I'm in the process of cleaning up a bunch of text ebooks, and html is the easiest and most versatile structured format to generate and store the "masters". I started looking for a util to pass meta-tags on to html2lrf, and found this. Unfortunately, as my work is all on OS X and Linux, I can't use pepak's util.

Below is something I tossed together that handles any corner cases I was able to come up with rather well. It has the added benefit of running anywhere you have bash and sed. (Namely, anywhere you _can't_ use the win32 util. *grin*)

Thanks for the idea, pepak!

Code:
#!/bin/bash

# h2lwrap harvests meta tags from file $1 in the namespace $prefix
# and runs html2lrf on $1 with those tags as options.
#
# For compatibility with pepak's h2lrf utility, it permits meta
# names in the form "$prefix:--option" as well.
#
# EXAMPLE
#
# <meta name="lrf:author" content="Anthony Burgess" />
# <meta name="lrf:author-sort" content="Burgess, Anthony" />
# <meta name="lrf:title" content="A Clockwork Orange" />
# <meta name="lrf:title-sort" content="Clockwork Orange, A" />
# <meta name="lrf:headerformat" content="%t, %a" />
# <meta name="lrf:page-break-before-tag" content="$" />
# <meta name="lrf:disable-chapter-detection" />
# <meta name="lrf:header" />
#
# todo: "@include" support
#

prefix=lrf
verstring='h2lwrap v0.1a keith beckman 012209'

function getmeta () {
	sed -Ef /dev/fd/7 7<<EOF
s,<meta +name="${prefix}:(--)?([^"]+)" +content="(.+)"( +)?/?>,--\2="\3",
t end
s,<meta +name="${prefix}:(--)?([^"]+)"( +)?/?>,--\2,
t end
d
:end
EOF
}

function echo_usage () {
		cat >&2 <<EOF
Usage: h2lwrap -[nhv] [file]
	h2lwrap finds ebook meta-information within html files
	and passes it along to html2lrf(1) when converting. See
	script comments for meta tag formatting.

	If file does not exist, stdin is read to file before converting.

	-n|--noconvert shows the arguments that would've been used
	-v|--version displays the version string
	-h|--help displays this help text
EOF
	}

while [ -n "$1" ]; do
	case "$1" in

		'-h'|'--help'|'-u'|'--usage')
			echo_usage
			exit 0
			;;
		'-v'|'--version')
			echo "$verstring" >&2
			exit 0
			;;
		'-n'|'--noconvert')
			noconvert=1
			;;
		*)
			input="$1"
	esac
	shift
done

if [ ! -f "$input" ]; then
	cat > "$input"
fi

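# Read one option per line into a bash array so that values containing
# spaces reach html2lrf as separate, intact arguments.
args=()
while IFS= read -r opt; do
	# getmeta emits --name="value"; strip the quotes around the value.
	[[ $opt == *'="'*'"' ]] && { opt=${opt/'="'/=}; opt=${opt%'"'}; }
	[ -n "$opt" ] && args+=("$opt")
done <<< "$(getmeta < "$input")"

if [ -n "$noconvert" ]; then
	printf '%s\n' "${args[@]}"
else
	html2lrf "${args[@]}" "$input"
fi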
Old 01-22-2009, 11:13 PM   #15
S.Canton
Groupie
Posts: 174
Karma: 12
Join Date: Jan 2009
Device: Kindle 2
Really awesome little utility, thanks a lot.