12-13-2011, 01:02 PM #12
SBT
Join Date: Sep 2010
Location: Norway
Device: prs-t1, tablet, Nook Simple, assorted kindles, iPad
convert to xhtml

Time to convert our tagged file to xhtml:
Code:
function zx_txt2xhtml {
# Usage: zx_txt2xhtml [textfile]
# Converts a text file with %-type tags to an xhtml file.
# The file should be run through html tidy afterwards.
# If no input file is given, input is read from STDIN.
# Output is to STDOUT.
# Requires GNU sed (\n, \t, \| and one-line i/a commands).
# Conversions, in pipeline order:
#   %c title       -> close paragraph, end-of-chapter <hr/>, <h2> heading
#   %y text        -> paragraph opening with a drop capital
#   indent of 6+ spaces or a tab -> close paragraph, open a new one
#   #-             -> -
#   %p %P %i %w    -> line kept, wrapped in <!-- -->
#   empty lines    -> removed
#   drop cap A/L   -> span.first renamed to span.afterA / span.afterL
#   finally the XHTML 1.1 header and the </body></html> footer are added.
# Disabled rule for %q intro blocks, kept for reference:
#-e '/^%q/,/^%[^q]/{s/^%q[ \t]\+/<div class="intro">/;s/%[^q]/<\/div>\n&/}' |\
local inputfile="${1:-/dev/stdin}"
cat "$inputfile" |\
sed -e s/"^%c \(.*\)"/"<\/p>\n<hr class=\"endchapter\"\/>\n\n<h2 class=\"chapter\">\1<\/h2>"/ |\
sed -e s/"^%y[[:blank:]]\+\([^A-Z0-9]*[A-Z0-9]\)\([^ ]*\)"/"<p class=\"initial\"><span class=\"drop\">\1<\/span><span class=\"first\">\2<\/span>"/ \
-e s/"^\( \{6,8\}\|\t\)"/"<\/p>\n<p>"/ \
-e s/"#-"/"-"/g \
-e /"^%[pPiw].*"/s/".*"/"<!-- & -->"/ |\
sed /"^$"/d |\
sed -e s/"<span class=\"drop\">\(.*\)\([AL]\)<\/span><span class=\"first\">"/"<span class=\"drop\">\1\2<\/span><span class=\"after\2\">"/ \
-e 1i'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">\
<html xmlns="http://www.w3.org/1999/xhtml">\
<head> \
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" /> \
<title></title> \n\
<link href="main.css" rel="stylesheet" type="text/css" /> </head> \
<body>' \
-e \$a"</body>\n</html>"
}
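The %y rule is the least obvious part of the pipeline: `[^A-Z0-9]*[A-Z0-9]` grabs any leading punctuation (an opening quote, say) together with the first capital letter or digit as the drop capital, and `[^ ]*` takes the rest of the first word. Below is a sketch that isolates just the two drop-cap rules so they can be tried on single lines. `zx_dropcap` is a hypothetical name, the separator after %y is written as `[[:blank:]]` (space or tab), and GNU sed is assumed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the two drop-cap rules from zx_txt2xhtml, isolated.
# Assumes GNU sed (\+ in basic regexps); LC_ALL=C keeps the A-Z and 0-9
# ranges byte-wise whatever the user's locale is.
export LC_ALL=C

zx_dropcap() {
  # Rule 1: on a %y line, leading punctuation plus the first capital/digit
  # goes into span.drop, the rest of the first word into span.first.
  # Rule 2: if the drop capital is A or L, span.first becomes span.afterA
  # or span.afterL (presumably for letter-specific spacing in main.css).
  sed -e s/"^%y[[:blank:]]\+\([^A-Z0-9]*[A-Z0-9]\)\([^ ]*\)"/"<p class=\"initial\"><span class=\"drop\">\1<\/span><span class=\"first\">\2<\/span>"/ |
  sed -e s/"<span class=\"drop\">\(.*\)\([AL]\)<\/span><span class=\"first\">"/"<span class=\"drop\">\1\2<\/span><span class=\"after\2\">"/
}

echo '%y "Hello, world' | zx_dropcap
# <p class="initial"><span class="drop">"H</span><span class="first">ello,</span> world
echo '%y All was quiet.' | zx_dropcap
# <p class="initial"><span class="drop">A</span><span class="afterA">ll</span> was quiet.
```

Lines without a leading %y pass through untouched, which is what lets the rules sit in the middle of the full pipeline.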
I suggest the following tidy command:
Code:
tidy  -asxhtml -utf8
Personally, I also like to use the -e option, which makes tidy report only errors and warnings, and then correct the errors by hand. I don't trust tidy not to be overly enthusiastic in its tidiness.
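The check-first, tidy-afterwards habit could be wrapped in a small helper. This is only a sketch: `zx_tidycheck` and its two file arguments are hypothetical, not part of this series, and it relies on tidy's exit status (0 = no problems, 1 = warnings only, 2 = errors).

```shell
#!/usr/bin/env bash
# zx_tidycheck IN OUT  (hypothetical helper, not from the original post)
# Pass 1 only reports problems; pass 2 writes the tidied file, and is
# skipped when pass 1 found errors that should be fixed by hand first.
zx_tidycheck() {
  local infile="$1" outfile="$2" status
  # -e: show only errors and warnings, no tidied document
  tidy -e -q -asxhtml -utf8 "$infile"
  status=$?
  if [ "$status" -ge 2 ]; then
    echo "zx_tidycheck: errors in $infile, fix them by hand first" >&2
    return 1
  fi
  # Warnings alone (exit status 1) are accepted here.
  tidy -q -asxhtml -utf8 "$infile" > "$outfile" || [ $? -eq 1 ]
}
```

For example, `zx_txt2xhtml book.txt > book.raw.xhtml && zx_tidycheck book.raw.xhtml book.xhtml`, with hypothetical file names. Treating warnings as acceptable is a judgment call; drop the `|| [ $? -eq 1 ]` to stop on warnings too.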
The %p, %P, %i and %w tags, which are not converted to html tags, are enclosed in comments instead of being deleted. No need to throw information away unless you have to.
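Because those tags are only commented out, they can be pulled back out of the xhtml later. A minimal sketch, assuming GNU sed and that each comment is still on a line of its own, as the function leaves it: `zx_commenttags` repeats the function's comment rule (written with single quotes), and `zx_tagsfromcomments` is a hypothetical reverse step, not part of the original. The `%p 17` sample line is made up.

```shell
#!/usr/bin/env bash
# Same rule as in zx_txt2xhtml: wrap whole %p/%P/%i/%w lines in a comment.
zx_commenttags() {
  sed -e '/^%[pPiw].*/s/.*/<!-- & -->/'
}

# Hypothetical reverse step: print only the tags found in such comments,
# allowing for leading indentation added later.
zx_tagsfromcomments() {
  sed -n 's/^[[:blank:]]*<!-- \(%[pPiw].*\) -->$/\1/p'
}

printf '%s\n' '%p 17' 'Some text.' | zx_commenttags
# <!-- %p 17 -->
# Some text.
printf '%s\n' '%p 17' 'Some text.' | zx_commenttags | zx_tagsfromcomments
# %p 17
```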

At this point we should have a nice, well-formatted xhtml file, all ready to be fed into Sigil or Calibre. Or about a dozen other epub creation tools.
Or we can bloody-mindedly finish as we started, and just make a few more bash functions to arrive at a complete epub file...