MobileRead Forums - View Single Post

SBT · 12-13-2011, 01:02 PM

Time to convert our tagged file to xhtml:

Code:

function zx_txt2xhtml {
# Usage zx_txt2html [textfile]
# Converts a text-file with %-type tags to an xhtml file.
# The file should be run through html tidy afterwards.
# If no input file is given; input is read from STDIN.
# Output is to STDOUT
#-e '/^%q/,/^%[^q]/{s/^%q[ \t]\+/<div class="intro">/;s/%[^q]/<\/div>\n&/}' |\
[ $1 ] && inputfile=$1 || inputfile="/dev/stdin"
cat $inputfile  |\
sed -e s/"^%c \(.*\)"/"<\/p>\n<hr class=\"endchapter\"\/>\n\n<h2 class=\"chapter\">\1<\/h2>"/ |\
sed -e s/"^%y[  ]\+\([^A-Z0-9]*[A-Z0-9]\)\([^ ]*\)"/"<p class=\"initial\"><span class=\"drop\">\1<\/span><span class=\"first\">\2<\/span>"/ \
-e s/"^\( \{6,8\}\|\t\)"/"<\/p>\n<p>"/ \
-e s/"#-"/"-"/g \
-e /"^%[pPiw].*"/s/".*"/"<!-- & -->"/ |\
sed /"^$"/d |\
sed -e s/"<span class=\"drop\">\(.*\)\([AL]\)<\/span><span class=\"first\">"/"<span class=\"drop\">\1\2<\/span><span class=\"after\2\">"/ \
-e 1i'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">\
<html xmlns="http://www.w3.org/1999/xhtml">\
<head> \
<meta http-equiv="Content-Type" content="text/html; charset=utf8" /> \
<title></title> \n\
<link href="main.css" rel="stylesheet" type="text/css" /> </head> \
<body>' \
-e \$a"</body>\n</html>"
}

I suggest the following tidy command:

Code:

tidy  -asxhtml -utf8

Personally I like to also use the -e option, and correct errors by hand. I don't trust tidy to not be overly enthusiastic in its tidiness.
All the %-tags which are not converted to html tags are enclosed in comments. No need to remove information unless you have to.

At this point we should have a nice, well-formatted xhtml file, all ready to be fed into Sigil or Calibre. Or about a dozen other epub creation tools.
Or we can bloody-mindedly finish as we started, and just make a few more bash functions to arrive at a complete epub file...