Nice work! Looks very promising.
You seem to be doing some Gutenberg-specific detection already. A simple clean-up for the HTML versions is stripping the page numbers; I do that in gutlrf.pl like so:
$_ =~ s#<span class='pagenum'>.*</span>## ;
$_ =~ s#<span class=\"pagenum\">.*</span>## ;
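One caveat with the two substitutions above: `.*` is greedy, so if a line holds two page-number spans (or any later `</span>`), everything between the first opening tag and the last closing tag is removed. A possible tighter variant is sketched below. The `strip_pagenums` helper name and the sample input are made up for illustration and are not from gutlrf.pl, and the sketch assumes page-number spans do not contain nested `<span>` elements.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration; not taken from gutlrf.pl.
sub strip_pagenums {
    my ($html) = @_;
    # (['"]) captures the opening quote and \1 requires the same closing quote,
    # so one pattern covers both quoting styles.
    # .*? is non-greedy, so each span is removed on its own.
    # /g removes every occurrence; /s lets . match across newlines.
    $html =~ s#<span\s+class=(['"])pagenum\1[^>]*>.*?</span>##gs;
    return $html;
}

# Made-up sample line with two page-number spans.
my $line = q{<p>One<span class='pagenum'>[Pg 12]</span> two<span class="pagenum">[Pg 13]</span> three</p>};
print strip_pagenums($line), "\n";   # prints: <p>One two three</p>
```

With the greedy patterns, the same sample line would lose the word "two" along with the spans.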
I'll post more bug reports on the Google Code site.