Quote:
Originally Posted by ashkulz
This can easily be done as a plugin (hint to nrapallo)
Quote:
Originally Posted by nrapallo
Oh, nice, now I get volunteered by the Master... How can I say no...
I have a lot of experience "cleaning up" HTML to prepare a proper .imp ebook, but unfortunately most of it comes from writing and using Mobi2IMP in Perl.
To aid the Python plug-in cause, I offer some simple text substitutions that could help ensure an HTML webpage/file converts well to .imp (and, hopefully, is better handled by our reader's internal browser).
Below is a Perl script fragment from Mobi2IMP.pl v9.4b, where $html holds the HTML (text) of the webpage, each $opt_NAME is an option you may wish to pass through, and $booktitle and $author (placed in the page header) could even hold favorite hyperlinks in headers or footers, if that works!
Just to get you started on those useful plug-ins...
Code:
###################################################################
my $headerhr = "\n<HEADER><table border=\"0\" cellpadding=\"0\" cellspacing=\"0\" width=\"100%\"><tr>\n";
$headerhr .= "<td align=\"left\" style=\"font-family:smallfont\"><small>" . $booktitle . "</small></td>\n";
$headerhr .= "<td align=\"right\" style=\"font-family:smallfont\"><small>" . $author . "</small></td></tr></table><hr></HEADER>\n";
my $headercolor = "\n<HEADER><table border=\"0\" cellpadding=\"0\" cellspacing=\"0\" width=\"100%\"><tr>\n";
$headercolor .= "<td align=\"left\" style=\"font-family:smallfont\" bgcolor=\"" . $opt_header_color . "\"><small>" . $booktitle . "</small></td>\n";
$headercolor .= "<td align=\"right\" style=\"font-family:smallfont\" bgcolor=\"" . $opt_header_color . "\"><small>" . $author . "</small></td></tr></table></HEADER>\n";
if (defined $opt_bgcolor) {
    $html =~ s/<body([^>])*>/\n<BODY bgcolor=$opt_bgcolor>\n/i; #remove .mobi defaults in <body> and insert bgcolor
} else {
    $html =~ s/<body([^>])*>/\n<BODY>\n/i; #remove .mobi defaults in <body>
}
# ([^>]*) captures ALL remaining <body> attributes (e.g. bgcolor);
# ([^>])* would capture only the last character and mangle the tag.
if (defined $opt_header_hr) {
    $html =~ s/<body([^>]*)>/<BODY$1>\n$headerhr/i; #keep <body> attributes and insert header-hr
} elsif (defined $opt_header_color) {
    $html =~ s/<body([^>]*)>/<BODY$1>\n$headercolor/i; #keep <body> attributes and insert header-color
}
if (defined $opt_nopara and not defined $opt_noBRfix) {
    $html =~ s/<br([^>])*><div/<BR \/><BR \/><div/gi; #force <br /> to work in Ebook Publisher
}
if (defined $opt_indent) {
    #indent (~2 characters)
    if (defined $opt_nopara) {
        $html =~ s/<\/head>/<STYLE type="text\/css">p {text-indent:1em; margin-top:0em; margin-bottom:0em} header {display:none; display:oeb-page-head}<\/STYLE><\/head>/i; #nopara separation (--nopara)
    } else {
        $html =~ s/<\/head>/<STYLE type="text\/css">p {text-indent:1em; padding-top:0em; padding-bottom:1em} header {display:none; display:oeb-page-head}<\/STYLE><\/head>/i; #para separation (default)
    }
} else {
    #noindent (default)
    if (defined $opt_nopara) {
        $html =~ s/<\/head>/<STYLE type="text\/css">p {text-indent:0em; margin-top:0em; margin-bottom:0em} header {display:none; display:oeb-page-head}<\/STYLE><\/head>/i; #nopara separation (--nopara)
    } else {
        $html =~ s/<\/head>/<STYLE type="text\/css">p {text-indent:0em; padding-top:0em; padding-bottom:1em} header {display:none; display:oeb-page-head}<\/STYLE><\/head>/i; #para separation (default)
    }
}
my $LRmargins = "2%";
if (defined $opt_LRmargins) { $LRmargins = $opt_LRmargins; }
if (defined $opt_nomargins) { $LRmargins = "0%"; }
if (defined $opt_nojustify) {
    #nojustify body text (left-align)
    if (defined $opt_smallerfont) {
        $html =~ s/<body/<BODY style="margin-left:$LRmargins; margin-right:$LRmargins; font-size:x-small; text-align:left"/i; # add small margins and left-align text
    } elsif (defined $opt_largerfont) {
        $html =~ s/<body/<BODY style="margin-left:$LRmargins; margin-right:$LRmargins; font-size:medium; text-align:left"/i; # add small margins and left-align text
    } else {
        $html =~ s/<body/<BODY style="margin-left:$LRmargins; margin-right:$LRmargins; text-align:left"/i; # add small margins and left-align text
    }
} else {
    #justify body text (default)
    if (defined $opt_smallerfont) {
        $html =~ s/<body/<BODY style="margin-left:$LRmargins; margin-right:$LRmargins; font-size:x-small; text-align:justify"/i; # add small margins and justified text
    } elsif (defined $opt_largerfont) {
        $html =~ s/<body/<BODY style="margin-left:$LRmargins; margin-right:$LRmargins; font-size:large; text-align:justify"/i; # add small margins and justified text
    } else {
        $html =~ s/<body/<BODY style="margin-left:$LRmargins; margin-right:$LRmargins; text-align:justify"/i; # add small margins and justified text
    }
}
$html =~ s/\x00//g; # remove odd insertion of null (NUL) chars; an empty s/// pattern would instead reuse the last successful regex
$html =~ s/<mbp:pagebreak/<p style="page-break-before:always"/gi; # insert proper page-breaks
$html =~ s/<mbpagebreak/<p style="page-break-before:always"/gi; # insert proper page-breaks
$html =~ s/<img align="baseline"/<img/gi; # remove the troublesome baseline keyword
$html =~ s/(<BR \/><BR \/>)+<div align="center"><img/<BR \/><div align="center"><p align="center"><img/gi; # only allow one <br> before an image to avoid after a page-break
$html =~ s/<div align="center"><img/<div align="center"><p align="center"><img/gi; # kludge to get eBook Publisher to center images
#fix up blank lines (unwanted) before page-break
$html =~ s/((<div([^>])*>( )*<\/div>)*(\s)*(<br([^>])*>)*(\s)*)*<p style="page-break-before/\n<p style="page-break-before/gi;
$html =~ s/((<br([^>])*>)*(\s)*)*<p style="page-break-before/\n<p style="page-break-before/gi;
$html =~ s/(<p style="page-break-before:always">)*<\/body>/<\/body>/gi; #fix up last (unwanted) page-break
$html =~ s/((<br([^>])*>)*(\s)*)*<\/body>/\n<\/body>/gi; #fix up blank lines (unwanted) at end
$html =~ s/<p/\n<p/gi; # insert newline before '<p' construct
###################################################################
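As a minimal sketch of how such substitutions might be packaged for reuse (e.g. before porting them to a Python plug-in), here is a self-contained Perl subroutine that applies a few of the .mobi-specific fixes from the fragment above: NUL stripping, page-break conversion, dropping `align="baseline"`, and the image-centering kludge. The name `clean_mobi_html` is hypothetical (not part of Mobi2IMP), only a subset of the substitutions is included, and it assumes the "null chars" are literal NUL (`\x00`) bytes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of Mobi2IMP): applies a subset of the
# Mobi2IMP substitutions to an HTML string and returns the cleaned copy.
sub clean_mobi_html {
    my ($html) = @_;
    $html =~ s/\x00//g;                                                # assumed: strip stray NUL bytes
    $html =~ s/<mbp:pagebreak/<p style="page-break-before:always"/gi;  # Mobi page-break -> CSS page-break
    $html =~ s/<mbpagebreak/<p style="page-break-before:always"/gi;
    $html =~ s/<img align="baseline"/<img/gi;                          # drop troublesome baseline keyword
    # must run after the baseline fix, so the <img tag follows <div> directly
    $html =~ s/<div align="center"><img/<div align="center"><p align="center"><img/gi;
    return $html;
}

my $in  = qq{<div align="center"><img align="baseline" src="a.jpg"></div><mbp:pagebreak />};
my $out = clean_mobi_html($in);
print "$out\n";
# prints: <div align="center"><p align="center"><img src="a.jpg"></div><p style="page-break-before:always" />
```

The order of the substitutions matters, just as in the original fragment: the baseline keyword is removed before the centering kludge, otherwise `<div align="center"><img` would not match.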