09-02-2008, 11:09 AM   #47
nrapallo
GuteBook/Mobi2IMP Creator
Posts: 2,958
Karma: 2530691
Join Date: Dec 2007
Location: Toronto, Canada
Device: REB1200 EBW1150 Device: T1 NSTG iLiad_v2 NC Device: Asus_TF Next1 WPDN
Quote:
Originally Posted by ashkulz
This can easily be done as a plugin (hint to nrapallo)
Quote:
Originally Posted by nrapallo
Oh, nice, now I get volunteered by the Master... How can I say no...
I have a lot of experience "cleaning up" HTML so that it converts properly to a .imp ebook, but unfortunately most of it comes from writing and using Mobi2IMP in Perl.

To aid the Python plug-in cause, I offer some simple text substitutions that could help ensure an HTML web page/file converts well to .imp (and, hopefully, is handled better by our reader's internal browser).

Below please find a Perl script fragment from Mobi2IMP.pl v9.4b,
where $html is the HTML (text) web page,
$opt_NAME is an option you may wish to pass through,
and $booktitle and $author could hold favorite hyperlinks in headers or footers, if this works!
Just to get you started on those useful plug-ins...

Code:
###################################################################

my $headerhr = "\n<HEADER><table border=\"0\" cellpadding=\"0\" cellspacing=\"0\" width=\"100%\"><tr>\n";
$headerhr .= "<td align=\"left\" style=\"font-family:smallfont\"><small>" . $booktitle . "</small></td>\n";
$headerhr .= "<td align=\"right\" style=\"font-family:smallfont\"><small>" . $author . "</small></td></tr></table><hr></HEADER>\n";

my $headercolor = "\n<HEADER><table border=\"0\" cellpadding=\"0\" cellspacing=\"0\" width=\"100%\"><tr>\n";
$headercolor .= "<td align=\"left\" style=\"font-family:smallfont\" bgcolor=\"" . $opt_header_color . "\"><small>" . $booktitle . "</small></td>\n"; 
$headercolor .= "<td align=\"right\" style=\"font-family:smallfont\" bgcolor=\"" . $opt_header_color . "\"><small>" . $author . "</small></td></tr></table></HEADER>\n";

if (defined $opt_bgcolor) {
    $html =~ s/<body([^>])*>/\n<BODY bgcolor=$opt_bgcolor>\n/i;  #remove .mobi defaults in <body> and insert bgcolor
} else {
    $html =~ s/<body([^>])*>/\n<BODY>\n/i;                       #remove .mobi defaults in <body>
}

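# Note: the capture group must wrap the whole attribute run, ([^>]*), so that $1
# holds every <body> attribute; ([^>])* would keep only the last character.
if (defined $opt_header_hr) {
        $html =~ s/<body([^>]*)>/<BODY$1>\n$headerhr/i;          #keep the <body> attributes set above and insert header-hr
} elsif (defined $opt_header_color) {
        $html =~ s/<body([^>]*)>/<BODY$1>\n$headercolor/i;       #keep the <body> attributes set above and insert header-color
}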

if (defined $opt_nopara and not defined $opt_noBRfix) {
    $html =~ s/<br([^>])*><div/<BR \/><BR \/><div/gi;             #force <br /> to work in Ebook Publisher
}

# Paragraph style injected just before </head>:
#   --indent  : first-line indent of 1em (~2 characters); default is no indent
#   --nopara  : no separation between paragraphs; default is a 1em gap below each one
my $indent  = defined $opt_indent ? "1em" : "0em";
my $spacing = defined $opt_nopara
    ? "margin-top:0em; margin-bottom:0em"       #nopara separation (--nopara)
    : "padding-top:0em; padding-bottom:1em";    #para separation (default)
$html =~ s/<\/head>/<STYLE type="text\/css">p {text-indent:$indent; $spacing} header {display:none; display:oeb-page-head}<\/STYLE><\/head>/i;

my $LRmargins = "2%";
if (defined $opt_LRmargins) { $LRmargins = $opt_LRmargins; }
if (defined $opt_nomargins) { $LRmargins = "0%"; }
if (defined $opt_nojustify) {
    #nojustify body text (left-align)
    if (defined $opt_smallerfont) {
        $html =~ s/<body/<BODY style="margin-left:$LRmargins; margin-right:$LRmargins; font-size:x-small; text-align:left"/i;    # add small margins and left-align text
    } elsif (defined $opt_largerfont) {
        $html =~ s/<body/<BODY style="margin-left:$LRmargins; margin-right:$LRmargins; font-size:medium; text-align:left"/i;     # add small margins and left-align text
    } else {
        $html =~ s/<body/<BODY style="margin-left:$LRmargins; margin-right:$LRmargins; text-align:left"/i;                       # add small margins and left-align text
    }
} else {
    #justify body text (default)
    if (defined $opt_smallerfont) {
        $html =~ s/<body/<BODY style="margin-left:$LRmargins; margin-right:$LRmargins; font-size:x-small; text-align:justify"/i; # add small margins and justified text
    } elsif (defined $opt_largerfont) {
        $html =~ s/<body/<BODY style="margin-left:$LRmargins; margin-right:$LRmargins; font-size:large; text-align:justify"/i;   # add small margins and justified text
    } else {
        $html =~ s/<body/<BODY style="margin-left:$LRmargins; margin-right:$LRmargins; text-align:justify"/i;                    # add small margins and justified text
    }
}

    $html =~ s/\x00//g;                                               # remove odd insertion of null chars
    $html =~ s/<mbp:pagebreak/<p style="page-break-before:always"/gi; # insert proper page-breaks
    $html =~ s/<mbpagebreak/<p style="page-break-before:always"/gi;   # insert proper page-breaks
    $html =~ s/<img align="baseline"/<img/gi;                         # remove the troublesome baseline keyword
    $html =~ s/(<BR \/><BR \/>)+<div align="center"><img/<BR \/><div align="center"><p align="center"><img/gi;  # allow only one <br> before an image, to avoid a blank line after a page-break
    $html =~ s/<div align="center"><img/<div align="center"><p align="center"><img/gi;                          # kludge to get eBook Publisher to center images   

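    #fix up blank lines (unwanted) before page-break:
    #drop any run of empty <div>s, <br>s and whitespace that directly precedes a page-break
    #(a flat alternation matches the same text as the nested-quantifier form, with far less backtracking)
    $html =~ s/(?:<div[^>]*>(?:&nbsp;)*<\/div>|<br[^>]*>|\s)*<p style="page-break-before/\n<p style="page-break-before/gi;
    $html =~ s/(?:<br[^>]*>|\s)*<p style="page-break-before/\n<p style="page-break-before/gi;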
                
    $html =~ s/(<p style="page-break-before:always">)*<\/body>/<\/body>/gi;  #fix up last (unwanted) page-break
    $html =~ s/((<br([^>])*>)*(\s)*)*<\/body>/\n<\/body>/gi;                 #fix up blank lines (unwanted) at end

    $html =~ s/<p/\n<p/gi;                                                   # insert newline before '<p' construct

###################################################################
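
For anyone starting on the Python plug-in, here is a rough sketch of how the <body> and paragraph-style substitutions above might look with Python's re module. It is not taken from Mobi2IMP: the function names prepare_body and add_paragraph_style and their parameters are made up for illustration, and unlike the Perl it quotes the bgcolor value.

```python
import re


def prepare_body(html, bgcolor=None, header=None):
    """Reset the first <body> tag, optionally set bgcolor and insert a header after it.

    Hypothetical helper: the name and parameters are assumptions, not Mobi2IMP's.
    """
    attrs = f' bgcolor="{bgcolor}"' if bgcolor else ""
    body = f"\n<BODY{attrs}>\n"
    if header:
        body += header
    # A function replacement keeps backslashes in `header` from being read as escapes.
    return re.sub(r"<body[^>]*>", lambda m: body, html, count=1, flags=re.IGNORECASE)


def add_paragraph_style(html, indent=False, nopara=False):
    """Inject the paragraph/header stylesheet just before </head>."""
    text_indent = "1em" if indent else "0em"
    spacing = ("margin-top:0em; margin-bottom:0em" if nopara
               else "padding-top:0em; padding-bottom:1em")
    style = (f'<STYLE type="text/css">p {{text-indent:{text_indent}; {spacing}}} '
             'header {display:none; display:oeb-page-head}</STYLE></head>')
    return re.sub(r"</head>", lambda m: style, html, count=1, flags=re.IGNORECASE)
```

Passing count=1 mirrors the Perl substitutions, which have no /g flag and so only touch the first match.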

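The page-break clean-up near the end of the Perl fragment could be ported along these lines. Again, this is only a sketch under my own assumptions: fix_page_breaks and PAGE_BREAK are hypothetical names, and it covers just the null-character removal, the <mbp:pagebreak conversion and the blank-line trimming, not the image-centering kludges.

```python
import re

# Replacement text for Mobipocket page-break tags (same string the Perl inserts).
PAGE_BREAK = '<p style="page-break-before:always"'


def fix_page_breaks(html):
    """Convert <mbp:pagebreak tags and trim blank lines around page breaks.

    Hypothetical helper sketching part of the Perl clean-up, not a full port.
    """
    html = html.replace("\x00", "")  # remove stray null characters
    html = re.sub(r"<mbp:pagebreak", PAGE_BREAK, html, flags=re.IGNORECASE)
    # Drop empty <div>s, <br>s and whitespace that directly precede a page break.
    junk = r"(?:<div[^>]*>(?:&nbsp;)*</div>|<br[^>]*>|\s)*"
    html = re.sub(junk + re.escape(PAGE_BREAK), lambda m: "\n" + PAGE_BREAK,
                  html, flags=re.IGNORECASE)
    # Drop trailing <br>s and whitespace before </body>.
    html = re.sub(r"(?:<br[^>]*>|\s)*</body>", "\n</body>", html, flags=re.IGNORECASE)
    return html
```

A plug-in would presumably run this after the <body> and <head> substitutions, in the same order as the Perl fragment.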
Last edited by nrapallo; 09-02-2008 at 11:45 AM.