Quote:
Originally Posted by nrapallo
When I retrieve a Project Gutenberg ebook in HTML form, I usually leave the page number (href) references in but remove the actual page numbers with a regex, as in the Perl example below:
Code:
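#Remove page numbering: keep the opening tag inside each pagenum span, drop the visible number
#[^>]+ and .*? keep each match inside one span, so several spans on one line are handled separately
$html =~ s#<span class=['"]pagenum['"]><([^>]+)>.*?</span>#<$1>#gi;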
It just leaves the <a name/id> reference, i.e. <$1>.
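A quick way to see the effect is to run the substitution on one line of sample markup. This is only a sketch: the sample string is made up to resemble typical Project Gutenberg page markers, and the pattern is a non-greedy variant so that two markers on the same line are treated separately.

```perl
use strict;
use warnings;

# Hypothetical sample line with two page markers (not taken from a real ebook).
my $html = q{<p>One<span class='pagenum'><a name="Page_1" id="Page_1">[Pg 1]</a></span> two}
         . q{<span class="pagenum"><a name="Page_2" id="Page_2">[Pg 2]</a></span> three</p>};

# Keep the opening tag found inside each pagenum span, drop the rest of the span.
$html =~ s#<span class=['"]pagenum['"]><([^>]+)>.*?</span>#<$1>#gi;

print "$html\n";
```

Note that the closing </a> is removed together with the visible number, so the anchor tag that remains is left open. That may or may not matter, depending on how strict the reader software is.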
Yeah, but I have hundreds of such references and no HTML links. For now I want to keep the real page numbers. The display:none trick worked well in my sample, though I still have to check it in a few different readers. It is reasonable to keep the published page numbers for reference purposes.
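For anyone who wants to apply the display:none approach from a script, here is a minimal Perl sketch. It assumes the trick means a CSS rule that hides every element with class pagenum, and that the file has a </head> tag to put the rule in front of. Both are assumptions, and the sample markup is made up.

```perl
use strict;
use warnings;

# Hypothetical sample document (not from a real ebook).
my $html = <<'HTML';
<html><head><title>Sample</title></head>
<body><p>Text<span class="pagenum">[Pg 12]</span> more text</p></body></html>
HTML

# Assumed CSS rule: hide the page-number spans without deleting them.
my $css = qq{<style type="text/css">.pagenum { display: none; }</style>\n};

# Insert the rule just before the first </head>; warn if the tag is not there.
$html =~ s#(</head>)#$css$1#i
    or warn "no </head> tag found, rule not inserted\n";

print $html;
```

The page numbers stay in the HTML for reference. Whether they are actually hidden depends on the reader honoring the rule, which is why it needs checking on each device.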
Dale