07-28-2012, 08:08 AM   #2
theducks
Quote:
Originally Posted by worley
Hello guys,

Forgive my ignorance, but I need someone's help please. I don't know anything about HTML. I'm trying to remove the headers from a PDF ebook. I used the following expression to remove the header from page 6, which is the page where the headers begin.

<br> <hr> <A name=6></a>6 <b>•</b> J o ã o G u i m a r ã e s R o s a<br>

The expression above obviously found only 1 match in the file: the header in the middle of the following passage (marked in red in the original post):

Utilizamos ainda outras edi-<br>ções tanto para corrigir variações indevidas como para insistir<br>em outras. Essas grafias em desuso podem parecer simplesmen-<br>te uma questão de atualização ortográfica, mas, se essa atualiza-<br>ção já era exigida pela norma quando da publicação dos livros e<br> <hr> <A name=6></a>6 <b>•</b> J o ã o G u i m a r ã e s R o s a<br>de suas várias edições durante a vida do autor, partimos do prin-<br>cípio de que elas são provavelmente intencionais e devem, por-<br>tanto, ser mantidas. Para justificar essa decisão, lembramos aos<br>leitores que as antigas edições da obra de Guimarães Rosa apre-<br>sentavam uma nota alertando justamente para a grafia persona-<br>líssima do autor e que algumas histórias registram a sua teimosia<br>em acentuar determinadas palavras.

How on earth do I make it match every page? There are 608 pages. I'm sure it should be easy, but I become dyslexic when dealing with HTML. Again, I would appreciate someone's help! Thanks!

The '6' is probably the page number, and only one page is numbered 6, so the literal expression can match only once.

The regex pattern for one or more consecutive digits is \d+
Code:
<br> <hr> <A name=\d+></a>\d+ <b>•</b> J o ã o G u i m a r ã e s R o s a<br>
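For anyone who wants to try the pattern outside the editor's search box, here is a minimal Python sketch. The two-page sample string is made up to look like the excerpt quoted above, and replacing each header with a single <br> (rather than with nothing) is my own assumption, so that the words on either side of the header don't run together.

```python
import re

# Running-header text exactly as it appears in the converted file.
NAME = "J o ã o G u i m a r ã e s R o s a"

# Same expression as in the post: \d+ replaces the literal 6 in both places.
HEADER = re.compile(r"<br> <hr> <A name=\d+></a>\d+ <b>•</b> " + NAME + r"<br>")

# Hypothetical sample with two page headers (pages 6 and 8).
sample = (
    "livros e<br> <hr> <A name=6></a>6 <b>•</b> " + NAME + "<br>"
    "de suas várias<br> <hr> <A name=8></a>8 <b>•</b> " + NAME + "<br>"
    "edições"
)

# subn returns the new text and the number of replacements made.
# Each header is swapped for one <br> so the line break survives (an assumption).
cleaned, count = HEADER.subn("<br>", sample)

print(count)
print(cleaned)
```

The count from subn is a quick sanity check: on the real file it should come out close to the number of pages that carry this header.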
What looks odd to me is this part: <A name=6>. The value after the = should normally be in quotes, and to be valid in an EPUB it would also have to start with a letter (it can't be all digits).