MobileRead Forums

-   -   Regex help to remove HTML footer (https://www.mobileread.com/forums/showthread.php?t=97976)

neonbible 09-09-2010 06:38 AM

Regex help to remove HTML footer
 
This is the HTML code:

Code:

<br clear="all"/><hr/><div class="center"><small><a href="slide19.html">previous</a> |
<a href="toc.html">Table of Contents</a> |
<a href="slide21.html">next</a></small></div>
</body>
</html>

Now for each page the anchor tags are going to change to point to different links. What expression do I need to use to match it for each page?

neonbible 09-09-2010 06:51 AM

Ok I managed to find the answer.

Used .+\

When I test the expression, it highlights the sections correctly. However, after the conversion they are still there, even though I ticked "Remove footer".

ldolse 09-09-2010 09:29 AM

Not sure what you're trying to remove from that html code. Are you saying every page has 'previous', 'table of contents', and 'next' links?

.+\ should only be .+, but try [^>]* instead: .+ is greedy and will run as far along the line as it can, while [^>]* cannot go past the ">" that closes the current tag. You also need to account for variable spacing across line breaks and between tags; \s* helps with that. If some of the parts don't occur every time, surround them with parentheses, e.g. "(<br[^>]*>)", and add a question mark to make the group optional: "(<br[^>]*>)?"
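
To see the difference between .+ and [^>]*, here is a minimal sketch in plain Python, assuming calibre's search expressions behave like Python's re module. The sample string and variable names are made up for illustration.

```python
import re

# Illustrative sample: two anchor tags on one line, as in the footer.
line = '<a href="slide19.html">previous</a> | <a href="toc.html">Table of Contents</a>'

# .+ is greedy: it swallows everything up to the LAST ">" on the line.
greedy = re.search(r'<a\shref.+>', line).group(0)

# [^>]* cannot cross a ">", so it stops at the end of the first tag.
bounded = re.search(r'<a\shref[^>]*>', line).group(0)

# A parenthesised group followed by "?" is optional, so the same
# pattern matches with or without the leading <br> tag.
optional_br = re.compile(r'(<br[^>]*>)?\s*<hr/>')
with_br = optional_br.fullmatch('<br clear="all"/><hr/>')
without_br = optional_br.fullmatch('<hr/>')

print(greedy)
print(bounded)
```

The first print shows the whole line and the second only the opening tag, which is why the bounded form is safer when several tags share a line.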

Try something like this:
Code:

<br[^>]*>\s*<hr/>\s*<div[^>]*>\s*<small>\s*<a\shref[^>]*>\s*previous\s*</a>\s*\|\s*<a\shref[^>]*>\s*Table\sof\sContents\s*</a>\s*\|\s*<a\shref[^>]*>\s*next\s*</a>\s*</small>\s*</div>
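
If you want to check a pattern like this outside calibre, a rough sketch in plain Python follows. It assumes calibre treats the expression as a Python regular expression; the html string is the footer posted at the top of the thread, and the names footer_re and cleaned are only for illustration.

```python
import re

# The footer from the first post, including the line breaks.
html = '''<br clear="all"/><hr/><div class="center"><small><a href="slide19.html">previous</a> |
<a href="toc.html">Table of Contents</a> |
<a href="slide21.html">next</a></small></div>
</body>
</html>
'''

# The suggested pattern, split over several lines for readability only;
# adjacent raw strings are joined into one expression.
footer_re = re.compile(
    r'<br[^>]*>\s*<hr/>\s*<div[^>]*>\s*<small>\s*'
    r'<a\shref[^>]*>\s*previous\s*</a>\s*\|\s*'
    r'<a\shref[^>]*>\s*Table\sof\sContents\s*</a>\s*\|\s*'
    r'<a\shref[^>]*>\s*next\s*</a>\s*</small>\s*</div>'
)

# Replace the matched footer with nothing and show what is left.
cleaned = footer_re.sub('', html)
print(cleaned)
```

If the navigation links are gone and only the closing body and html tags remain, the expression matches the footer as posted.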

jackie_w 09-09-2010 09:38 AM

As your source is HTML, if all else fails, you could always try editing the HTML in a text editor before importing to Calibre.

For example, Notepad++ is a very good free text editor; it supports regex and lets you find/replace across multiple open files in one go.
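
The same pre-cleaning can also be scripted instead of done in an editor. This is only a sketch under assumptions: the function name strip_footers, the "*.html" file pattern, and the UTF-8 encoding are all hypothetical choices, and the pattern is the one suggested earlier in the thread. Work on a copy of the files, since it overwrites them in place.

```python
import re
from pathlib import Path

# Footer pattern from earlier in the thread.
FOOTER_RE = re.compile(
    r'<br[^>]*>\s*<hr/>\s*<div[^>]*>\s*<small>\s*'
    r'<a\shref[^>]*>\s*previous\s*</a>\s*\|\s*'
    r'<a\shref[^>]*>\s*Table\sof\sContents\s*</a>\s*\|\s*'
    r'<a\shref[^>]*>\s*next\s*</a>\s*</small>\s*</div>'
)


def strip_footers(folder, encoding="utf-8"):
    """Remove the footer from every .html file in folder.

    Returns the names of the files that were changed.
    The encoding is an assumption; adjust it to match the source files.
    """
    changed = []
    for path in sorted(Path(folder).glob("*.html")):
        text = path.read_text(encoding=encoding)
        new_text, count = FOOTER_RE.subn("", text)
        if count:  # only rewrite files where something matched
            path.write_text(new_text, encoding=encoding)
            changed.append(path.name)
    return changed
```

Calling strip_footers on the folder that holds the slide pages, before importing them into calibre, would return the list of files it touched.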

theducks 09-09-2010 10:42 AM

Quote:

Originally Posted by neonbible (Post 1101175)
This is the HTML code:

Code:

<br clear="all"/><hr/><div class="center"><small><a href="slide19.html">previous</a> |
<a href="toc.html">Table of Contents</a> |
<a href="slide21.html">next</a></small></div>
</body>
</html>

Now for each page the anchor tags are going to change to point to different links. What expression do I need to use to match it for each page?

You may need to replace the digits in the "slide##" strings with wildcards, as they are unique for each "next" and "previous" link.
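
As a small sketch of the wildcard idea in Python regex syntax (assuming that is what calibre uses), \d+ stands in for the slide number. The sample strings and names below are illustrative only. Note that the [^>]* in ldolse's pattern already covers the changing file names, so this matters mainly if you want to match the slide links specifically.

```python
import re

# \d+ matches one or more digits, so any slide number fits.
slide_link_re = re.compile(r'href="slide(\d+)\.html"')

# Illustrative footers from two different pages.
footer_page_20 = '<a href="slide19.html">previous</a> | <a href="toc.html">Table of Contents</a> | <a href="slide21.html">next</a>'
footer_page_7 = '<a href="slide6.html">previous</a> | <a href="toc.html">Table of Contents</a> | <a href="slide8.html">next</a>'

# findall returns the captured digits for each slide link;
# the toc.html link is skipped because it has no slide number.
numbers_20 = slide_link_re.findall(footer_page_20)
numbers_7 = slide_link_re.findall(footer_page_7)
print(numbers_20, numbers_7)
```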


All times are GMT -4.