09-09-2010, 05:38 AM   #1
neonbible
Regex help to remove HTML footer

This is the HTML code:

Code:
<br clear="all"/><hr/><div class="center"><small><a href="slide19.html">previous</a> |
<a href="toc.html">Table of Contents</a> |
<a href="slide21.html">next</a></small></div>
</body>
</html>
The anchor tags change on every page to point to different links. What expression do I need to use to match the footer on each page?
09-09-2010, 05:51 AM   #2
neonbible
Ok I managed to find the answer.

Used .+\

When I test the expression, it highlights the sections correctly. However, after the conversion they are still there, even though I ticked Remove Footer.
09-09-2010, 08:29 AM   #3
ldolse
Not sure what you're trying to remove from that html code. Are you saying every page has 'previous', 'table of contents', and 'next' links?

.+\ should only be .+, but try [^>]* instead: it cannot match past the closing > of a tag, so it stops where the tag ends, whereas .+ is greedy and grabs as much of the line as it can. You also need to account for variable spacing across line breaks and between tags; \s* helps with that. If some of the parts don't occur every time, surround them with parentheses, e.g. "(<br[^>]*>)", and add a question mark to make the group optional: "(<br[^>]*>)?"
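
To illustrate the difference, here is a minimal Python sketch (Calibre's regex flavour is Python's, but the variable names and the sample line are only assumptions made up for this example). It compares .+ with [^>]* on one line of the footer and shows the optional-group trick:

```python
import re

# One line of the footer, joined up for the demonstration.
html = '<a href="slide19.html">previous</a> | <a href="toc.html">Table of Contents</a>'

# ".+" is greedy: it runs on to the LAST ">" in the line, swallowing both links.
greedy = re.search(r'<a.+>', html).group(0)

# "[^>]*" cannot cross a ">", so the match ends with the first tag.
bounded = re.search(r'<a[^>]*>', html).group(0)

print(greedy)
print(bounded)

# A parenthesised group followed by "?" is optional: the pattern matches
# whether or not the <br> tag is present.
optional = re.compile(r'(<br[^>]*>)?\s*<hr/>')
with_br = optional.fullmatch('<br clear="all"/><hr/>')
without_br = optional.fullmatch('<hr/>')
```

The first search returns the whole line, the second only the opening tag of the first link.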

Try something like this:
Code:
<br[^>]*>\s*<hr/>\s*<div[^>]*>\s*<small>\s*<a\shref[^>]*>\s*previous\s*</a>\s*\|\s*<a\shref[^>]*>\s*Table\sof\sContents\s*</a>\s*\|\s*<a\shref[^>]*>\s*next\s*</a>\s*</small>\s*</div>
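
As a quick check outside Calibre, the expression above can be tried in plain Python. This is only a sketch: the sample page is the footer from the first post with a made-up paragraph in front of it, and the names FOOTER, page and cleaned are assumptions for the example.

```python
import re

# The expression from the post, split across lines only for readability.
FOOTER = re.compile(
    r'<br[^>]*>\s*<hr/>\s*<div[^>]*>\s*<small>\s*'
    r'<a\shref[^>]*>\s*previous\s*</a>\s*\|\s*'
    r'<a\shref[^>]*>\s*Table\sof\sContents\s*</a>\s*\|\s*'
    r'<a\shref[^>]*>\s*next\s*</a>\s*</small>\s*</div>'
)

# Hypothetical page: one paragraph of content followed by the footer.
page = '''<p>Slide 20 text</p>
<br clear="all"/><hr/><div class="center"><small><a href="slide19.html">previous</a> |
<a href="toc.html">Table of Contents</a> |
<a href="slide21.html">next</a></small></div>
</body>
</html>'''

# Replace the matched footer with nothing; the rest of the page is untouched.
cleaned = FOOTER.sub('', page)
print(cleaned)
```

Because every href is matched with [^>]*, the same expression should work on pages whose links point to other slide numbers.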
09-09-2010, 08:38 AM   #4
jackie_w
As your source is HTML, if all else fails, you could always try editing the HTML in a text editor before importing to Calibre.

For example, Notepad++ is a very good free text editor. It supports regex and lets you find/replace across multiple open files in one go.
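
If you would rather script the same clean-up than do it in an editor, a rough Python equivalent might look like the sketch below. It is not what Notepad++ does internally; the function name strip_footers and the simplified FOOTER pattern are assumptions for illustration, and the files are rewritten in place, so work on a copy.

```python
import re
from pathlib import Path

# Simplified, hypothetical footer pattern: <br>, <hr/>, then the first <div>...</div>.
# re.DOTALL lets ".*?" cross line breaks; "*?" stops at the first closing </div>.
FOOTER = re.compile(r'<br[^>]*>\s*<hr/>\s*<div[^>]*>.*?</div>', re.DOTALL)


def strip_footers(folder):
    """Remove the footer from every .html file in folder.

    Returns the names of the files that were changed.
    """
    changed = []
    for path in sorted(Path(folder).glob('*.html')):
        text = path.read_text(encoding='utf-8')
        cleaned, count = FOOTER.subn('', text)
        if count:
            # Overwrite the file only when something was actually removed.
            path.write_text(cleaned, encoding='utf-8')
            changed.append(path.name)
    return changed
```

Calling strip_footers on the folder that holds the slide files returns the list of files it modified, which makes it easy to see whether the pattern matched everywhere you expected.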
09-09-2010, 09:42 AM   #5
theducks
Quote:
Originally Posted by neonbible
This is the HTML code:

Code:
<br clear="all"/><hr/><div class="center"><small><a href="slide19.html">previous</a> |
<a href="toc.html">Table of Contents</a> |
<a href="slide21.html">next</a></small></div>
</body>
</html>
Now for each page the anchor tags are going to change to point to different links. What expression do I need to use to match it for each page?
You may need to replace the digits in the "slide##" strings with wildcards, as they are unique for each "next" and "previous" link.
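
A small sketch of that idea in Python, assuming the file names always follow the slide-plus-number pattern seen in the quoted footer (the name NAV_LINK is made up for the example): \d+ stands in for the digits, so one expression covers every page.

```python
import re

# "\d+" matches one or more digits, so any slide number is accepted.
# The dot in ".html" is escaped so it matches a literal dot only.
NAV_LINK = re.compile(r'<a href="slide\d+\.html">(previous|next)</a>')

samples = [
    '<a href="slide19.html">previous</a>',
    '<a href="slide21.html">next</a>',
    '<a href="slide7.html">next</a>',
    '<a href="toc.html">Table of Contents</a>',
]

# True for the three slide links, False for the table-of-contents link.
results = [NAV_LINK.fullmatch(s) is not None for s in samples]
print(results)
```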