Old 09-09-2010, 05:38 AM   #1
neonbible
Groupie
Posts: 195
Karma: 10802
Join Date: Sep 2010
Device: Kindle 3, iPhone 5, New iPad, Paperwhite
Regex help to remove HTML footer

This is the HTML code:

Code:
<br clear="all"/><hr/><div class="center"><small><a href="slide19.html">previous</a> |
<a href="toc.html">Table of Contents</a> |
<a href="slide21.html">next</a></small></div>
</body>
</html>
Now for each page the anchor tags are going to change to point to different links. What expression do I need to use to match it for each page?
Old 09-09-2010, 05:51 AM   #2
neonbible
Groupie
Posts: 195
Karma: 10802
Join Date: Sep 2010
Device: Kindle 3, iPhone 5, New iPad, Paperwhite
Ok I managed to find the answer.

Used .+\

When I test the expression, it highlights the right sections. However, after the conversion they are still there, even though I ticked Remove Footer.
Old 09-09-2010, 08:29 AM   #3
ldolse
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
Not sure what you're trying to remove from that html code. Are you saying every page has 'previous', 'table of contents', and 'next' links?

.+\ should just be .+, but try [^>]* instead: it cannot match past the closing > of a tag, whereas .+ is greedy and grabs as much as it can. You also need to account for variable whitespace across line breaks and between tags; \s* handles that. If some of the parts don't occur every time, surround them with parentheses, e.g. "(<br[^>]*>)", and add a question mark to make the group optional: "(<br[^>]*>)?"
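To see the difference between .+ and [^>]*, and what the optional group does, here is a small sketch in Python's re module (as far as I know, calibre's regex handling is based on Python's). The test string is the first line of the footer from post #1; the variable names are just for illustration.

```python
import re

line = '<br clear="all"/><hr/><div class="center">'

# .+ is greedy: it runs on to the last > on the line.
greedy = re.match(r'<br.+>', line).group(0)

# [^>]* cannot cross a >, so the match stops at the end of the <br> tag.
bounded = re.match(r'<br[^>]*>', line).group(0)

# The trailing ? makes the group optional, so input without a <br> still matches.
optional = re.match(r'(<br[^>]*>)?\s*<hr/>', '<hr/>')

print(greedy)                # the whole line
print(bounded)               # <br clear="all"/>
print(optional is not None)  # True
```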

Try something like this:
Code:
<br[^>]*>\s*<hr/>\s*<div[^>]*>\s*<small>\s*<a\shref[^>]*>\s*previous\s*</a>\s*\|\s*<a\shref[^>]*>\s*Table\sof\sContents\s*</a>\s*\|\s*<a\shref[^>]*>\s*next\s*</a>\s*</small>\s*</div>
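If you want to check the pattern outside calibre, a minimal sketch with Python's re module follows. It assumes the footer looks exactly like the HTML in post #1; FOOTER, sample, and strip_footer are names made up for this example.

```python
import re

# The pattern from the post above, split across lines for readability.
FOOTER = re.compile(
    r'<br[^>]*>\s*<hr/>\s*<div[^>]*>\s*<small>\s*'
    r'<a\shref[^>]*>\s*previous\s*</a>\s*\|\s*'
    r'<a\shref[^>]*>\s*Table\sof\sContents\s*</a>\s*\|\s*'
    r'<a\shref[^>]*>\s*next\s*</a>\s*</small>\s*</div>'
)

# Sample page: one line of body text followed by the footer from post #1.
sample = '''<p>Slide text</p>
<br clear="all"/><hr/><div class="center"><small><a href="slide19.html">previous</a> |
<a href="toc.html">Table of Contents</a> |
<a href="slide21.html">next</a></small></div>
</body>
</html>'''

def strip_footer(html):
    """Return html with the navigation footer removed."""
    return FOOTER.sub('', html)

print(strip_footer(sample))
```

Because every href is matched by [^>]*, the same pattern should work on each page regardless of the slide number.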
Old 09-09-2010, 08:38 AM   #4
jackie_w
Wizard
Posts: 2,726
Karma: 3973167
Join Date: Sep 2009
Location: UK
Device: Sony PRS-350, PB360, Kobo Glo/AuraHD/Aura6"
As your source is HTML, if all else fails, you could always try editing the HTML in a text editor before importing to Calibre.

For example, Notepad++ is a very good free text editor; it supports regex and lets you find/replace across multiple open files in one go.
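If you would rather script it than use an editor, the same clean-up can be sketched in Python. This is only a rough example under some assumptions: the files are UTF-8, the footer always starts with the <br .../><hr/> pair, and it ends at the first closing </div>. The function name and the folder name in the usage comment are hypothetical. Run it on a copy of your files.

```python
import re
from pathlib import Path

# Loose footer pattern: from the <br .../><hr/> pair through the first </div>.
# Assumption: no other <br/><hr/><div> sequence appears in the page body.
FOOTER = re.compile(r'<br[^>]*>\s*<hr/>\s*<div[^>]*>.*?</div>', re.DOTALL)

def strip_footers(folder):
    """Remove the footer from every .html file in folder; return how many files changed."""
    changed = 0
    for path in sorted(Path(folder).glob('*.html')):
        text = path.read_text(encoding='utf-8')  # assumed encoding
        cleaned = FOOTER.sub('', text)
        if cleaned != text:
            path.write_text(cleaned, encoding='utf-8')
            changed += 1
    return changed

# Usage (hypothetical folder name):
# strip_footers('slides_copy')
```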
Old 09-09-2010, 09:42 AM   #5
theducks
Grand Sorcerer
Posts: 14,479
Karma: 5567061
Join Date: Aug 2009
Location: The (original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
Quote:
Originally Posted by neonbible View Post
This is the HTML code:

Code:
<br clear="all"/><hr/><div class="center"><small><a href="slide19.html">previous</a> |
<a href="toc.html">Table of Contents</a> |
<a href="slide21.html">next</a></small></div>
</body>
</html>
Now for each page the anchor tags are going to change to point to different links. What expression do I need to use to match it for each page?
You may need to replace the digits in the "slide##" strings with wildcards, as they are unique for each "next" and "previous" link.
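As a rough illustration of the wildcard idea, \d+ can stand in for the slide number. This is a sketch in Python's re module; LINK and pages are names invented for the example, and the file names are taken from the quoted HTML.

```python
import re

# \d+ stands in for the slide number, so one pattern covers every page.
LINK = re.compile(r'<a\shref="slide\d+\.html">')

pages = [
    '<a href="slide19.html">previous</a>',
    '<a href="slide21.html">next</a>',
    '<a href="toc.html">Table of Contents</a>',
]

matches = [bool(LINK.search(p)) for p in pages]
print(matches)  # [True, True, False]
```

Note that a pattern using [^>]* for the whole tag already skips over the href, so explicit digit wildcards are only needed if you want to match the slide links specifically.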