09-09-2010, 05:38 AM | #1 |
Addict
Posts: 202
Karma: 10802
Join Date: Sep 2010
Device: Kindle Paperwhite, iPhone 5, iPad Air, Nexus 7
|
Regex help to remove HTML footer
This is the HTML code:
Code:
<br clear="all"/><hr/><div class="center"><small><a href="slide19.html">previous</a> | <a href="toc.html">Table of Contents</a> | <a href="slide21.html">next</a></small></div> </body> </html>
09-09-2010, 05:51 AM | #2 |
Addict
Posts: 202
Karma: 10802
Join Date: Sep 2010
Device: Kindle Paperwhite, iPhone 5, iPad Air, Nexus 7
|
OK, I managed to find the answer.
Used .+\ When I test the expression, it highlights the sections correctly. However, after the conversion they are still there, even though I ticked remove footer. |
09-09-2010, 08:29 AM | #3 |
Wizard
Posts: 1,337
Karma: 123455
Join Date: Apr 2009
Location: Malaysia
Device: PRS-650, iPhone
|
Not sure what you're trying to remove from that html code. Are you saying every page has 'previous', 'table of contents', and 'next' links?
.+\ should only be .+ (drop the trailing backslash), but try [^>]* instead: .+ is greedy and will run past the end of the tag to the last > on the line, whereas [^>]* cannot cross a > and so stops at the end of the current tag. You also need to account for variable spacing across line breaks and between tags; \s* helps with that. If some of the parts don't occur every time, surround them with parentheses, e.g. "(<br[^>]*>)", and add a question mark to make the group optional: "(<br[^>]*>)?". Try something like this:
Code:
<br[^>]*>\s*<hr/>\s*<div[^>]*>\s*<small>\s*<a\shref[^>]*>\s*previous\s*</a>\s*\|\s*<a\shref[^>]*>\s*Table\sof\sContents\s*</a>\s*\|\s*<a\shref[^>]*>\s*next\s*</a>\s*</small>\s*</div>
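If you want to check the expression outside Calibre first, here is a rough sketch in Python. It assumes Python's re module behaves like Calibre's regex engine for this pattern, and the sample page text (the "Slide content" paragraph and the line break in the middle of the footer) is made up for illustration. It shows the difference between .+ and [^>]*, the optional group, and the full expression applied to the footer from the first post.

```python
import re

# Greedy .+ runs to the last > on the line; [^>]* stops at the end of the tag.
line = '<br clear="all"/><hr/><div class="center">'
print(re.match(r'<br.+>', line).group())     # prints the whole line
print(re.match(r'<br[^>]*>', line).group())  # prints <br clear="all"/>

# Wrapping a part in (...)? makes it optional, so both variants match.
optional_br = re.compile(r'(<br[^>]*>)?\s*<hr/>')
print(bool(optional_br.match('<br clear="all"/><hr/>')))  # True
print(bool(optional_br.match('<hr/>')))                   # True

# Footer from the first post, with a line break added (illustrative)
# to show that \s* copes with whitespace between tags.
html = """<p>Slide content</p>
<br clear="all"/><hr/><div class="center"><small><a href="slide19.html">previous</a> |
<a href="toc.html">Table of Contents</a> | <a href="slide21.html">next</a></small></div>
</body>
</html>"""

# The suggested expression, split over several lines for readability.
footer = re.compile(
    r'<br[^>]*>\s*<hr/>\s*<div[^>]*>\s*<small>\s*'
    r'<a\shref[^>]*>\s*previous\s*</a>\s*\|\s*'
    r'<a\shref[^>]*>\s*Table\sof\sContents\s*</a>\s*\|\s*'
    r'<a\shref[^>]*>\s*next\s*</a>\s*</small>\s*</div>'
)

cleaned = footer.sub('', html)
print(cleaned)
```

The navigation block is removed and the closing body and html tags are left alone. Whether Calibre's conversion applies the expression to the same text you see in the test dialog is a separate question that this sketch does not answer.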
09-09-2010, 08:38 AM | #4 |
Grand Sorcerer
Posts: 6,216
Karma: 16534894
Join Date: Sep 2009
Location: UK
Device: Kobo: KA1, ClaraHD, Forma, Libra2, Clara2E. PocketBook: TouchHD3
|
As your source is HTML, if all else fails, you could always try editing the HTML in a text editor before importing to Calibre.
For example, Notepad++ is a very good free text editor; it supports regex and lets you find/replace across multiple open files in one go. |
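If you would rather script the clean-up than do it in an editor, something along these lines should do the same job. This is only a sketch: the function name strip_footers is made up, it assumes the pages are UTF-8 encoded .html files sitting in a single folder, and it overwrites the files in place, so work on a copy.

```python
import re
from pathlib import Path

# Same footer expression as suggested earlier in the thread.
FOOTER = re.compile(
    r'<br[^>]*>\s*<hr/>\s*<div[^>]*>\s*<small>\s*'
    r'<a\shref[^>]*>\s*previous\s*</a>\s*\|\s*'
    r'<a\shref[^>]*>\s*Table\sof\sContents\s*</a>\s*\|\s*'
    r'<a\shref[^>]*>\s*next\s*</a>\s*</small>\s*</div>'
)


def strip_footers(folder, pattern=FOOTER):
    """Remove the footer from every .html file in folder (hypothetical helper).

    Returns the names of the files that were changed. Assumes UTF-8 files.
    """
    changed = []
    for path in sorted(Path(folder).glob('*.html')):
        text = path.read_text(encoding='utf-8')
        cleaned, count = pattern.subn('', text)
        if count:  # only rewrite files where the footer was actually found
            path.write_text(cleaned, encoding='utf-8')
            changed.append(path.name)
    return changed
```

Call it with the folder that holds the slides, e.g. strip_footers('slides_copy'), then import the cleaned files into Calibre as usual. The returned list tells you which files had a footer removed, which is a quick way to spot pages where the expression did not match.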
09-09-2010, 09:42 AM | #5 | |
Well trained by Cats
Posts: 30,365
Karma: 58053698
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|