Old 08-10-2014, 08:34 AM   #31
DiapDealer
Grand Sorcerer
Posts: 9,272
Karma: 42298328
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Earlier I mentioned that I was worried that:
Code:
</?a ?([^>]+)?>
might find other tags that start with 'a'--and indeed it does. The address, abbr, and area tags can probably be dismissed, but the <aside> tag is one that I'm sure we're only going to see more and more of. And my regex will include it.

So for the paranoid/pedantic type (like myself), it's probably best to use:
Code:
</?a\b([^>]+)?>
instead (should work in pretty much all regex flavors).

The \b just matches a "word" boundary so that no other tags that start with 'a' will be caught up in the match.
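
To see the difference concretely, here is a minimal Python sketch that runs both patterns from this post over a made-up line of markup. The sample string and the helper name matched_tags are illustrative assumptions, not something from the thread, and Python's re module is only one of the regex flavors this applies to.

```python
import re

# The two patterns from the post, compiled unchanged.
loose = re.compile(r'</?a ?([^>]+)?>')
strict = re.compile(r'</?a\b([^>]+)?>')

# Made-up sample markup, purely for illustration.
sample = ('<p><a href="ch1.html">Chapter 1</a> '
          '<abbr title="et cetera">etc.</abbr></p>'
          '<aside>A note.</aside>')


def matched_tags(pattern, text):
    """Return the full text of every tag the pattern matches (hypothetical helper)."""
    return [m.group(0) for m in pattern.finditer(text)]


loose_hits = matched_tags(loose, sample)
strict_hits = matched_tags(strict, sample)

print(loose_hits)   # includes the abbr and aside tags as well as the anchors
print(strict_hits)  # only the opening and closing anchor tags
```

With the \b in place, only the two anchor tags are matched; without it, the abbr and aside tags are swept up too.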
Old 08-10-2014, 09:13 AM   #32
cybmole
Wizard
Posts: 2,859
Karma: 1163098
Join Date: Sep 2010
Device: Kobo aura HD, Kobo Arc, Kindle Fire HDX 8.9 , Kindle for PC
I'd never heard of <aside>; I had to google it. It doesn't seem like anything I'd ever expect to encounter in an epub though - would ADE even have a clue what to do with it?

So I think I'll just go on happily nuking everything beginning with <a !

I have an HTML5 quick guide on my tablet, so let's see what else there is in its index for a:
acronym - though it says that is now unsupported
address
article - new for HTML5
Old 08-10-2014, 09:40 AM   #33
DiapDealer
Grand Sorcerer
Posts: 9,272
Karma: 42298328
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Fair enough. As I said, my change is more of a "stickler for details" thing.

But if you're cleaning up the cruft from retail epubs, you're going to start running into the <aside> tag sooner rather than later. Both Kobo and B&N (and even Kindle) books are exhibiting more and more html5/epub3 features.

Deleting the <aside></aside> tags and leaving their contents intact could make for some very confusing reading (footnotes and other ancillary details stuffed into the middle of sentences and the like).
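
A rough Python sketch of that effect, using a made-up snippet (the markup and variable names are assumptions for illustration only). It strips just the aside tags, and for comparison removes the whole element; the second pattern assumes asides are not nested, and whether dropping the element is appropriate depends on the book.

```python
import re

# Made-up sample: a footnote-style aside sitting between two paragraphs.
html = ('<p>The ship sailed at dawn.</p>'
        '<aside epub:type="footnote"><p>4. Tides were low.</p></aside>'
        '<p>Nobody saw it leave.</p>')

# Strip only the opening and closing aside tags, leaving the contents in place.
tags_only = re.sub(r'</?aside\b[^>]*>', '', html)

# Remove the whole element, contents included (assumes no nested asides).
whole_element = re.sub(r'<aside\b[^>]*>.*?</aside>', '', html, flags=re.DOTALL)

print(tags_only)      # the footnote now reads as an ordinary body paragraph
print(whole_element)  # the footnote is gone entirely
```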

Last edited by DiapDealer; 08-10-2014 at 10:22 AM.
Old 08-10-2014, 10:45 AM   #34
theducks
Grand Sorcerer
Posts: 14,863
Karma: 5654321
Join Date: Aug 2009
Location: (The original) Silicon Valley, USA
Device: Galaxy Tab 2, Astak Pocket Pro, K4NT
Quote:
Originally Posted by cybmole
thanks - now I'll have to remember where I might have saved a test case

does that code strip the nested tags from inner to outer, as usually the outermost one is the best candidate for keeping?

No, it is pretty stupid. When I can, I usually wrap that in a
Code:
(<body.*>)\s*
and
Code:
\s+</body>
and put those back in the replace.


I do frequent saves, NOT auto fix, and I stop and repair before getting a deep pile. Use the Preview Message to help locate the exception (usually there was a mid-document close BQ and another open BQ).
What would help is a tool that could count the additional inner opening BQ tags and only remove the close after counting back down.
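
The counting idea can be sketched in Python roughly as follows. This is not an existing calibre feature; it assumes "BQ" means the blockquote tag, the function name strip_inner_blockquotes is hypothetical, and the choice to keep the outermost pair and drop nested ones is one reading of the question quoted above.

```python
import re

# Matches opening and closing blockquote tags; group 1 is "/" for a closing tag.
BQ_TAG = re.compile(r'<(/?)blockquote\b[^>]*>', re.IGNORECASE)


def strip_inner_blockquotes(html):
    """Keep the outermost blockquote tags and drop nested ones by tracking depth."""
    depth = 0

    def replace(match):
        nonlocal depth
        if match.group(1):                 # closing tag
            if depth == 0:
                return match.group(0)      # stray close: leave it for manual repair
            depth -= 1                     # count back down
            return match.group(0) if depth == 0 else ''
        depth += 1                         # opening tag
        return match.group(0) if depth == 1 else ''

    return BQ_TAG.sub(replace, html)


nested = ('<blockquote><p>a</p><blockquote><p>b</p></blockquote>'
          '<p>c</p></blockquote>')
cleaned = strip_inner_blockquotes(nested)
print(cleaned)
```

A closing tag is only kept once the counter is back at zero, so a mid-document close followed by another open no longer leaves the document unbalanced; stray closes are left untouched so they can still be found and repaired by hand.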
All times are GMT -4.