09-25-2010, 09:49 AM   #1
JohnsonZA
keeping or removing a div with multiple classes

I'm using keep_only_tags and remove_tags in a recipe for a site using divs with multiple classes like so:
Code:
<div class="article right">blah</div>
I'm struggling to keep the div based on its "article" class.

This is what I've tried, but it doesn't work:
Code:
keep_only_tags = [
    dict(name='div', attrs={'class':['article']})
]
It only works if I put in both class names, like so:
Code:
keep_only_tags = [
    dict(name='div', attrs={'class':['article right']})
]
I've tried wildcards ('article.*'), but that doesn't seem to work either.

Does anyone have ideas?
09-25-2010, 10:33 AM   #2
Starson17
The class attribute is the single string "article right", so an exact match on "article" alone fails, and it's working as it should. If you want to match on just part of the class, use a regex:
Code:
keep_only_tags = [
    dict(name='div', attrs={'class':re.compile(r'article', re.DOTALL|re.IGNORECASE)})
]
and don't forget to add this at the top of the recipe:
Code:
import re
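One caveat with a bare r'article' pattern: it is a substring search, so it would also hit class values that merely contain "article". The sketch below uses only the standard re module to compare that loose pattern with a stricter one that requires "article" to be a whole, whitespace-separated class name. The sample class values and the stricter pattern are illustrative assumptions, not something from the site in question, and it assumes the recipe's HTML parser applies the compiled pattern with search() rather than match().

```python
import re

# Hypothetical class attribute values, as they might appear on a page.
class_values = ['article right', 'article', 'right article',
                'articles-list', 'sidebar']

# Loose pattern, as in the reply above: hits any value containing "article".
loose = re.compile(r'article', re.IGNORECASE)

# Stricter pattern (an assumption, not from the thread): "article" must be
# a complete class name, delimited by whitespace or the string boundaries.
strict = re.compile(r'(?:^|\s)article(?:\s|$)', re.IGNORECASE)

loose_hits = [v for v in class_values if loose.search(v)]
strict_hits = [v for v in class_values if strict.search(v)]

print('loose :', loose_hits)
print('strict:', strict_hits)
```

If the loose pattern picks up unwanted divs, the stricter compiled pattern could be dropped into the same place in keep_only_tags, i.e. attrs={'class': strict}.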