Old 12-19-2010, 10:37 AM   #1
mufc
Connoisseur
mufc doesn't litter
 
Posts: 99
Karma: 170
Join Date: Nov 2010
Location: Airdrie Alberta
Device: Sony 650
'Heading color' and 'p class span'

I have tried
remove_attributes = ['style', 'font','font color']
and
remove_attributes = ['style', 'font','color']
to get rid of the color in an h2 heading, with no luck:
<h2><font color="#33cccc">WHEN SHOULD I SEE A DOCTOR? </font><br></h2>

I also cannot remove a span by its name attribute through the usual channels:

<span name="KonaFilter">

dict(name='span', attrs={'name':['KonaFilter']}),

I have had no luck with 'p class span' either.

Any ideas.
Old 12-20-2010, 09:54 AM   #2
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by mufc View Post
Any ideas.
You can always brute force it with a regex search and replace.
Old 12-20-2010, 08:28 PM   #3
mufc
Hope I can find somewhere to learn that
Old 12-21-2010, 03:11 PM   #4
Starson17
Quote:
Originally Posted by mufc View Post
Hope I can find somewhere to learn that
Here's an example from a recipe:
Code:
    preprocess_regexps = [
        (re.compile(r'<body.*?<div class="pad_10L10R">', re.DOTALL|re.IGNORECASE), lambda match: '<body><div>'),
        (re.compile(r'</div>.*</body>', re.DOTALL|re.IGNORECASE), lambda match: '</div></body>'),
        (re.compile('\r'),lambda match: ''),
        (re.compile(r'<!-- .+? -->', re.DOTALL|re.IGNORECASE), lambda match: ''),
        (re.compile(r'<link .+?>', re.DOTALL|re.IGNORECASE), lambda match: ''),
        (re.compile(r'<script.*?</script>', re.DOTALL|re.IGNORECASE), lambda match: ''),
        (re.compile(r'<noscript.*?</noscript>', re.DOTALL|re.IGNORECASE), lambda match: ''),
        (re.compile(r'<meta .*?/>', re.DOTALL|re.IGNORECASE), lambda match: ''),
    ]
In the first one, he's replacing everything from the start of the <body> tag up to and including <div class="pad_10L10R"> with a plain <body><div>, which drops both the class and whatever came before that div. In the second, he's replacing everything from the first </div> through </body> with just </div></body>, which deletes whatever followed the div. The others just delete things. Brute force regex with preprocess_regexps is the last resort, but it works great when you need it. Just be careful not to delete partial tags. If you delete the open tag, delete the closing part of that tag, too.
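For the heading color and the KonaFilter span asked about at the top of the thread, the same approach might look like the sketch below. The two patterns and the apply_regexps helper are assumptions made up for illustration, not taken from any existing recipe; the helper only imitates a search and replace pass so the rules can be tried outside calibre. The font rule removes both the opening and the closing tag, and the span rule keeps the tag but drops its name attribute, so no partial tags are left behind.

```python
import re

# Hypothetical rules for the two cases from the first post; the
# (pattern, replacement function) shape mirrors the list quoted above.
preprocess_regexps = [
    # Remove <font ...> and </font> tags, keeping the text between them.
    (re.compile(r'</?font\b[^>]*>', re.IGNORECASE), lambda match: ''),
    # Turn <span name="KonaFilter"> into a bare <span>; </span> stays paired.
    (re.compile(r'<span\s+name="KonaFilter"[^>]*>', re.IGNORECASE),
     lambda match: '<span>'),
]


def apply_regexps(html, rules):
    """Assumed helper: run each rule over the HTML in order."""
    for pattern, func in rules:
        html = pattern.sub(func, html)
    return html


sample = ('<h2><font color="#33cccc">WHEN SHOULD I SEE A DOCTOR? </font><br></h2>'
          '<span name="KonaFilter">text</span>')
cleaned = apply_regexps(sample, preprocess_regexps)
print(cleaned)
```

In a recipe only the preprocess_regexps list would be needed; the helper and the sample string are there just to check the patterns.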
Old 12-21-2010, 07:01 PM   #5
mufc
Thanks, I will look into that as soon as I can.
Old 12-21-2010, 11:48 PM   #6
mufc
Missing Something

When I do this I get rid of articles with Video in the title, but GALLERY is ignored in the url even though it clearly is in the url. Not only that, but when I tried removing .upper in line 5 it would not work, even though Video in my titles only has its first letter in upper case. To delete pages with Gallery in the title, I added another instance of the def parse_feeds but reversed where VIDEO and GALLERY are in the recipe, and then I got rid of Gallery pages but not Video pages. It seems the second instance overrode the first. Is there any way to combine both VIDEO and GALLERY in one?
I hope I made myself clear. It seems that the first part of the condition works fine for me but not the second.
Spoiler:
    def parse_feeds(self):
        feeds = BasicNewsRecipe.parse_feeds(self)
        for feed in feeds:
            # Iterate over a copy so that removing items does not skip articles.
            for article in feed.articles[:]:
                print('article.title is: %s' % article.title)
                if 'VIDEO' in article.title.upper() or 'GALLERY' in article.url:
                    feed.articles.remove(article)
        return feeds
Old 12-22-2010, 08:51 AM   #7
Starson17
Quote:
Originally Posted by mufc View Post
On trying to delete pages with Gallery in the title I added another instance of the def parse feed but reversed where VIDEO and GALLERY are in the recipe and then I got rid of Gallery pages but not Video pages.
You can't define parse_feeds twice.
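A quick way to see why, using plain Python and a made-up class (DemoRecipe is hypothetical and does not come from calibre): when a class body defines the same method name twice, the second definition silently replaces the first, which matches the behaviour described in the quote.

```python
class DemoRecipe:
    """Hypothetical stand-in for a recipe class; not part of calibre."""

    def parse_feeds(self):
        return 'filters VIDEO'

    # Same name again: this rebinds parse_feeds, so the version above is lost.
    def parse_feeds(self):
        return 'filters GALLERY'


result = DemoRecipe().parse_feeds()
print(result)
```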

I didn't closely follow what worked for you, but if they worked separately, you can just run them separately with an elif:
Code:
          if 'VIDEO' in article.title.upper():
            feed.articles.remove(article)
          elif 'GALLERY' in article.url.upper():
            feed.articles.remove(article)
      return feeds
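Put together as a whole function, this could look roughly like the sketch below. Article, Feed, drop_unwanted and the example.com urls are stand-ins made up so the filter can be run without calibre; inside a recipe the same loop would sit in parse_feeds after the call to BasicNewsRecipe.parse_feeds(self). Both the title and the url are upper-cased before the test because Python's in is case-sensitive, which is a likely reason 'GALLERY' never matched a lowercase url.

```python
class Article:
    """Hypothetical stand-in for a feed article (title and url only)."""

    def __init__(self, title, url):
        self.title = title
        self.url = url


class Feed:
    """Hypothetical stand-in for a feed holding a list of articles."""

    def __init__(self, articles):
        self.articles = articles


def drop_unwanted(feeds):
    """Remove articles with VIDEO in the title or GALLERY in the url."""
    for feed in feeds:
        # Loop over a copy, since the list is modified while iterating.
        for article in feed.articles[:]:
            if 'VIDEO' in article.title.upper():
                feed.articles.remove(article)
            elif 'GALLERY' in article.url.upper():
                feed.articles.remove(article)
    return feeds


# Made-up sample data: one video title, one gallery url, one normal article.
feeds = drop_unwanted([Feed([
    Article('Video: match highlights', 'http://example.com/news/1'),
    Article('Season in pictures', 'http://example.com/gallery/2'),
    Article('Transfer news', 'http://example.com/news/3'),
])])
remaining = [a.title for a in feeds[0].articles]
print(remaining)
```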
Old 12-22-2010, 09:02 PM   #8
mufc
Thanks.
I had managed to get the original parse_feeds to work by putting 'gallery' before 'video'. I could not get it to work the other way around. However, I like your idea better. In fact, it will be included in my template for starting new recipes.