Old 05-24-2011, 04:11 PM   #1
PoP
Happ𝑒 2.7 Hapπ 3.14 day
Posts: 442
Karma: 1643361
Join Date: Dec 2010
Location: ♁ ᴺ₄₅°₃₀' ᵂ₇₃°₃₇' ±₆₀"
Device: K3.₄, PRS-350, SGS3, Rπ, iPad Air
Recipe for Cyberpresse (need help)

I am trying to edit the "Cyberpresse" recipe (authored by balok and Sujata Raman) to remove the text in the red rectangle which appears at the end of every article:
[Attached image: example.jpg]
I have succeeded in choosing my preferred feeds and in removing the "publicité" block by adding the class 'pub' to the 'div' entry in remove_tags (modified recipe):
[Attached image: modified recipe.jpg]
But I don't know how to remove the "Parta" and "Tweet" items (sample source HTML showing the end of an article):
[Attached image: source html.jpg]

Your help would be much appreciated.
Old 06-01-2011, 04:03 PM   #2
PoP
I solved my own problem! I removed table entries "td" using "remove_tags".

Here is the edited recipe:
Spoiler:
from calibre.web.feeds.news import BasicNewsRecipe

class Cyberpresse(BasicNewsRecipe):

    title = u'Cyberpresse'
    __author__ = 'balok and Sujata Raman'
    description = 'Canadian news in French'
    language = 'fr'

    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    remove_javascript = True
    html2lrf_options = ['--left-margin=0', '--right-margin=0', '--top-margin=0', '--bottom-margin=0']
    encoding = 'utf-8'

    keep_only_tags = [
        dict(name='div', attrs={'class':'article-page'}),
        dict(name='div', attrs={'id':'articlePage'}),
    ]

    extra_css = '''
        .photodata{font-family:Arial,Helvetica,Verdana,sans-serif;color: #999999; font-size: 90%; }
        h1{font-family:Georgia,Times,serif ; font-size: large; }
        .amorce{font-family:Arial,Helvetica,Verdana,sans-serif; font-weight:bold;}
        .article-page{font-family:Arial,Helvetica,Verdana,sans-serif; font-size: x-small;}
        #articlePage{font-family:Arial,Helvetica,Verdana,sans-serif; font-size: x-small;}
        .auteur{font-family:Georgia,Times,sans-serif; font-size: 90%; color:#006699 ;}
        .bodyText{font-family:Arial,Helvetica,Verdana,sans-serif; font-size: x-small;}
        .byLine{font-family:Arial,Helvetica,Verdana,sans-serif; font-size: 90%;}
        .entry{font-family:Arial,Helvetica,Verdana,sans-serif; font-size: x-small;}
        .minithumb-auteurs{font-family:Arial,Helvetica,Verdana,sans-serif; font-size: 90%; }
        a{color:#003399; font-weight:bold; }
    '''

    remove_tags = [
        dict(name='div', attrs={'class':['centerbar','colspan','share-module','pub']}),
        dict(name='p', attrs={'class':['zoom']}),
        dict(name='ul', attrs={'class':['stories']}),
        dict(name='h4', attrs={'class':['general-cat']}),
        dict(name='td'),
    ]

    feeds = [
        (u'Manchettes', u'http://www.cyberpresse.ca/rss/225.xml'),
        (u'Capitale nationale', u'http://www.cyberpresse.ca/rss/501.xml'),
        (u'International', u'http://www.cyberpresse.ca/rss/179.xml'),
        (u'Pierre Foglia', u'http://www.cyberpresse.ca/rss/941.xml'),
    ]

    def postprocess_html(self, soup, first):
        for tag in soup.findAll(name=['i','strong']):
            tag.name = 'div'
        return soup

It looks OK: the "Parta" and "Tweet" items are gone now, but I don't know whether I could have broken anything else in the recipe.
Old 06-01-2011, 04:08 PM   #3
kovidgoyal
creator of calibre
Posts: 24,765
Karma: 4369667
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Removing td will remove everything from every table everywhere. If any article contains a table in its contents, that table will be removed.
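
To make that concrete, here is a small standard-library sketch (not calibre code) of what a blanket rule like dict(name='td') amounts to. The sample markup and the helper names strip_all and visible_text are invented for illustration; the real Cyberpresse source is only visible in the attached screenshot.

```python
import xml.etree.ElementTree as ET

# Invented sample: an article with its own data table plus a sharing table.
SAMPLE = """<div id="articlePage">
  <p>Article text.</p>
  <table><tr><td>Score: 3-2</td></tr></table>
  <table><tr><td>Tweet</td></tr></table>
</div>"""

def strip_all(root, name):
    """Remove every element called `name`, wherever it occurs."""
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == name:
                parent.remove(child)
    return root

def visible_text(root):
    """Collapse the remaining text nodes into a single line."""
    return " ".join("".join(root.itertext()).split())

before = visible_text(ET.fromstring(SAMPLE))
after = visible_text(strip_all(ET.fromstring(SAMPLE), "td"))
print(before)  # article text, the score table and the Tweet cell
print(after)   # only the paragraph survives
```

The unwanted "Tweet" cell goes away, but so does the score table that belongs to the article.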
Old 06-01-2011, 04:24 PM   #4
PoP
Quote:
Originally Posted by kovidgoyal
Removing td will remove everything from every table everywhere. If any article contains a table in its contents, that table will be removed.
I just gambled that tables must be uncommon inside articles... Well, I will keep working at it then; I just don't know enough to specify *that* table instead of *all* tables. Thanks.
Old 06-01-2011, 04:32 PM   #5
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.kovidgoyal ought to be getting tired of karma fortunes by now.
 
kovidgoyal's Avatar
 
Posts: 24,765
Karma: 4369667
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
The trick is to look for some attribute on the table, or on one of its parent tags, that is likely to be unique. Ideally that is an id attribute; if not, a class or style.
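
A rough sketch of that search, using only the standard library rather than calibre. The markup is invented, and the id "share-tools" is a hypothetical stand-in for whatever attribute the real page carries; the helpers identifying_ancestors and remove_by_attr are likewise made up for this example.

```python
import xml.etree.ElementTree as ET

# Invented markup; "share-tools" is a hypothetical id, not taken from the site.
SAMPLE = """<div id="articlePage">
  <p>Article text.</p>
  <table><tr><td>Score: 3-2</td></tr></table>
  <div id="share-tools">
    <table><tr><td>Tweet</td></tr></table>
  </div>
</div>"""

def parent_map(root):
    """ElementTree nodes do not know their parent, so build a lookup."""
    return {child: parent for parent in root.iter() for child in parent}

def identifying_ancestors(root, text):
    """Yield (tag, attribute, value), innermost first, for the element
    holding `text` and each ancestor that has an id, class or style."""
    parents = parent_map(root)
    node = next((e for e in root.iter() if (e.text or "").strip() == text), None)
    while node is not None:
        for attr in ("id", "class", "style"):
            if node.get(attr):
                yield node.tag, attr, node.get(attr)
                break
        node = parents.get(node)

def remove_by_attr(root, name, attr, value):
    """Remove only the `name` elements whose `attr` equals `value`."""
    parents = parent_map(root)
    for elem in list(root.iter(name)):
        if elem.get(attr) == value and elem in parents:
            parents[elem].remove(elem)
    return root

root = ET.fromstring(SAMPLE)
candidates = list(identifying_ancestors(root, "Tweet"))
print(candidates)

# Use the innermost candidate: it is the most specific to the unwanted block.
tag, attr, value = candidates[0]
remove_by_attr(root, tag, attr, value)
remaining = " ".join("".join(root.itertext()).split())
print(remaining)
```

With this sample, the recipe equivalent would be to replace dict(name='td') in remove_tags with something like dict(name='div', attrs={'id':'share-tools'}), substituting the attribute actually found in the page source.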