#1 |
Junior Member
Posts: 7
Karma: 10
Join Date: Oct 2013
Device: Marvin 3 (iPadOS)
|
Cracked.com recipe skips "blog" articles
First, a big thanks to Kovid for Calibre, without which using my e-reader would be a much less pleasurable experience.
1. The Cracked.com recipe was modified a few months ago to accommodate a site change that caused only the first page of multi-page articles to be downloaded (along with a lot of junk). Since then, however, some articles are no longer downloaded at all. While I know next to nothing about recipes, I was able to fix the issue by replacing this:
Code:
keep_only_tags = dict(name='article', attrs={ 'class': 'module article dropShadowBottomCurved'})
with this:
Code:
keep_only_tags = [dict(name='article', attrs={'class': 'module article dropShadowBottomCurved'}), dict(name='article', attrs={'class': 'module blog dropShadowBottomCurved'})]
2. One other minor thing I noticed is that section tags with a "socialTools" class are no longer being removed, even though the recipe includes this:
Code:
remove_tags = [dict(name='section', attrs={'class': ['socialTools', 'quickFixModule']})]
Thanks in advance.
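The keep_only_tags change described in this post can also be written by building the list from the class strings, which may make it easier to add another article type later. This is only a sketch: the two class strings come from the post, while the name `ARTICLE_CLASSES` is made up for illustration and is not part of the built-in recipe.

```python
# Class strings quoted in the post; ARTICLE_CLASSES is an illustrative name.
ARTICLE_CLASSES = [
    'module article dropShadowBottomCurved',
    'module blog dropShadowBottomCurved',
]

# One matcher dict per class string, in the same shape the post uses.
keep_only_tags = [
    dict(name='article', attrs={'class': cls}) for cls in ARTICLE_CLASSES
]

for spec in keep_only_tags:
    print(spec)
```

In a real recipe the `keep_only_tags` assignment would sit inside the recipe class body; it is shown at module level here so the snippet runs on its own.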
#2 |
creator of calibre
Posts: 45,157
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
remove_tags runs after keep_only_tags, so it does not matter where the tag is located. Most likely the tag in question has more than one class, in which case you need
dict(name='section', attrs={'class': lambda x: x and 'socialTools' in x.split()})
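To see what the lambda in this reply does, here is a small standalone sketch of the same test as a named function. It assumes the matcher is handed the raw class attribute string (or None when the tag has no class attribute); `has_class` and the class names `clearfix` and `socialToolsWide` are invented for the example and do not come from the recipe or the site.

```python
def has_class(wanted):
    """Build a matcher that is true when `wanted` is one of the
    whitespace-separated tokens of a class attribute value."""
    def matcher(value):
        # `value` is None when the tag has no class attribute at all.
        return bool(value) and wanted in value.split()
    return matcher


is_social = has_class('socialTools')

print(is_social('socialTools'))           # tag with only this class
print(is_social('socialTools clearfix'))  # tag with several classes
print(is_social('socialToolsWide'))       # similar name, different token
print(is_social(None))                    # tag without a class attribute

# Same shape as the dict in the reply, using the named helper.
remove_tags = [dict(name='section', attrs={'class': has_class('socialTools')})]
```

Splitting on whitespace rather than using a plain substring test is what keeps a class such as `socialToolsWide` from matching by accident.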
#3 |
Junior Member
Posts: 7
Karma: 10
Join Date: Oct 2013
Device: Marvin 3 (iPadOS)
|
Thanks for the feedback. The tag in question only has the "socialTools" class.
However, on closer inspection it seems my initial assessment was incomplete. The tag in question is being removed, but only on the first page; on the subsequent page(s) it is not. In any case, it's not something that will really bother me; I was just curious as to what was going on there. I'm guessing more sophisticated processing would be necessary to remove the tag from the subsequent pages.
Thanks again for your help, and I hope you'll consider my request to modify the built-in recipe so that it includes the "blog" articles.
#4 |
creator of calibre
Posts: 45,157
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
It's already done.
Tags |
calibre, cracked.com, recipe, recipe update |