Cracked.com recipe skips "blog" articles

damien18 · 10-14-2013, 10:25 PM

First, a big thanks to Kovid for Calibre, without which using my e-reader would be a much less pleasurable experience.

1. The Cracked.com recipe was modified a few months ago to accommodate a change to the site that was causing only the first page of multi-page articles to be downloaded (and lots of junk to be included as well). However, since then some articles are no longer being downloaded at all. While I know next to nothing about recipes, I was able to fix the issue by replacing this:

Code:

keep_only_tags = dict(name='article', attrs={
                      'class': 'module article dropShadowBottomCurved'})

With this:

Code:

keep_only_tags = [dict(name='article', attrs={'class': 'module article dropShadowBottomCurved'}),
                  dict(name='article', attrs={'class': 'module blog dropShadowBottomCurved'})]

The items that have a "blog" class in place of an "article" class were being downloaded by the old version of the recipe, so I was wondering if it would be possible to include this change in the built-in recipe.
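As a rough illustration of the change described above, the two class strings could also be covered by one predicate instead of two separate dicts. This is only a sketch: the helper name `is_cracked_article` is hypothetical, and it assumes the class attribute reaches the matcher as a single space-separated string (or None when absent), which is how the lambda form later in this thread treats it.

```python
# Hypothetical helper: accepts either the "article" or the "blog" variant
# of the class string quoted in the post above.
WANTED = {'article', 'blog'}
REQUIRED = {'module', 'dropShadowBottomCurved'}


def is_cracked_article(class_attr):
    """Return True if class_attr describes an article or blog module."""
    if not class_attr:  # attribute missing or empty
        return False
    classes = set(class_attr.split())
    # All required classes must be present, plus at least one wanted kind.
    return REQUIRED <= classes and bool(WANTED & classes)


# Assumed usage inside a recipe class (not tested against the live site):
# keep_only_tags = [dict(name='article', attrs={'class': is_cracked_article})]
```

The two-dict list from the post is the simpler and more literal fix; a predicate like this only becomes useful if the site adds further variants or reorders the classes.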

2. One other minor thing I noticed is that section tags having a "socialTools" class are no longer being removed, even though the recipe includes this:

Code:

remove_tags = [dict(name='section', attrs={'class': ['socialTools', 'quickFixModule']})]

This one I'm not sure about, but I was wondering if it is perhaps because remove_tags does not remove tags that are children of tags being kept by keep_only_tags? This is just a guess based on the fact that the "quickFixModule" section is being removed, and it is a sibling of the tag being kept by keep_only_tags.

Thanks in advance.

kovidgoyal · 10-14-2013, 11:16 PM

remove_tags runs after keep_only_tags, so it does not matter where the tag is located. Most likely the tag in question has more than one class, in which case you need:

dict(name='section', attrs={'class': lambda x: x and 'socialTools' in x.split()})
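To make the behaviour of that lambda easier to see, here is a minimal standalone sketch of the same test written as a named function. The factory name `has_class` and the extra class `clearfix` are made up for illustration; the sketch assumes, as the lambda does, that the matcher receives the raw class attribute as one string, or None when the tag has no class attribute.

```python
def has_class(name):
    """Build a predicate that is true when `name` is one of the
    space-separated classes in a tag's class attribute."""
    def matcher(class_attr):
        # class_attr is None when the tag has no class attribute,
        # so guard before calling split() (the `x and ...` in the lambda).
        return bool(class_attr) and name in class_attr.split()
    return matcher


social = has_class('socialTools')

single = social('socialTools')              # the only class on the tag
multiple = social('socialTools clearfix')   # one class among several
lookalike = social('socialToolsWide')       # a different class, no match
missing = social(None)                      # no class attribute at all
```

Splitting on whitespace is what lets the multi-class case match; comparing the whole attribute string to 'socialTools' would not. In a recipe this would be written as `dict(name='section', attrs={'class': has_class('socialTools')})`, which is equivalent to the lambda above.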

damien18 · 10-15-2013, 10:03 PM

Thanks for the feedback. The tag in question only has the "socialTools" class.

However, on closer inspection it seems my initial assessment was incomplete. The tag in question is being removed, but only on the first page. It is on the subsequent page(s) that the tag is not being removed.

In any case, it's not something that will really bother me; I was just curious as to what was going on there. I'm guessing more sophisticated processing would be necessary to remove the tag from the subsequent pages.

Thanks again for your help, and I hope you'll consider my request for modifying the built-in recipe so as to include the "blog" articles.

kovidgoyal · 10-15-2013, 10:37 PM

It's already done.


