10-14-2013, 10:25 PM   #1
damien18
Cracked.com recipe skips "blog" articles

First, a big thanks to Kovid for Calibre, without which using my e-reader would be a much less pleasurable experience.

1. The Cracked.com recipe was modified a few months ago to handle a site change that was causing only the first page of multi-page articles to be downloaded (along with lots of junk). Since then, however, some articles are no longer downloaded at all. I know next to nothing about recipes, but I was able to fix the issue by replacing this:

Code:
keep_only_tags = dict(name='article', attrs={
                      'class': 'module article dropShadowBottomCurved'})
With this:

Code:
keep_only_tags = [dict(name='article', attrs={'class': 'module article dropShadowBottomCurved'}),
                  dict(name='article', attrs={'class': 'module blog dropShadowBottomCurved'})]
Items that have a "blog" class in place of an "article" class were downloaded by the old version of the recipe, so would it be possible to include this change in the built-in recipe?
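
As a possible alternative to listing both class strings, the two entries could be folded into one by using a compiled regular expression as the class value. This is only a sketch: it assumes the recipe's tag matching accepts a compiled pattern in attrs (BeautifulSoup-style matching generally does), it has not been tried against the live site, and the name ARTICLE_OR_BLOG is made up for illustration.

```python
import re

# Assumption: the recipe's attrs matching accepts a compiled regex for 'class'.
# ARTICLE_OR_BLOG is a hypothetical name, not part of the built-in recipe.
ARTICLE_OR_BLOG = re.compile(r'^module (article|blog) dropShadowBottomCurved$')

# One entry that covers both the "article" and the "blog" variants.
keep_only_tags = [dict(name='article', attrs={'class': ARTICLE_OR_BLOG})]

# Quick sanity check of the pattern against the two class strings from the post.
for value in ('module article dropShadowBottomCurved',
              'module blog dropShadowBottomCurved'):
    print(value, '->', bool(ARTICLE_OR_BLOG.match(value)))
```

The explicit two-entry list above is easier to read and is known to work here; the regex version only saves a line if more variants show up later.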


2. One other minor thing I noticed is that section tags with a "socialTools" class are no longer being removed, even though the recipe includes this:

Code:
remove_tags = [dict(name='section', attrs={'class': ['socialTools', 'quickFixModule']})]
I'm not sure about this one, but could it be that remove_tags does not remove tags that are children of tags kept by keep_only_tags? This is just a guess, based on the fact that the "quickFixModule" section is still being removed, and it is a sibling of the tag kept by keep_only_tags.
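
A different possibility that may be worth ruling out alongside the nesting guess: if the "socialTools" section now carries additional class names, an exact string comparison on the class attribute would stop matching. The sketch below only illustrates that idea. class_spec_matches is a hypothetical stand-in and not calibre's actual matching code, "extraClass" is an invented class name, and whether the live page really has extra classes is unverified.

```python
import re

def class_spec_matches(spec, class_attr):
    """Toy stand-in for attribute matching (hypothetical, not calibre code)."""
    if hasattr(spec, 'search'):            # compiled regular expression
        return spec.search(class_attr) is not None
    if isinstance(spec, (list, tuple)):    # any-of list, as in the recipe
        return class_attr in spec
    return spec == class_attr              # plain string: exact comparison

# The spec currently in the recipe, and a more tolerant regex variant.
current_spec = ['socialTools', 'quickFixModule']
tolerant_spec = re.compile(r'(^|\s)(socialTools|quickFixModule)(\s|$)')

# Hypothetical class attribute values; "extraClass" is invented for illustration.
bare = 'socialTools'
with_extra = 'socialTools extraClass'

print(class_spec_matches(current_spec, bare))         # exact value matches
print(class_spec_matches(current_spec, with_extra))   # extra class breaks the exact match
print(class_spec_matches(tolerant_spec, with_extra))  # regex still finds the class

# Assumption: the recipe accepts a compiled regex as the class value.
remove_tags = [dict(name='section', attrs={'class': tolerant_spec})]
```

Checking the page source for the exact class attribute on the socialTools section would show whether this applies at all.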


Thanks in advance.