Old 10-14-2013, 10:25 PM   #1
damien18
Junior Member
Posts: 7
Karma: 10
Join Date: Oct 2013
Device: Marvin 3 (iPadOS)
Cracked.com recipe skips "blog" articles

First, a big thanks to Kovid for Calibre, without which using my e-reader would be a much less pleasurable experience.

1. The Cracked.com recipe was modified a few months ago to accommodate a change to the site that was causing only the first page of multi-page articles to be downloaded (and lots of junk to be included as well). However, since then some articles are no longer being downloaded at all. While I know next to nothing about recipes, I was able to fix the issue by replacing this:

Code:
keep_only_tags = dict(name='article', attrs={
                      'class': 'module article dropShadowBottomCurved'})
With this:

Code:
keep_only_tags = [dict(name='article', attrs={'class': 'module article dropShadowBottomCurved'}),
                  dict(name='article', attrs={'class': 'module blog dropShadowBottomCurved'})]
The items that have a "blog" class in place of an "article" class were being downloaded by the old version of the recipe, so I was wondering if it would be possible to include this change in the built-in recipe.
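To illustrate why the list form picks up both kinds of page, here is a rough plain-Python model of the matching. It is not calibre's actual code: `tag_matches` and `kept` are hypothetical helpers, and each tag is reduced to a name plus its class string. The point is only that an element is kept if it matches any entry in the list.

```python
# Simplified model only; calibre does the real matching on parsed HTML.
keep_only_tags = [
    dict(name='article', attrs={'class': 'module article dropShadowBottomCurved'}),
    dict(name='article', attrs={'class': 'module blog dropShadowBottomCurved'}),
]


def tag_matches(tag, spec):
    """True if the tag has the spec's name and every attribute value it lists."""
    if tag['name'] != spec['name']:
        return False
    return all(tag['attrs'].get(key) == value
               for key, value in spec['attrs'].items())


def kept(tag, specs):
    """A tag is kept when it matches at least one spec in the list."""
    return any(tag_matches(tag, spec) for spec in specs)


blog_page = {'name': 'article',
             'attrs': {'class': 'module blog dropShadowBottomCurved'}}
sidebar = {'name': 'section', 'attrs': {'class': 'quickFixModule'}}

print(kept(blog_page, keep_only_tags))      # True
print(kept(blog_page, keep_only_tags[:1]))  # False: the single "article" spec misses it
print(kept(sidebar, keep_only_tags))        # False
```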


2. One other minor thing I noticed is that section tags having a "socialTools" class are no longer being removed, even though the recipe includes this:

Code:
remove_tags = [dict(name='section', attrs={'class': ['socialTools', 'quickFixModule']})]
This one I'm not sure about, but I was wondering if it is perhaps because remove_tags does not remove tags that are children of tags being kept by keep_only_tags? This is just a guess based on the fact that the "quickFixModule" section is being removed, and it is a sibling of the tag being kept by keep_only_tags.


Thanks in advance.
Old 10-14-2013, 11:16 PM   #2
kovidgoyal
creator of calibre
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
remove_tags runs after keep_only_tags, so it does not matter where the tag is located. Most likely the tag in question has more than one class, in which case you need

dict(name='section', attrs={'class': lambda x: x and 'socialTools' in x.split()})
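
As a quick sketch of what that function does differently from a plain string, the check can be tried outside calibre. This assumes the function is handed the raw class attribute as a string, or None when the tag has no class; `has_social_tools` and the extra class name `clearfix` are made up for the illustration.

```python
def has_social_tools(x):
    """Same test as the lambda: is 'socialTools' one of the space-separated classes?"""
    return bool(x) and 'socialTools' in x.split()


print(has_social_tools('socialTools'))           # True
print(has_social_tools('socialTools clearfix'))  # True
print(has_social_tools('clearfix'))              # False
print(has_social_tools(None))                    # False: tag has no class attribute

# A plain string comparison misses the multi-class case:
print('socialTools clearfix' == 'socialTools')   # False
```

The `x and` part of the original lambda is what keeps it from failing on tags that have no class at all.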
Old 10-15-2013, 10:03 PM   #3
damien18
Thanks for the feedback. The tag in question only has the "socialTools" class.

However, on closer inspection it seems my initial assessment was incomplete. The tag in question is being removed, but only on the first page. It is on the subsequent page(s) that the tag is not being removed.

In any case, it's not something that will really bother me; I was just curious about what was going on there. I'm guessing more sophisticated processing would be necessary to remove the tag from the subsequent pages.

Thanks again for your help, and I hope you'll consider my request for modifying the built-in recipe so as to include the "blog" articles.
Old 10-15-2013, 10:37 PM   #4
kovidgoyal
It's already done.
Tags
calibre, cracked.com, recipe, recipe update

