10-16-2010, 07:13 PM   #1
noah
Junior Member
Posts: 6
Karma: 10
Join Date: Sep 2010
Device: Kindle
Some articles are corrupted: just garbage characters

I am using the following recipe to get The Bay Citizen:

Code:
# this block is pretty much standard on all recipes
#----------------------------------------------------------------------------------------------------------
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1282101454(BasicNewsRecipe):
    title = 'The Bay Citizen'
    language = 'en'
    __author__ = 'TonytheBookworm and noah'
    description = 'The Bay Citizen'
    publisher = 'The Bay Citizen'
    category = 'news'
    oldest_article = 2 # HOW FAR BACK IN THE FEED TO GO, IN DAYS
    max_articles_per_feed = 20 # MAXIMUM NUMBER OF ARTICLES TO FETCH PER FEED
    no_stylesheets = True # DROPS THE SITE'S CSS STYLESHEETS (DOES NOT TURN OFF JAVASCRIPT)
      
    masthead_url = 'http://media.baycitizen.org/images/layout/logo1.png' #PUTS NICE LOGO ON KINDLE
#---------------------------------------------------------------------------------------------------------    
    
    #here we tell the recipe what feed(s) we wish to obtain
    #-----------------------------------------------------------------------------------------
    feeds          = [
                      ('Main Feed', 'http://www.baycitizen.org/feeds/stories/'),
                      
                    ]
    #------------------------------------------------------------------------------------------


    keep_only_tags    = [dict(name='div', attrs={'class':'story'})]

    remove_tags    = [dict(name='div', attrs={'class':'socialBar'})]


It is mostly working. However, most downloads include one or two articles that come through completely corrupted: they display as garbage characters. If I run the recipe twice in a row, fetching the same set of articles, different articles may be the corrupted ones.

Example: see the attachments, two downloads of The Bay Citizen made within minutes of one another. In V1, the 5th article ("Jane Kim Leaflet Screencap") is corrupted and all the other articles are fine. In V2, downloaded a few minutes later, the 5th article is fine but the 2nd article ("Reuben Santos’ official military photo") is corrupted.
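One way to narrow this down is to look at the raw bytes the server sends for an article that came through garbled. The sketch below is a hypothetical diagnostic, not part of calibre, and it rests on an unconfirmed assumption: that the garbage is either a gzip-compressed body that never got decompressed, or bytes in a character set other than UTF-8. The names `fetch_raw`, `diagnose`, and `looks_gzipped` are made up for this example.

```python
import gzip
import urllib.request
import zlib

# The first two bytes of every gzip stream.
GZIP_MAGIC = b"\x1f\x8b"


def looks_gzipped(data: bytes) -> bool:
    """True if the payload starts with the gzip magic number."""
    return data[:2] == GZIP_MAGIC


def diagnose(data: bytes) -> str:
    """Give a rough label for a raw HTTP response body (hypothetical helper).

    "gzip"         - a valid gzip stream that was never decompressed
    "corrupt-gzip" - starts like gzip but does not decompress
    "not-utf8"     - plain bytes that are not valid UTF-8
    "utf8"         - plain bytes that decode cleanly as UTF-8
    """
    if looks_gzipped(data):
        try:
            gzip.decompress(data)
        except (OSError, EOFError, zlib.error):
            return "corrupt-gzip"
        return "gzip"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "not-utf8"
    return "utf8"


def fetch_raw(url: str) -> bytes:
    """Fetch url and return the body bytes untouched.

    urllib does not decompress responses or decode text, so the bytes
    come back exactly as the server sent them.
    """
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read()
```

Calling `diagnose(fetch_raw(url))` a few times on the URL of an article that came through garbled would show whether the raw body changes between requests. A `gzip` or `not-utf8` result on a run that produced garbage would point at the server's response rather than at the recipe's `keep_only_tags`/`remove_tags` filters.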

Can anyone help me figure out why this is happening?

- thanks.