10-16-2010, 07:13 PM   #1
noah
Junior Member
Posts: 6
Karma: 10
Join Date: Sep 2010
Device: Kindle
Some articles are corrupted - just garbage characters

I am using the following recipe to get The Bay Citizen:

Spoiler:
Code:
# this block is pretty much standard on all recipes
#----------------------------------------------------------------------------------------------------------
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1282101454(BasicNewsRecipe):
    title = 'The Bay Citizen'
    language = 'en'
    __author__ = 'TonytheBookworm and noah'
    description = 'The Bay Citizen'
    publisher = 'The Bay Citizen'
    category = 'news'
    oldest_article = 2 # USE THIS TO DETERMINE HOW FAR BACK YOU WANNA GO IN THE FEED DATE WISE
    max_articles_per_feed = 20 # USE TO DETERMINE HOW MANY ARTICLES YOU WISH TO READ PER FEED
    no_stylesheets = True # DO NOT DOWNLOAD THE SITE'S CSS STYLESHEETS (THIS DOES NOT AFFECT JAVASCRIPT)
      
    masthead_url = 'http://media.baycitizen.org/images/layout/logo1.png' #PUTS NICE LOGO ON KINDLE
#---------------------------------------------------------------------------------------------------------    
    
    #here we tell the recipe what feed(s) we wish to obtain
    #-----------------------------------------------------------------------------------------
    feeds          = [
                      ('Main Feed', 'http://www.baycitizen.org/feeds/stories/'),
                      
                    ]
    #------------------------------------------------------------------------------------------


    keep_only_tags    = [dict(name='div', attrs={'class':'story'})]

    remove_tags    = [dict(name='div', attrs={'class':'socialBar'})]


It is mostly working. However, most downloads include one or two articles which get completely corrupted - they display as garbage characters. If I run the recipe twice in a row, fetching the same set of articles, different articles may come through corrupted.

Example: See the attachments: two downloads of The Bay Citizen, done within minutes of one another. In V1, the 5th article ("Jane Kim Leaflet Screencap") is corrupted; all the other articles are fine. In V2, downloaded a few minutes later, the 5th article is fine but the 2nd article ("Reuben Santos’ official military photo") is corrupted.

Can anyone help me figure out why this is happening?

- thanks.
10-19-2010, 06:26 PM   #2
lordvetinari2
Zealot
Posts: 137
Karma: 61
Join Date: Jun 2006
Location: Gijón, Spain
Device: Kindle 3G+WiFi & Galaxy Note
That looks like binary data being interpreted as text. In my case, I have a recipe for a newspaper that inserts ads as news items in its feeds. Whenever Calibre hits an ad, it corrupts the next article just like in your samples.

I have checked your RSS feed, but found no ads posing as articles. I also clicked through more than 10 articles from your feed and was not redirected to ads, either. So I'm not sure why binary data is popping up here and there in your case. Sorry I cannot help.
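
If it helps narrow things down, here is a rough way to test the "binary data read as text" idea on a saved copy of a corrupted article. This is only a sketch using the Python standard library, not calibre code: `describe_payload` and `to_text` are hypothetical helper names, and the guess that the garbage might be an undecoded gzip stream is an assumption, not something confirmed for The Bay Citizen.

```python
import gzip

# Every gzip stream starts with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def describe_payload(raw: bytes) -> str:
    """Roughly classify a downloaded article body (hypothetical helper)."""
    if raw[:2] == GZIP_MAGIC:
        return "gzip"
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: either real binary data or a different text encoding.
        return "binary-or-other-encoding"
    return "utf-8-text"


def to_text(raw: bytes) -> str:
    """Decompress if the payload is gzip, then decode it as UTF-8."""
    if raw[:2] == GZIP_MAGIC:
        raw = gzip.decompress(raw)
    # Undecodable bytes become U+FFFD instead of raising an error.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    page = "<div class='story'>sample article</div>".encode("utf-8")
    print(describe_payload(page))
    print(describe_payload(gzip.compress(page)))
    print(to_text(gzip.compress(page)))
```

If a corrupted article comes back as "gzip", the server is probably sending compressed pages that are not being decompressed. If it comes back as "binary-or-other-encoding", the cause is more likely an encoding mismatch or a non-HTML response.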