Old 09-13-2010, 12:35 AM   #2709
TonytheBookworm
Device: kindle 2, kindle 3, Kindle fire
I'm slightly puzzled and not sure what is going on here.
When I run this recipe at the console with
ebook-convert test.recipe output_dir --test -vv > myrecipe.txt

I end up getting a nicely formatted article with no junk.
Then, when I import the recipe into calibre to fully test it, I get junk.

So I went a step further and did this:

ebook-convert test.recipe myrecipe.mobi --test

And again I get nice, clean articles with no junk. So what could be different when I actually load it into calibre? I can remove the offending tags, but that is kind of hard to do when they don't show up in the test.

Here is the code I'm working with:
Code:
from calibre.web.feeds.news import BasicNewsRecipe

class AdvancedUserRecipe1282101454(BasicNewsRecipe):
    title = 'Popular Science'
    language = 'en'
    __author__ = 'TonytheBookworm'
    description = 'Popular Science'
    publisher = 'Popular Science'
    category = 'gadgets,science'
    oldest_article = 7
    max_articles_per_feed = 100
    no_stylesheets = True
    remove_javascript = True
    #extra_css = '.headline {font-size: x-large;} \n .fact { padding-top: 10pt }'
    #masthead_url = 'http://gawand.org/wp-content/uploads/2010/06/ajc-logo.gif'
    #keep_only_tags = [
    #    dict(name='div', attrs={'class':['content']}),
    #    dict(attrs={'id':['cxArticleText','cxArticleBodyText']}),
    #]
    remove_tags = [dict(name='div', attrs={'id':['main_supplements']})]
    feeds = [
        ('Gadgets', 'http://www.popsci.com/full-feed/gadgets'),
        ('Cars', 'http://www.popsci.com/full-feed/cars'),
        ('Science', 'http://www.popsci.com/full-feed/science'),
        ('Technology', 'http://www.popsci.com/full-feed/technology'),
        ('DIY', 'http://www.popsci.com/full-feed/diy'),
    ]

    #def print_version(self, url):
    #    return url.partition('?')[0] + '?printArticle=y'
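
For reference, the commented-out print_version at the bottom of the recipe would only rewrite each article URL. Here is that one line pulled out as a standalone function so it can be tried outside calibre. The example URL is made up for illustration, and I have not confirmed that popsci.com actually honors the printArticle parameter.

```python
def print_version(url):
    """Return the printer-friendly form of an article URL.

    Same expression as the commented-out method in the recipe, minus `self`.
    """
    # Keep everything before the first '?', dropping any existing query
    # string, then append the print parameter.
    return url.partition('?')[0] + '?printArticle=y'


if __name__ == '__main__':
    # Hypothetical article URL, for illustration only.
    example = 'http://www.popsci.com/gadgets/article/2010-09/example?cmpid=rss'
    print(print_version(example))
```

A URL with no query string just gets the parameter appended, since partition returns the whole string as its first element when there is no '?'.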
