MobileRead Forums > E-Book Software > Calibre > Recipes

Old 09-25-2010, 12:34 AM   #1
TonytheBookworm
Addict
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Is it just me or are others experiencing this too?

Alright, for the life of me I don't know what the deal is with my "test environment". I reported this as a potential bug, but Kovid stated that it wasn't, and that I just needed to add more remove_tags and such. Well, yeah.
The issue at hand is still this. Say I run:

ebook-convert test1.recipe out_put_dir --test -vv > myrecipe.txt

Yes, I know it only gets two feeds. But the problem is that in most cases (not all, but most) it cleans the dang junk up. So when I run the test after using remove_tags and see no junk, I automatically assume that it is fine. But that is simply not the case. For instance, the Popular Science code that I submitted for 7.20: I was led to believe it worked fine because the console test looked perfect. Yet when I run the built-in recipe in 7.20 I get junk. I ran the recipe without --test, and the same exact article(s) that showed perfect when using the --test switch show junk.

So what is the suggestion? To not use --test and run full feed parsing? The test just makes it faster to, well, "test".
Here is the code if you're interested in seeing what I mean. I mean, yeah, I realize that it needs more remove_tags and keep_only_tags, but that is not the issue. The issue is: why does the same article from the same feed with the same code show two totally different results when I use --test vs. not using --test?
Spoiler:

Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup
import re

class AdvancedUserRecipe1282101454(BasicNewsRecipe):
    title = 'Popular Science'
    language = 'en'
    __author__ = 'TonytheBookworm'
    description = 'Popular Science'
    publisher = 'Popular Science'
    category = 'gadgets,science'
    oldest_article = 7 # change this if you want more current articles. I like to go a week in
    max_articles_per_feed = 100
    no_stylesheets = True
    remove_javascript = True
    
    masthead_url = 'http://www.raytheon.com/newsroom/rtnwcm/groups/Public/documents/masthead/rtn08_popscidec_masthead.jpg'
    
    remove_tags = [dict(name='div', attrs={'id':['toolbar','main_supplements']}),
                   dict(name='span', attrs={'class':['comments']}),
                   dict(name='div', attrs={'class':['relatedinfo related-right','node_navigation','content2']}),
                   dict(name='ul', attrs={'class':['item-list clear-block']})]                     
    feeds          = [
                      
                      ('Gadgets', 'http://www.popsci.com/full-feed/gadgets'),
                      ('Cars', 'http://www.popsci.com/full-feed/cars'),
                      ('Science', 'http://www.popsci.com/full-feed/science'),
                      ('Technology', 'http://www.popsci.com/full-feed/technology'),
                      ('DIY', 'http://www.popsci.com/full-feed/diy'),
                      
                    ]

    
    # The following will get rid of the "Gallery:" links when found.

    def preprocess_html(self, soup):
        print('SOUP IS: ', soup)  # debug output of the parsed page
        # findAll always returns a list, so no None check is needed.
        weblinks = soup.findAll(['head', 'h2'])
        for link in weblinks:
            if re.search('(Gallery)(:)', str(link)):
                # Drop the whole parent element of the matching tag.
                link.parent.extract()
        return soup
  #-----------------------------------------------------------------
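
To make the Gallery check in preprocess_html easier to reason about, here is a minimal standalone sketch of just the matching step, run on plain strings instead of a parsed page. The helper name is_gallery_heading and the two sample headings are made up for illustration. The recipe itself goes further and extracts the matching tag's parent, which this sketch does not do.

```python
import re

# Same pattern as the recipe: the literal text "Gallery:".
GALLERY = re.compile('(Gallery)(:)')


def is_gallery_heading(markup):
    """Return True when a heading's markup contains 'Gallery:'."""
    return GALLERY.search(markup) is not None


# Made-up sample headings, not taken from the real feed.
headings = [
    '<h2>Gallery: Ten Great Gadgets</h2>',
    '<h2>How Rockets Work</h2>',
]

# Keep only the headings that do not look like gallery links.
kept = [h for h in headings if not is_gallery_heading(h)]
print(kept)
```

Note that the pattern is case-sensitive and needs the colon, so a heading that merely mentions "gallery" in running text is left alone.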


Kovid: I will submit an update to you soon for Popular Science as well. Seems like both of my submissions for 7.20 were no go for launch. That's what I get for assuming all was fine when using the --test switch.
Old 09-25-2010, 12:51 AM   #2
TonytheBookworm
Someone please test this code on your end and see if you get any junk. I don't want to keep submitting what I believe to be working code to Kovid and then turning around looking like a moron when it ends up looking like crap.

Thanks....

Spoiler:

Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import BeautifulSoup
import re

class AdvancedUserRecipe1282101454(BasicNewsRecipe):
    title = 'Popular Science'
    language = 'en'
    __author__ = 'TonytheBookworm'
    description = 'Popular Science'
    publisher = 'Popular Science'
    category = 'gadgets,science'
    oldest_article = 7 # change this if you want more current articles. I like to go a week in
    max_articles_per_feed = 100
    no_stylesheets = True
    remove_javascript = True
    use_embedded_content = True
    
    masthead_url = 'http://www.raytheon.com/newsroom/rtnwcm/groups/Public/documents/masthead/rtn08_popscidec_masthead.jpg'
    
               
    feeds          = [
                      
                      ('Gadgets', 'http://www.popsci.com/full-feed/gadgets'),
                      ('Cars', 'http://www.popsci.com/full-feed/cars'),
                      ('Science', 'http://www.popsci.com/full-feed/science'),
                      ('Technology', 'http://www.popsci.com/full-feed/technology'),
                      ('DIY', 'http://www.popsci.com/full-feed/diy'),
                      
                    ]

    
    # The following will get rid of the "Gallery:" links when found.

    def preprocess_html(self, soup):
        print('SOUP IS: ', soup)  # debug output of the parsed page
        # findAll always returns a list, so no None check is needed.
        weblinks = soup.findAll(['head', 'h2'])
        for link in weblinks:
            if re.search('(Gallery)(:)', str(link)):
                # Drop the whole parent element of the matching tag.
                link.parent.extract()
        return soup
  #-----------------------------------------------------------------


***Starson17 - I used the use_embedded_content flag that I didn't know anything about until you mentioned it. Makes some feeds a looooot easier. Thanks.
Old 09-25-2010, 10:38 AM   #3
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by TonytheBookworm View Post
***Starson17 - I used the use_embedded_content flag that I didn't know anything about until you mentioned it. Makes some feeds a looooot easier. Thanks.
I'm glad it helped.
Old 09-25-2010, 10:53 AM   #4
Starson17
Quote:
Originally Posted by TonytheBookworm View Post
Someone please test this code on your end and see if you get any junk.
No junk either way, with or without --test. You'll have to show me exactly what you're seeing for me to look more closely. You're a big boy now, too - track it down - set up to run from source code.
Look at lines 595-597 of news.py. Search for "test" in that file.
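
One way to show exactly what differs is to save the same article's HTML from a --test run and from a full run, then diff the two. Below is a minimal sketch using Python's standard difflib. The function name diff_article_html and the sample strings are assumptions for illustration only, not anything from calibre.

```python
import difflib


def diff_article_html(test_html, full_html):
    """Return unified-diff lines between two versions of one article."""
    return list(difflib.unified_diff(
        test_html.splitlines(),
        full_html.splitlines(),
        fromfile='with --test',
        tofile='without --test',
        lineterm='',
    ))


if __name__ == '__main__':
    # Made-up stand-ins for the same article saved from each run.
    clean = '<h2>Title</h2>\n<p>Body</p>'
    junky = '<h2>Title</h2>\n<div class="content2">junk</div>\n<p>Body</p>'
    for line in diff_article_html(clean, junky):
        print(line)
```

Lines starting with "+" exist only in the full run, so any leftover junk shows up there. An empty result means the two versions of the article are identical.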
Old 09-25-2010, 11:22 AM   #5
TonytheBookworm
Here are some snapshots that should explain what I'm seeing. As far as news.py goes, I tried to find that file without any luck. The closest I came, by looking in my myrecipe.txt, was a file that was supposed to be in
\site-packages\calibre\web\feeds
I went there and the only file that I saw in there was news.pyo. I tried to open that file with UltraEdit and it is binary.
Thanks.
Attached Thumbnails:
console line.JPG
clean version when using test switch.JPG
console line without test.JPG
junk when not using test on console.JPG
Old 09-25-2010, 12:57 PM   #6
Starson17
Quote:
Originally Posted by TonytheBookworm View Post
Here are some snap shots that should explain what I'm seeing.
I'm not seeing any diff between --test and not.
Quote:
As far as news.py goes, I tried to find that file without any luck.
You have to run from source.
http://calibre-ebook.com/user_manual/develop.html
Old 09-27-2010, 07:42 AM   #7
DoctorOhh
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
Quote:
Originally Posted by TonytheBookworm View Post
Someone please test this code on your end and see if you get any junk. I don't want to keep submitting what I believe to be working code to Kovid and then turning around looking like a moron when it ends up looking like crap.
I tested this code by loading the recipe in the GUI and running it. I don't see any of the junk you are seeing. It seems to be a nice clean output.
All times are GMT -4.