Old 10-08-2010, 03:05 PM   #16
TonytheBookworm
Addict
TonytheBookworm is on a distinguished road
 
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Here is the working version of the code.
I didn't see Starson17's post until after I had gone a different route and used try/except statements, which worked fine.

You might want to remove a few more junk tags, but this should do it.

Spoiler:

Code:
#!/usr/bin/env  python
__license__     = 'GPL v3'
__author__      = 'Tony Stegall'
__copyright__   = '2010, Tony Stegall or Tonythebookworm on mobileread.com'
__version__     = 'v1.01'
__date__        = '07, October 2010'
__description__ = 'La weekly mag'

'''
http://www.laweekly.com
'''

from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ptempfile import PersistentTemporaryFile

class LaWeekly(BasicNewsRecipe):
    __author__    = 'Tony Stegall'
    description   = 'La Weekly Mag'
    cover_url     = 'http://assets.laweekly.com/img/citylogo-lg.png'
    

    title          = 'La WeeklyMag '
    publisher      = 'Laweekly.com'
    category       = 'News,US'

    language       = 'en'
    timefmt        = '[%a, %d %b, %Y]'

    oldest_article        = 15
    max_articles_per_feed = 25
    use_embedded_content  = False
    no_stylesheets = True

    remove_javascript     = True
    #####################################################################################
    # cleanup section                                                                   #
    #####################################################################################
    remove_tags        = [
                            dict(name='div', attrs={'class':['chisel_u r_box','sitenav','ListingsSearchWidgetHoriz','events_location_tabs location vcard']}),
                            dict(name='div', attrs={'id':['navBottom','comments','mac_tags']}),
                            dict(name='div', attrs={'class':['likemewidget chisel_u','events_more_events','chisel_u r_box city']}),
                            dict(name='div', attrs={'class':['bottom_bar','footer','binTitle']}),
                            dict(name='a', attrs={'class':'likeme_badge'})
                            
                        ]
    
    
    
    
    ######################################################################################################################
    '''
    The print version of an article (/content/printVersion/) puts the whole article on one page,
    so that is the page we want to download.
    To do this we set up a list of temp files and turn on the flag that tells
    calibre the articles are obfuscated. calibre then calls get_obfuscated_article()
    for each article URL. There we open the page in calibre's internal browser and follow
    the first link that matches the regular expression .*?(\\/)(content)(\\/)(printVersion)(\\/)
    i.e. any link that contains /content/printVersion/
    The response is written to a temp HTML file, which calibre then parses.
    That is all this recipe needs.
    '''

    temp_files = []
    articles_are_obfuscated = True

    def get_obfuscated_article(self, url):
        br = self.get_browser()
        self.log('The current URL is:', url)
        br.open(url)
        # Try to follow the first /content/printVersion/ link on the page.
        # If the page has no such link, follow_link raises an error. Instead of
        # letting that abort the download, catch it and fetch the original
        # calling URL.
        try:
            response = br.follow_link(url_regex='.*?(\\/)(content)(\\/)(printVersion)(\\/)', nr=0)
            html = response.read()
        except Exception:
            response = br.open(url)
            html = response.read()

        # Save the page to a temp file and hand its path back to calibre.
        self.temp_files.append(PersistentTemporaryFile('_fa.html'))
        self.temp_files[-1].write(html)
        self.temp_files[-1].close()
        return self.temp_files[-1].name

    ######################################################################################################################

    feeds          = [
                       (u'Complete Issue', u'http://www.laweekly.com/syndication/issue/'),
                       (u'News', u'http://www.laweekly.com/syndication/section/news/'),
                       (u'Music', u'http://www.laweekly.com/syndication/section/music/'),
                       (u'Movies', u'http://www.laweekly.com/syndication/section/film/'),
                       (u'Restaurants', u'http://www.laweekly.com/syndication/section/dining/'),
                       (u'Music Events', u'http://laweekly.com/syndication/events?type=music'),
                       (u'Calendar Events', u'http://laweekly.com/syndication/events'),
                       (u'Restaurant Guide', u'http://laweekly.com/syndication/restaurants/search/'),
                       
                     ]


P.S. Thanks, Starson17, for the response. I didn't see it before I finished this up.
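For anyone wondering what the regular expression in the recipe actually matches, here is a minimal sketch that runs it against a couple of links outside calibre. The two sample URLs and the helper name is_print_version are made up for illustration; only the pattern itself comes from the recipe.

```python
import re

# Same pattern as in the recipe, written as a raw string.
PRINT_VERSION_RE = re.compile(r'.*?(\/)(content)(\/)(printVersion)(\/)')


def is_print_version(url):
    """Return True if the URL contains /content/printVersion/."""
    return PRINT_VERSION_RE.search(url) is not None


# Hypothetical links, only meant to show one match and one non-match.
sample_links = [
    'http://www.laweekly.com/content/printVersion/123456/',
    'http://www.laweekly.com/2010-10-07/news/some-story/',
]

matches = [link for link in sample_links if is_print_version(link)]
print(matches)
```

The capture groups in the pattern are not used for anything, so a plain /content/printVersion/ search would probably select the same links; the sketch keeps the original pattern so it mirrors the recipe.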
Old 10-08-2010, 03:16 PM   #17
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by TonytheBookworm View Post
P.S. thanks starson17 for the response I didn't see it before I finished this up.
I knew you'd come up with something. If I might ask - was there some reason he couldn't just grab the article page in the usual way and use normal keep or remove tags statements to clean off the junk?
Old 10-08-2010, 03:25 PM   #18
TonytheBookworm
Addict
TonytheBookworm is on a distinguished road
 
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Quote:
Originally Posted by Starson17 View Post
I knew you'd come up with something. If I might ask - was there some reason he couldn't just grab the article page in the usual way and use normal keep or remove tags statements to clean off the junk?
He could have, but I went this route because the articles are split across multiple pages. Basically what I saw was next page, next page, next page. Then I saw that every page except the events pages offers a print version that puts it all on one page. So for simplicity I chose to follow only those links.
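To make the fallback easier to see on its own, here is a rough sketch of the same idea outside calibre: follow the print-version link when the page has one, otherwise return the original page. FakeBrowser, LinkNotFound, fetch_single_page and all of the URLs are hypothetical stand-ins so the example runs by itself; in the recipe the real browser comes from self.get_browser().

```python
import re


class LinkNotFound(Exception):
    """Stand-in for the error a real browser raises when no link matches."""


class FakeBrowser:
    """Hypothetical browser: maps each URL to (html, links on that page)."""

    def __init__(self, pages):
        self.pages = pages
        self.current = None

    def open(self, url):
        self.current = url
        return self.pages[url][0]

    def follow_link(self, url_regex):
        # Follow the first link on the current page that matches the pattern.
        for link in self.pages[self.current][1]:
            if re.search(url_regex, link):
                return self.open(link)
        raise LinkNotFound(url_regex)


def fetch_single_page(br, url, pattern=r'/content/printVersion/'):
    """Return the print version if the page links to one, else the page itself."""
    br.open(url)
    try:
        return br.follow_link(pattern)
    except LinkNotFound:
        return br.open(url)


# Made-up site: one multi-page article with a print link, one events page without.
pages = {
    'http://example.com/article': ('page 1 of 3', ['http://example.com/content/printVersion/1/']),
    'http://example.com/content/printVersion/1/': ('whole article', []),
    'http://example.com/events': ('events listing', []),
}
br = FakeBrowser(pages)
print(fetch_single_page(br, 'http://example.com/article'))
print(fetch_single_page(br, 'http://example.com/events'))
```

The article URL comes back as the single-page print version, while the events URL, which has no print link, comes back unchanged.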
Old 10-08-2010, 03:30 PM   #19
kidblue
Connoisseur
kidblue began at the beginning.
 
Posts: 79
Karma: 10
Join Date: Oct 2010
Device: Kindle 3
For some strange reason, I'm now getting this error after the feed ships off to my device via email:

Spoiler:
ERROR: ERROR: Unhandled exception: TypeError: 'NoneType' object does not support item assignment

Traceback (most recent call last):
File "site-packages/calibre/gui2/dialogs/book_info.py", line 48, in slave
File "site-packages/calibre/gui2/dialogs/book_info.py", line 96, in refresh
File "site-packages/calibre/gui2/library/models.py", line 378, in get_book_info
TypeError: 'NoneType' object does not support item assignment


And this:


ERROR: ERROR: Unhandled exception: AttributeError: 'NoneType' object has no attribute 'custom_recipe_collection'

Traceback (most recent call last):
File "site-packages/calibre/gui2/dialogs/user_profiles.py", line 41, in rowCount
AttributeError: 'NoneType' object has no attribute 'custom_recipe_collection'

Last edited by kidblue; 10-08-2010 at 03:32 PM.
Old 10-08-2010, 03:34 PM   #20
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by TonytheBookworm View Post
on all the pages except the events that they offer a print version that put it all on one page.
That explains it. Thanks.
Old 10-08-2010, 04:38 PM   #21
TonytheBookworm
Addict
TonytheBookworm is on a distinguished road
 
Posts: 264
Karma: 62
Join Date: May 2010
Device: kindle 2, kindle 3, Kindle fire
Quote:
Originally Posted by kidblue View Post
For some strange reason, I'm now getting this error after the feed ships off to my device via email:

That one I'm not sure about. I ran it in calibre on my end and it works fine for me, though I'm not sending the feed by email.
Old 10-09-2010, 04:16 PM   #22
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
 
Posts: 45,364
Karma: 27230406
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
@kidblue: restart calibre and the error will go away