#1
Member
Posts: 11
Karma: 10
Join Date: Feb 2011
Device: Kindle 3
Trying to strip the date from an article URL
Hi Everyone,
I'm trying to create a recipe for a local newspaper. The article URL is formatted like this: "http://www.mahopacnews.com/Articles-c-2011-02-15-207354.112113-Former-official-to-receive-750-daily-for-interim-position-plus-pension-.html" The print version URL is formatted like this: "http://www.mahopacnews.com/LPprintwindow.LASSO?-token.editorialcall=207354.112113"

The problem I have is that this section of the article URL, "-c-2011-02-15-", contains a date which changes, so using url.replace does not seem to work. Is there a workaround for this? I saw a couple of examples using url.split; however, I am new to this and I can't seem to get it to work.

I would also like to strip out all the characters after the article number, i.e. "-Former-official-to-receive-750-daily-for-interim-position-plus-pension-.html".

I would appreciate any help that you folks could give me.

Thanks,
John
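To see why splitting can work even though the date changes, here is a small illustration (plain Python, run outside calibre) of how `str.split('-')` breaks up the sample URL from the question. Assuming the part before "Articles" never contains a hyphen (true for this domain), the three date pieces always land at positions 2 to 4 and the article number at position 5, and the headline slug after it can simply be ignored.

```python
# Sample article URL taken from the question.
article_url = ("http://www.mahopacnews.com/Articles-c-2011-02-15-207354.112113-"
               "Former-official-to-receive-750-daily-for-interim-position-plus-pension-.html")

# Split on every hyphen; the date and article number end up at fixed indexes.
segments = article_url.split('-')

# Show the first few pieces with their positions.
for index, segment in enumerate(segments[:7]):
    print(index, segment)
# 0 http://www.mahopacnews.com/Articles
# 1 c
# 2 2011
# 3 02
# 4 15
# 5 207354.112113
# 6 Former
```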
#2
Member
Posts: 11
Karma: 10
Join Date: Feb 2011
Device: Kindle 3
Figured it out
Code:
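```python
from calibre.web.feeds.recipes import BasicNewsRecipe


class AdvancedUserRecipe1297969350(BasicNewsRecipe):
    title = u'Mahopac News'
    description = 'Mahopac News Features'
    oldest_article = 2
    max_articles_per_feed = 100
    feeds = [(u' ', u'http://www.mahopacnews.com/rssheadlines.xml')]

    def print_version(self, url):
        # Print-view URL; the article number is appended to it.
        baseURL = 'http://www.mahopacnews.com/LPprintwindow.LASSO?-token.editorialcall='
        # Splitting on '-' gives e.g. ['http://www.mahopacnews.com/Articles', 'c',
        # '2011', '02', '15', '207354.112113', 'Former', ...], so the article
        # number is always segments[5] whatever the date is, and the headline
        # text after it is dropped.
        segments = url.split('-')
        printURL = baseURL + segments[5]
        return printURL
```

Last edited by Finbar127; 03-02-2011 at 10:21 PM.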
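A slightly more defensive variant, offered only as a sketch: instead of relying on the article number being the sixth hyphen-separated piece, it uses a regular expression that looks for "-c-YYYY-MM-DD-" followed by a number like "207354.112113". That pattern is an assumption based on the single example URL in this thread, and `print_url` is a hypothetical helper name, not part of calibre.

```python
import re

# Print-view base URL, copied from the recipe above.
BASE_URL = 'http://www.mahopacnews.com/LPprintwindow.LASSO?-token.editorialcall='

# Assumed URL shape: "-c-" + a YYYY-MM-DD date + "-" + an article number
# made of digits, a dot, and more digits (e.g. "207354.112113").
ARTICLE_ID = re.compile(r'-c-\d{4}-\d{2}-\d{2}-(\d+\.\d+)')


def print_url(article_url):
    """Hypothetical helper: build the print-view URL from an article URL.

    Returns the original URL unchanged if no article number is found,
    so an unexpected URL shape does not produce a broken print link.
    """
    match = ARTICLE_ID.search(article_url)
    if match is None:
        return article_url
    return BASE_URL + match.group(1)
```

Inside the recipe, `print_version` could then reduce to `return print_url(url)`. The trade-off versus `segments[5]` is that an extra hyphen earlier in the URL would not shift the result, but a change in the article number format would make the pattern stop matching (and fall back to the normal article page).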
| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| Problem with Article Date in parse_index | spedinfargo | Recipes | 5 | 02-19-2011 07:12 PM |
| Date Added vs. Date Modified | aglaia761 | Calibre | 5 | 11-28-2010 05:34 AM |
| Bulk Changing Published Date To Date | hmf | Calibre | 4 | 10-19-2010 10:19 PM |
| Up-to-date candy teacher (date being 1921) | kacir | Deals and Resources (No Self-Promotion or Affiliate Links) | 0 | 06-16-2010 04:18 PM |
| new official shipping date / US invitation date | R2D2 | iRex | 18 | 07-06-2006 02:32 PM |