#1
Member
Posts: 11
Karma: 10
Join Date: Feb 2011
Device: Kindle 3
Trying to strip the date from an article URL
Hi Everyone,
I'm trying to create a recipe for a local newspaper. The article URL is formatted like this:
"http://www.mahopacnews.com/Articles-c-2011-02-15-207354.112113-Former-official-to-receive-750-daily-for-interim-position-plus-pension-.html"
The print version URL is formatted like this:
"http://www.mahopacnews.com/LPprintwindow.LASSO?-token.editorialcall=207354.112113"

The problem is that this section of the article URL, "-c-2011-02-15-", contains a date that changes from article to article, so url.replace does not seem to work. Is there a workaround for this? I saw a couple of examples using url.split, but I am new to this and can't seem to get it to work.

I would also like to strip out all the characters after the article number, i.e. "-Former-official-to-receive-750-daily-for-interim-position-plus-pension-.html".

I would appreciate any help that you folks could give me.

Thanks,
John
#2
Figured it out
Code:
from calibre.web.feeds.recipes import BasicNewsRecipe


class AdvancedUserRecipe1297969350(BasicNewsRecipe):
    title = u'Mahopac News'
    description = 'Mahopac News Features'
    oldest_article = 2
    max_articles_per_feed = 100
    feeds = [(u' ', u'http://www.mahopacnews.com/rssheadlines.xml')]

    def print_version(self, url):
        baseURL = 'http://www.mahopacnews.com/LPprintwindow.LASSO?-token.editorialcall='
        # Splitting on '-' gives: [..., 'c', '2011', '02', '15', '207354.112113', 'Former', ...]
        segments = url.split('-')
        # Index 5 is the article number, which sits right after the date parts
        printURL = baseURL + segments[5]
        return printURL
Last edited by Finbar127; 03-02-2011 at 10:21 PM.
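
For anyone adapting this: the split approach relies on the article number always being the sixth piece after splitting on '-'. A regular expression is one possible alternative that matches the date section explicitly and captures only the number that follows it. The sketch below is untested against the live site; the helper name to_print_url and the pattern are assumptions based solely on the sample URL in the first post.

```python
import re

PRINT_BASE = 'http://www.mahopacnews.com/LPprintwindow.LASSO?-token.editorialcall='

# Assumed URL shape (taken from the one sample URL in this thread):
# "-c-YYYY-MM-DD-" followed by an article number made of digits and dots.
ARTICLE_ID = re.compile(r'-c-\d{4}-\d{2}-\d{2}-(\d+(?:\.\d+)*)')


def to_print_url(url):
    """Hypothetical helper: build the print URL from an article URL."""
    match = ARTICLE_ID.search(url)
    if match is None:
        # URL does not look like an article link; leave it unchanged
        return url
    return PRINT_BASE + match.group(1)
```

Inside a recipe, print_version could simply return to_print_url(url). The fallback to the unchanged URL is a design choice for links that do not fit the pattern; whether that is the right behaviour depends on what else the feed contains.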