#1 |
Guru
Posts: 616
Karma: 85520
Join Date: May 2021
Device: kindle
|
Live Law
https://www.livelaw.in
A date is available for every article. Is there a way to drop older links from the feeds based on it? The date shows up like this: '14 Jun 2022 5:15 AM GMT' or '12 March 2022 4:12 PM GMT'.
#2 |
creator of calibre
Posts: 45,355
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
If you want to skip an article you can call self.abort_article() inside any of the preprocess methods to abort fetching that article. Although if you are using RSS feeds you can just use oldest_article to avoid fetching the articles at all.
|
#3 | |
Guru
|
I want to skip the links before it even tries to fetch the article from them.
I'm able to get the date from the feeds page itself.
|
#4 |
creator of calibre
|
Parse the date and check it then. You can use
from calibre.utils.date import parse_date
or, if automated parsing doesn't work, you can easily parse it yourself. Code:
parts = date.split()
day = int(parts[0])
# parts[1] is the month name, parts[2] is the year
month = {'Jan': 1, 'January': 1, 'Feb': 2, ...}[parts[1]]
year = int(parts[2])
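For reference, the manual parsing idea can also be sketched with only the standard library. This is a minimal sketch, not the recipe's actual code: the function name `parse_listing_date` is hypothetical, and it assumes the timestamps really are GMT and that month names and AM/PM are in English.

```python
from datetime import datetime, timezone

# The two layouts quoted in the thread: abbreviated and full month names.
# %b, %B and %p are locale-dependent; an English (or C) locale is assumed.
_FORMATS = ('%d %b %Y %I:%M %p', '%d %B %Y %I:%M %p')


def parse_listing_date(text):
    """Parse strings like '14 Jun 2022 5:15 AM GMT' into an aware datetime."""
    cleaned = text.strip()
    if cleaned.endswith(' GMT'):
        # Drop the zone label; the result is tagged as UTC below instead.
        cleaned = cleaned[:-len(' GMT')]
    for fmt in _FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # try the next layout
    raise ValueError('unrecognised date: %r' % text)
```

Unlike replacing ' AM GMT' and ' PM GMT' with a fixed suffix, `%p` keeps the AM/PM distinction, so afternoon times land on the right hour.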
#5 |
Guru
|
This is the date format from the website: 16 Jun 2022 5:15 AM GMT
Code:
from calibre.utils.date import parse_date

date = parse_date(self.tag_to_string(d).replace(' AM GMT', ':00 +0530').replace(' PM GMT', ':00 +0530'))
ans.append({'title': title, 'url': url, 'date': date})
return ans

Output:
2022-06-16 04:55:00+00:00
2021-07-04 02:39:00+00:00
2020-05-16 01:30:00+00:00

but it won't skip links..
oldest_article = 2  # days

Last edited by unkn0wn; 06-16-2022 at 08:59 AM.
#6 |
creator of calibre
|
oldest_article only applies to RSS feeds; for parse_index you do it manually. Simply check the date yourself and skip the articles you don't want.
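A minimal sketch of that manual check, written without calibre so it can be tried on its own. The helper names `is_recent` and `filter_articles` are hypothetical, and the sketch assumes each scraped entry already carries a timezone-aware `datetime`; inside a recipe the same test would sit in the loop of parse_index, before the article dict is appended.

```python
from datetime import datetime, timedelta, timezone


def is_recent(published, max_age_days, now=None):
    """True if `published` is at most `max_age_days` old."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - published <= timedelta(days=max_age_days)


def filter_articles(candidates, max_age_days, now=None):
    """Keep only recent entries.

    `candidates` is an iterable of (title, url, published) tuples, an
    assumed shape for whatever the index page scrape produces.
    """
    return [
        {'title': title, 'url': url}
        for title, url, published in candidates
        if is_recent(published, max_age_days, now)
    ]
```

Because the old entries never reach the returned list, their links are never fetched, which is the behaviour asked for earlier in the thread.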
|
#7 |
Guru
|
thanks
|
#8 |
Guru
|
Tried to load this today.. it stopped working.
Only small changes are needed, though:
https://github.com/kovidgoyal/calibr...ive_law.recipe
add 'javascript:void(0);', to omit_list (line 94)
and classes('in-image-ad-wrap'), to remove_tags

Last edited by unkn0wn; 08-31-2022 at 02:42 AM.
#9 |
Guru
|
Also a small change to Live Mint
https://github.com/kovidgoyal/calibr...ivemint.recipe
Change line 100 to
Code:
body = data['articleBody'] + '</p> <p>'\
    + re.sub(r'(([a-z]|[^A-Z])\.|\.”)([A-Z]|“[A-Z])', r'\1 <p> \3', value)
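To see what that substitution does, here is a small standalone sketch. The pattern and replacement are copied from the post; the wrapper name `split_run_on_sentences` and the sample string are made up for illustration.

```python
import re

# Matches a full stop that runs straight into the next sentence: either a
# non-capital character plus '.', or '.' plus a closing curly quote,
# immediately followed by a capital letter (optionally behind an opening
# curly quote).
SENTENCE_JOIN = re.compile(r'(([a-z]|[^A-Z])\.|\.”)([A-Z]|“[A-Z])')


def split_run_on_sentences(value):
    """Insert a <p> marker wherever two sentences were glued together."""
    return SENTENCE_JOIN.sub(r'\1 <p> \3', value)


print(split_run_on_sentences('First sentence.Second sentence.'))
# prints: First sentence. <p> Second sentence.
```

Text that already has a space after each full stop is left unchanged, since the pattern requires the capital letter to follow directly.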
#10 |
Guru
|
Attached the recipes for the above, with the changes applied.
|