|
|
#1 |
|
Guru
Posts: 645
Karma: 85520
Join Date: May 2021
Device: kindle
|
Live Law
https://www.livelaw.in
A date is available for every article. Is there a way to drop older links from the feeds based on it? The date shows up in a form like '14 Jun 2022 5:15 AM GMT' or '12 March 2022 4:12 PM GMT'. |
|
|
|
|
|
#2 |
|
creator of calibre
Posts: 45,610
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
If you want to skip an article, you can call self.abort_article() inside any of the preprocess methods to abort fetching that article. If you are using RSS feeds, though, you can just set oldest_article to avoid fetching the old articles at all.
|
|
|
|
|
|
|
|
#3 | |
|
Guru
|
I want to skip links even before it tries to fetch the article from that link.
I'm able to get the date from the feeds page itself.
|
|
|
|
|
|
|
#4 |
|
creator of calibre
|
Parse the date and check it then. You can use
from calibre.utils.date import parse_date
or, if automated parsing doesn't work, you can easily parse it yourself.
Code:
parts = date.split()
day = int(parts[0])
year = int(parts[2])
# The month name is the second token, e.g. 'Jun' in '14 Jun 2022 5:15 AM GMT'
month = {'Jan': 1, 'January': 1, 'Feb': 2, ...}[parts[1]]
|
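As a minimal sketch of the manual parsing suggested above, the following uses only the standard library. The function name `parse_site_date` is hypothetical, and the code assumes the trailing "GMT" really means UTC and that the string always has the shape 'day month year h:mm AM/PM GMT'; if the site's times are actually local, the timezone would need adjusting.

```python
from datetime import datetime, timezone

# Month lookup keyed on the first three letters, so both 'Jun' and 'June' work.
MONTHS = {name: number for number, name in enumerate(
    ['jan', 'feb', 'mar', 'apr', 'may', 'jun',
     'jul', 'aug', 'sep', 'oct', 'nov', 'dec'], start=1)}


def parse_site_date(text):
    """Parse e.g. '14 Jun 2022 5:15 AM GMT' into a timezone-aware datetime.

    Hypothetical helper; assumes the 'GMT' suffix can be taken as UTC.
    """
    day, month_name, year, clock, meridiem = text.split()[:5]
    month = MONTHS[month_name[:3].lower()]
    hour, minute = (int(piece) for piece in clock.split(':'))
    # Convert the 12-hour clock to 24-hour: 12 AM becomes 0, 12 PM stays 12.
    hour = hour % 12 + (12 if meridiem.upper() == 'PM' else 0)
    return datetime(int(year), month, int(day), hour, minute,
                    tzinfo=timezone.utc)


print(parse_site_date('14 Jun 2022 5:15 AM GMT'))
print(parse_site_date('12 March 2022 4:12 PM GMT'))
```

Keeping the AM/PM marker matters here: discarding it would make afternoon articles look twelve hours older than they are.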
|
|
|
|
|
#5 |
|
Guru
|
This is the date format from the website: 16 Jun 2022 5:15 AM GMT
Code:
from calibre.utils.date import parse_date
date = parse_date(self.tag_to_string(d).replace(' AM GMT', ':00 +0530').replace(' PM GMT', ':00 +0530'))
ans.append({
'title': title,
'url': url,
'date': date})
return ans
2022-06-16 04:55:00+00:00
2021-07-04 02:39:00+00:00
2020-05-16 01:30:00+00:00
but it won't skip links, even with:
oldest_article = 2 #days
Last edited by unkn0wn; 06-16-2022 at 09:59 AM. |
|
|
|
|
|
|
|
#6 |
|
creator of calibre
|
oldest_article only applies to RSS feeds; for parse_index you have to do it manually. Simply check the date yourself and skip the articles you don't want.
|
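The manual check described above might look like the following sketch. The helper name `drop_old_articles` is hypothetical, and it assumes each article dict temporarily holds a timezone-aware datetime under 'date', as in the earlier snippet. calibre's documentation describes the 'date' key as a string, so a real recipe may want to convert it back before returning from parse_index.

```python
from datetime import datetime, timedelta, timezone


def drop_old_articles(articles, max_age_days, now=None):
    """Keep only articles whose 'date' is no older than max_age_days.

    Hypothetical helper; 'articles' is a list of dicts with an aware
    datetime stored under the 'date' key.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    return [article for article in articles if article['date'] >= cutoff]


# Example with a fixed "now" so the result is reproducible.
now = datetime(2022, 6, 16, 12, 0, tzinfo=timezone.utc)
articles = [
    {'title': 'recent', 'url': 'https://www.livelaw.in',
     'date': datetime(2022, 6, 16, 4, 55, tzinfo=timezone.utc)},
    {'title': 'stale', 'url': 'https://www.livelaw.in',
     'date': datetime(2021, 7, 4, 2, 39, tzinfo=timezone.utc)},
]
print([a['title'] for a in drop_old_articles(articles, 2, now=now)])
```

Inside parse_index, the same comparison could instead be done before ans.append(...), so that old links never enter the list in the first place.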
|
|
|
|
|
#7 |
|
Guru
|
thanks
|
|
|
|
|
|
#8 |
|
Guru
|
Tried to load this today; it had stopped working.
It only needs small changes, though, to https://github.com/kovidgoyal/calibr...ive_law.recipe: add 'javascript:void(0);', to omit_list (line 94) and classes('in-image-ad-wrap'), to remove_tags.
Last edited by unkn0wn; 08-31-2022 at 03:42 AM. |
|
|
|
|
|
#9 |
|
Guru
|
Also a small change to Live Mint
https://github.com/kovidgoyal/calibr...ivemint.recipe
Change line 100 to:
Code:
body = data['articleBody'] + '</p> <p>'\
+ re.sub(r'(([a-z]|[^A-Z])\.|\.”)([A-Z]|“[A-Z])', r'\1 <p> \3', value)
|
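To see what the substitution in the line above does, here is a small standalone sketch. The regular expression is copied from the post; the wrapper name `split_run_on_sentences` is hypothetical. It appears intended to insert a paragraph tag wherever a sentence end is immediately followed by a capital letter (or an opening curly quote plus a capital) with no space in between.

```python
import re

# Pattern copied from the post: a sentence end ('x.' where x is not an
# uppercase letter, or '.”') directly followed by a capital letter or by
# an opening curly quote and a capital letter.
RUN_ON = re.compile(r'(([a-z]|[^A-Z])\.|\.”)([A-Z]|“[A-Z])')


def split_run_on_sentences(value):
    """Insert ' <p> ' between sentences that were joined without a space."""
    return RUN_ON.sub(r'\1 <p> \3', value)


print(split_run_on_sentences('First sentence.Second sentence.'))
```

Because an uppercase letter before the period never matches, abbreviations such as 'U.S.A' are left alone, while a digit before the period (as in '3.Next') still triggers a split.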
|
|
|
|
|
#10 |
|
Guru
|
Recipes attached for the above, after the changes.
|
|
|
|
|