02-22-2018, 09:41 PM | #1 |
Junior Member
Posts: 7
Karma: 10
Join Date: Feb 2018
Device: Kindle Paperwhite
|
Calibre News Source 'The Hindu' Recipe needs correction
The Hindu - OpEd - Articles in the Yes/No/It's Complicated format are not being fetched completely; only one of the three parts is downloaded.
Ex: http://www.thehindu.com/opinion/op-e...le22828585.ece These articles have 3 parts - Yes, No, It's Complicated. I suspect calibre is discarding duplicate titles and so, after downloading the 'Yes' part, disregards the other 2 parts. Is it possible to update the recipe? Or at least let me know what code I can put in the recipe; I can edit the recipe myself. That is the extent of my IT capabilities. Thanks |
02-23-2018, 03:00 AM | #2 |
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Since that is a single article, not three, I doubt that duplicate removal is the problem, but it is easy to test: simply remove the
ignore_duplicate_articles = {'title', 'url'} line from the recipe. |
02-24-2018, 12:05 AM | #3 |
Junior Member
Posts: 7
Karma: 10
Join Date: Feb 2018
Device: Kindle Paperwhite
|
Hi, Kovid
Thanks for the response. I have updated the code. Sadly, it is a weekly article, so yesterday's paper did not contain the 'Yes, No ...' piece, and I do not know how to download the paper for a specific date or from 'x days' back. Is there any simple code I can add to the recipe for this? If not, I will report back on 3rd March |
03-02-2018, 05:40 AM | #4 |
Big Poppa
Posts: 110
Karma: 10
Join Date: Jul 2010
Device: Nook
|
To test, edit the line in the recipe that calls index_to_soup with a URL containing 'todays-paper' (I think it is line 63) and change that URL to the date you want,
like http://www.thehindu.com/archive/print/2018/03/01/ |
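For fetching an edition from 'x days' back without typing the date by hand, the archive URL could be built from a date. This is only a sketch: the helper name archive_url and its parameters are made up for illustration, and it assumes the archive keeps the /archive/print/YYYY/MM/DD/ pattern shown in the post above.

```python
from datetime import date, timedelta

# Base URL taken from the post above; everything else here is hypothetical.
ARCHIVE_BASE = 'http://www.thehindu.com/archive/print/'


def archive_url(days_back=0, today=None):
    """Return the print-archive URL for the edition `days_back` days before `today`."""
    if today is None:
        today = date.today()
    target = today - timedelta(days=days_back)
    # Zero-padded year/month/day, matching the URL pattern quoted in the thread.
    return ARCHIVE_BASE + target.strftime('%Y/%m/%d/')
```

The result would then be passed to self.index_to_soup(...) in place of the hard-coded 'todays-paper' URL. Whether the rest of the recipe can parse the archive page is a separate question.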
03-05-2018, 11:11 PM | #5 | |
Junior Member
Posts: 7
Karma: 10
Join Date: Feb 2018
Device: Kindle Paperwhite
|
Quote:
Code:
# Modified on 2018Mar06 to download a specific date's paper
soup = self.index_to_soup('http://www.thehindu.com/archive/print/2018/03/02/')
#soup = self.index_to_soup('http://www.thehindu.com/todays-paper/')
nav_div = soup.find(id='subnav-tpbar-latest')
section_list = []
I have compared the page source of 'todays-paper' and 'print/2018/03/02/'. It looks like the element with id = subnav-tpbar-latest contains complete URLs in the first case but only links in the second, which leaves calibre unable to find any articles. I have attached the calibre logs with relevant screenshots. Could you help clarify further, if possible? Thanks |
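If 'only links' here means that the archive page uses relative hrefs where 'todays-paper' uses full URLs (an assumption, not confirmed in this thread), one possible workaround is to resolve each href against the page URL before using it. A minimal standalone Python 3 sketch follows; the helper name absolutize and the example path are hypothetical.

```python
from urllib.parse import urljoin

# Page URL taken from the modified recipe line above.
PAGE_URL = 'http://www.thehindu.com/archive/print/2018/03/02/'


def absolutize(href, base=PAGE_URL):
    """Resolve a possibly relative href against the page it was found on.

    Hrefs that are already absolute are returned unchanged.
    """
    return urljoin(base, href)


# '/todays-paper/tp-opinion/' is a made-up path used only for illustration.
print(absolutize('/todays-paper/tp-opinion/'))
```

Inside the recipe this would be applied to each link's href before it is added to the section list.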
|
03-08-2018, 09:39 PM | #6 | |
Junior Member
Posts: 7
Karma: 10
Join Date: Feb 2018
Device: Kindle Paperwhite
|
Quote:
Commenting out the code did not work. Only the 'Yes' part is being downloaded. Any other suggestions? |
|
03-09-2018, 12:37 AM | #7 |
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I just tried it and it works for me, see screenshot
|
03-09-2018, 03:00 AM | #8 |
Junior Member
Posts: 7
Karma: 10
Join Date: Feb 2018
Device: Kindle Paperwhite
|
Howwww !!!
Just now, I removed line 25 from the original recipe altogether and kept everything else the same as the original. I am still not getting the other two parts... Did you modify the original recipe, or was it only trial code? |
03-09-2018, 03:01 AM | #9 |
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Nope, no modifications.
|
08-10-2018, 02:18 AM | #10 |
Junior Member
Posts: 2
Karma: 10
Join Date: Aug 2018
Device: Kindle 8th Gen
|
Hindu Paper
Same issue: only 1 of the 3 parts gets downloaded, and removing the 25th line of the code does not help.
https://www.thehindu.com/opinion/op-...le24646941.ece Only the 'YES' part gets downloaded, leaving out 'NO' and 'It's complicated'. |
08-11-2018, 01:44 AM | #11 |
Junior Member
Posts: 2
Karma: 10
Join Date: Aug 2018
Device: Kindle 8th Gen
|
Hindu Paper recipe modification.
Hello Kovid,
I'm really grateful to you for making such wonderful software, which makes reading a pleasure and e-reading devices cost-effective. Issue: https://www.thehindu.com/opinion/op-...le24646941.ece. Only the 'Yes' part is being downloaded, leaving out 'No' and 'It's complicated'. This kind of article appears every Friday in "The Hindu" newspaper. I'm not from an IT background, but I tried removing the line ignore_duplicate_articles = {'title', 'url'} from the recipe. It isn't working for me either, so I guess I'm missing something. I have read that you work 80 hours a week, so you must be very busy, but if you could find the time to upload a modified recipe here, or to tweak the built-in recipe, that would be helpful. |
08-11-2018, 08:14 PM | #12 |
creator of calibre
Posts: 43,860
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
I'm afraid I cannot fix the recipe for an issue I cannot reproduce, sorry.
|
10-05-2018, 11:11 PM | #13 |
Junior Member
Posts: 1
Karma: 10
Join Date: Oct 2018
Device: Kindle 8th gen
|
What should I change in the recipe for "The Hindu" to remove images?
|
Tags |
calibre 3.17 |
|