#1 |
|
Guru
Posts: 645
Karma: 85520
Join Date: May 2021
Device: kindle
|
The Hindu recipe is not recognizing articles from a section
Code:
Found section: Sci-tech & Agri https://www.thehindu.com/todays-paper/tp-features/tp-sci-tech-and-agri/
Found section: Others https://www.thehindu.com/todays-paper/tp-miscellaneous/tp-others/
Found article: Celebrations break out as farmers from Punjab, Haryana reach home
https://www.thehindu.com/todays-paper/tp-miscellaneous/tp-others/celebrations-break-out-as-farmers-from-punjab-haryana-reach-home/article37936952.ece
Found article: Trinamool promises Rs. 5,000 per month for women in Goa
.
.
.
All other sections and articles load perfectly.
https://www.thehindu.com/archive/print/2021/12/12/
https://github.com/kovidgoyal/calibr...s/hindu.recipe
EDIT: Okay, it looks like the section link doesn't open a page with article links for calibre to fetch:
https://www.thehindu.com/todays-pape...tech-and-agri/
However, those article links are present in the today's paper list.
Last edited by unkn0wn; 12-12-2021 at 09:21 AM. Reason: Maybe I found the reason
|
#2 |
|
creator of calibre
Posts: 45,609
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
That's because that page is empty on the website. Go to the Sci-tech & Agri page and see for yourself.
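For anyone who wants to check this without opening a browser, here is a minimal sketch of the idea: scan a section page's HTML for links that look like articles and see whether any turn up. The names `ArticleLinkParser` and `find_article_links` are made up for illustration, and the ".ece" suffix is only an assumption based on the article URL in the log in post #1. This is not how the hindu.recipe itself finds articles.

```python
from html.parser import HTMLParser


class ArticleLinkParser(HTMLParser):
    """Collect href values that look like article pages (ending in .ece)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        # Assumption: article URLs end in ".ece", as in the log in post #1.
        if href.endswith(".ece"):
            self.links.append(href)


def find_article_links(html):
    """Return the article links found in a section page's HTML."""
    parser = ArticleLinkParser()
    parser.feed(html)
    return parser.links


# An empty section page yields no links, so a recipe has nothing to fetch.
empty_page = "<html><body><h1>Sci-tech & Agri</h1></body></html>"
filled_page = (
    '<html><body><a href="https://www.thehindu.com/todays-paper/'
    'tp-miscellaneous/tp-others/celebrations-break-out-as-farmers-from-'
    'punjab-haryana-reach-home/article37936952.ece">Celebrations</a></body></html>'
)
print(len(find_article_links(empty_page)))
print(len(find_article_links(filled_page)))
```

The two sample pages are hand-written stand-ins, not copies of the real site. To test a real section you would pass in the HTML downloaded from that section's URL.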
|
#3 |
|
Guru
Posts: 645
Karma: 85520
Join Date: May 2021
Device: kindle
|
mechanize._response.httperror_seek_wrapper: HTTP Error 403: Forbidden
Today I got this error, although the website is working fine. I didn't want to start a new thread!
|
#4 |
|
creator of calibre
Posts: 45,609
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
That will be because the website is using some kind of bot detection. Someone with more time than I have will need to figure out what is needed to bypass it.
Code:
calibre-debug -c 'from calibre import browser; br = browser(); br.open("https://www.thehindu.com/todays-paper/")'
Traceback (most recent call last):
  File "/usr/bin/calibre-debug", line 21, in <module>
    sys.exit(main())
  File "/home/kovid/work/calibre/src/calibre/debug.py", line 272, in main
    exec(opts.command)
  File "<string>", line 1, in <module>
  File "/usr/lib/python3.10/site-packages/mechanize/_mechanize.py", line 257, in open
    return self._mech_open(url_or_request, data, timeout=timeout)
  File "/usr/lib/python3.10/site-packages/mechanize/_mechanize.py", line 313, in _mech_open
    raise response
mechanize._response.get_seek_wrapper_class.<locals>.httperror_seek_wrapper: HTTP Error 403: Forbidden
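As a rough way to probe this kind of refusal outside calibre, the sketch below uses only the Python standard library to send a request with an explicit User-Agent header and report the status code instead of raising. The helper names `build_request` and `fetch_status` and the `EXAMPLE_UA` string are hypothetical. Whether changing the user agent is enough to get past this site's bot detection is an assumption that has not been verified here.

```python
import urllib.error
import urllib.request


def build_request(url, user_agent):
    """Build a request that sends an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(url, user_agent, timeout=30):
    """Return the HTTP status code for url, including error codes like 403."""
    req = build_request(url, user_agent)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # A 403 means the server refused this request, not that the site is down.
        return err.code


# Hypothetical user agent string, for illustration only.
EXAMPLE_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:95.0) Gecko/20100101 Firefox/95.0"

# Building the request needs no network access; only fetch_status() goes online.
req = build_request("https://www.thehindu.com/todays-paper/", EXAMPLE_UA)
print(req.get_header("User-agent"))

# To try it against the live site (result not guaranteed):
# print(fetch_status("https://www.thehindu.com/todays-paper/", EXAMPLE_UA))
```

Comparing the status code returned for different user agent strings is one way to tell whether the server is filtering on that header or on something else.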
|
#5 |
|
creator of calibre
Posts: 45,609
Karma: 28549044
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Actually this should take care of it: https://github.com/kovidgoyal/calibr...b687d54a36e88c
|
#6 |
|
Guru
Posts: 645
Karma: 85520
Join Date: May 2021
Device: kindle
|
Thank you. You are a magician.