#1
Connoisseur
Posts: 51
Karma: 10
Join Date: Oct 2018
Device: kindle
Hi, guys,
I've written a recipe (subclassing BasicNewsRecipe) to fetch some articles online, but when I convert it to an ebook, I get only titles and links and no content at all. After searching for a while, it seems that I should set user_agent in get_browser. That partly solved the problem, but some articles are still empty. Any ideas? Thank you!

#2
creator of calibre
Posts: 45,598
Karma: 28548962
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Are you using auto_cleanup? If so, try turning it off.

#3
Connoisseur
No, I didn't use auto_cleanup. My test recipe is attached.
It will ask you for an article link. Please use this one: http://www.theworldin.com/edition/20...endulum-swings. You will probably get an empty article, although links from other sites seem to work.

#4
creator of calibre
The server you are contacting is failing, probably because it needs some cookies set or something similar. Add this to your recipe to check:
Code:
def preprocess_raw_html(self, html, url):
    # Save the page exactly as the server returned it, so it can be inspected
    with open('/t/raw.html', 'wb') as f:
        f.write(html.encode('utf-8'))
    return html

#5
Connoisseur
I got this raw html:
Quote:

#6
creator of calibre
Only the person running the server can tell you that.

#7
creator of calibre
Most likely it is using JavaScript to load the content.
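A quick way to test that guess, as a rough sketch using only the Python standard library: strip the contents of `<script>` and `<style>` from the saved raw HTML and see how much readable text is left. If almost nothing remains, the article body is probably being filled in by JavaScript after the page loads. The names `VisibleText` and `visible_text` are made up for this illustration and are not part of calibre.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text that sits outside <script> and <style> elements."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside <script> or <style>
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not script or stylesheet source
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    """Return the readable text of an HTML string, joined by spaces."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return ' '.join(parser.chunks)
```

Run it on the file written by preprocess_raw_html, for example `visible_text(open('/t/raw.html', encoding='utf-8').read())`, and compare the result with the text you see when opening the same URL in a browser.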

#8
Connoisseur
If so, is there no way to work around this?

#9
creator of calibre
No easy way. You would basically need to figure out what requests the JavaScript is making to load the actual content and make those requests manually in the recipe.
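As a rough sketch of that approach: suppose the browser's network tab shows the page fetching its text from a JSON endpoint. Both the endpoint and the JSON shape (a `title` plus a list of `paragraphs`) are assumptions here, since the real ones have to be discovered for each site, and `article_json_to_html` and `api_url_for` are hypothetical names. The helper only rebuilds a minimal HTML page from such a payload; the trailing comment shows where it could be called from a recipe.

```python
import json
from html import escape


def article_json_to_html(raw_json):
    """Build a minimal HTML page from a JSON payload.

    Assumed (hypothetical) payload shape:
        {"title": "...", "paragraphs": ["...", "..."]}
    """
    data = json.loads(raw_json)  # accepts str or UTF-8 bytes
    title = escape(data.get('title', ''))
    body = ''.join(
        '<p>{}</p>'.format(escape(p)) for p in data.get('paragraphs', [])
    )
    return (
        '<html><head><title>{t}</title></head>'
        '<body><h1>{t}</h1>{b}</body></html>'
    ).format(t=title, b=body)


# Inside a recipe this could be wired in roughly as follows. api_url_for is
# a hypothetical helper you would write once you know which request the
# page's JavaScript makes for a given article URL:
#
#     def preprocess_raw_html(self, raw_html, url):
#         payload = self.index_to_soup(api_url_for(url), raw=True)
#         return article_json_to_html(payload)
```

Fetching through the recipe's own `index_to_soup(..., raw=True)` keeps the request on the same browser setup (user agent, cookies) that the recipe already uses for the article pages.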

#10
Connoisseur
Can calibre use Selenium to fetch web pages, so that I can work around the JavaScript?

#11
Connoisseur
I think it would be great if get_browser supported Selenium. Is this possible?

#12
creator of calibre
No, I'm afraid not.