#1 |
Connoisseur
Posts: 51
Karma: 10
Join Date: Oct 2018
Device: kindle
|
Hi, guys
I've written a recipe (a subclass of BasicNewsRecipe) to fetch some articles online, but when I convert the recipe to an ebook, I get only titles and links, with no content at all. After searching for a while, it seems that I should set a user agent in "get_browser". That partly solved the problem, but some articles are still empty. Any ideas? Thank you! |
#2 |
creator of calibre
Posts: 45,345
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Are you using auto_cleanup? If so, try turning it off.
|
#3 |
Connoisseur
|
No, I didn't use auto_cleanup. My custom test recipe is in the attachment.
You'll be asked to enter an article link. Please use this one: http://www.theworldin.com/edition/20...endulum-swings. You will probably get an empty article, although links from other sites seem to work. |
#4 |
creator of calibre
|
The server you are contacting is failing, probably because it needs some cookies set or something similar. Add this to your recipe to check:
Code:
    def preprocess_raw_html(self, html, url):
        # Dump the HTML exactly as the server returned it, for inspection
        with open('/t/raw.html', 'wb') as f:
            f.write(html.encode('utf-8'))
        return html
|
#5 |
Connoisseur
|
I got this raw html:
Quote:
|
|
#6 |
creator of calibre
|
Only the person running the server can tell you that.
|
#7 |
creator of calibre
|
Most likely it is using JavaScript to load the content.
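One rough way to check for this, assuming you saved the page with the preprocess_raw_html snippet earlier in the thread: count the visible text against the number of script tags in the dump. The helper names and the 500-character threshold below are my own choices for illustration, not anything from calibre, and the result is only a heuristic.

```python
from html.parser import HTMLParser


class TextAndScriptCounter(HTMLParser):
    """Count visible text characters and <script> tags in an HTML document."""

    def __init__(self):
        super().__init__()
        self.script_count = 0
        self.text_chars = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == 'script':
            self.script_count += 1
        if tag in ('script', 'style'):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Ignore script and style bodies; they are not article text
        if not self._skip_depth:
            self.text_chars += len(data.strip())


def looks_js_rendered(raw_html, min_text_chars=500):
    """Heuristic: scripts are present but there is very little visible text."""
    parser = TextAndScriptCounter()
    parser.feed(raw_html)
    parser.close()
    return parser.script_count > 0 and parser.text_chars < min_text_chars
```

To try it, read the dumped file as UTF-8 text and pass the string to looks_js_rendered. If it returns True, the page is probably just a shell that fills itself in with JavaScript.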
|
#8 |
Connoisseur
|
If so, is there no way to work around this?
|
#9 |
creator of calibre
|
No easy way. You would basically need to figure out what requests the JavaScript is making to load the actual content and make those requests manually in the recipe.
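As a very rough sketch of the second half of that, suppose the browser's network tab shows the page fetching the article as JSON. The payload shape here (a "title" string and a "paragraphs" list) is purely hypothetical; the real site will use its own endpoint and field names, which you would have to discover yourself. Once the recipe has fetched that JSON, something like this could turn it into HTML:

```python
import json
from html import escape


def article_html_from_json(raw_json):
    """Build a minimal HTML page from a JSON payload.

    Assumes a hypothetical payload shaped like
    {"title": "...", "paragraphs": ["...", "..."]}; a real site will differ.
    """
    data = json.loads(raw_json)
    title = escape(data.get('title', ''))
    # Escape each paragraph so stray '<' or '&' cannot break the markup
    paragraphs = ''.join(
        '<p>{}</p>'.format(escape(p)) for p in data.get('paragraphs', [])
    )
    return (
        '<html><head><title>{0}</title></head>'
        '<body><h1>{0}</h1>{1}</body></html>'
    ).format(title, paragraphs)
```

In a recipe, the returned string could stand in for the empty page, for example as the return value of preprocess_raw_html. The hard part is still working out which request to make.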
|
#10 |
Connoisseur
|
Can calibre use Selenium to fetch web pages, so that I can work around the JavaScript?
|
#11 |
Connoisseur
|
I think it would be great if get_browser supported Selenium. Is this possible?
|
#12 |
creator of calibre
|
No, I'm afraid not.
|
Tags |
empty page, recipe |