10-19-2018, 10:21 PM | #1 |
Connoisseur
Posts: 50
Karma: 10
Join Date: Oct 2018
Device: kindle
|
Blank pages (empty articles) in custom recipe
Hi, guys
I've written a recipe (inheriting from BasicNewsRecipe) to fetch some articles online, but when I converted the recipe to an ebook, I got only titles and links and no content at all. After searching for a while, I found that I should set user_agent in "get_browser". This partly solved the problem, but some articles are still empty. Any ideas? Thank you! |
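For readers following along, here is a minimal sketch of what setting the user agent in get_browser might look like. The class name, the recipe title and the user agent string are illustrative assumptions, not taken from the attached recipe. The fallback class exists only so the snippet can be run outside calibre, and the sketch assumes a calibre version whose get_browser accepts a user_agent keyword.

```python
try:
    from calibre.web.feeds.news import BasicNewsRecipe
except ImportError:
    # Stand-in used only outside calibre: it returns the keyword arguments
    # it was given so the override below can be exercised without calibre.
    class BasicNewsRecipe(object):
        def get_browser(self, *args, **kwargs):
            return kwargs

# Assumption: any ordinary desktop browser string; this one is illustrative.
DESKTOP_UA = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:62.0) '
              'Gecko/20100101 Firefox/62.0')


class UserAgentRecipe(BasicNewsRecipe):
    title = 'User agent test'  # hypothetical recipe name

    def get_browser(self, *args, **kwargs):
        # Only set the user agent if the caller did not supply one
        kwargs.setdefault('user_agent', DESKTOP_UA)
        return BasicNewsRecipe.get_browser(self, *args, **kwargs)
```

A user agent only changes how the server identifies the client, so it helps with sites that reject unknown clients but not with pages whose content is loaded later by scripts.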
10-20-2018, 12:06 AM | #2 |
creator of calibre
Posts: 43,851
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Are you using auto_cleanup? If so, try turning it off.
|
10-20-2018, 10:41 AM | #3 |
Connoisseur
Posts: 50
Karma: 10
Join Date: Oct 2018
Device: kindle
|
No, I didn't use auto_cleanup. My test recipe is in the attachment.
You'll be asked to enter an article link. Please use this one: http://www.theworldin.com/edition/20...endulum-swings. You will probably get an empty article. Links from other sites seem to work, though. |
10-20-2018, 09:36 PM | #4 |
creator of calibre
Posts: 43,851
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
The server you are contacting is failing, probably because it needs some cookies set or something similar. Add this to your recipe to check:
Code:
def preprocess_raw_html(self, html, url):
    with open('/t/raw.html', 'wb') as f:
        f.write(html.encode('utf-8'))
    return html
|
10-21-2018, 01:30 AM | #5 | |
Connoisseur
Posts: 50
Karma: 10
Join Date: Oct 2018
Device: kindle
|
I got this raw html:
Quote:
|
|
10-21-2018, 01:32 AM | #6 |
creator of calibre
Posts: 43,851
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Only the person running the server can tell you that.
|
10-21-2018, 01:33 AM | #7 |
creator of calibre
Posts: 43,851
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Most likely it is using JavaScript to load the content.
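One rough way to check this on the saved raw HTML is to compare how much visible text it contains with how many scripts it loads. The sketch below is a heuristic only: the function name and the 500-character threshold are assumptions made for illustration, and it uses the Python 3 standard library rather than anything specific to calibre.

```python
from html.parser import HTMLParser


class _TextCounter(HTMLParser):
    """Counts visible text characters and <script> tags in a page."""

    def __init__(self):
        super().__init__()
        self.text_chars = 0
        self.scripts = 0
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ('script', 'style'):
            self._skip += 1
            if tag == 'script':
                self.scripts += 1

    def handle_endtag(self, tag):
        if tag in ('script', 'style') and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Ignore the contents of scripts and stylesheets
        if not self._skip:
            self.text_chars += len(data.strip())


def looks_script_rendered(html, min_text_chars=500):
    """Guess whether a page is an empty shell that scripts fill in later."""
    counter = _TextCounter()
    counter.feed(html)
    counter.close()
    return counter.scripts > 0 and counter.text_chars < min_text_chars
```

If this returns True for the file written by preprocess_raw_html, the article text was probably never in the response the recipe received.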
|
10-21-2018, 01:37 AM | #8 |
Connoisseur
Posts: 50
Karma: 10
Join Date: Oct 2018
Device: kindle
|
If so, is there no way to work around this?
|
10-21-2018, 01:37 AM | #9 |
creator of calibre
Posts: 43,851
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
No easy way. You would basically need to figure out what requests the JavaScript is making to load the actual content and make those requests manually in the recipe.
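As a sketch of that approach: suppose the browser's network tools show the page fetching its text from a JSON endpoint. The payload shape below (a "title" string and a "paragraphs" list) is entirely hypothetical, as is the function name. The real site may use a different format, or something other than JSON.

```python
import json
from html import escape


def article_html_from_json(raw):
    """Build a minimal HTML page from a JSON article payload.

    Assumed (hypothetical) payload shape:
        {"title": "...", "paragraphs": ["...", "..."]}
    """
    if isinstance(raw, bytes):
        raw = raw.decode('utf-8')
    data = json.loads(raw)
    # Escape the text so stray markup in the payload cannot break the page
    body = ''.join('<p>%s</p>' % escape(p)
                   for p in data.get('paragraphs', []))
    return '<html><body><h1>%s</h1>%s</body></html>' % (
        escape(data.get('title', '')), body)
```

In a recipe, something like this could be called from preprocess_raw_html after fetching the endpoint with self.index_to_soup(api_url, raw=True), where api_url stands for whatever address the page's scripts actually request.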
|
10-21-2018, 01:48 AM | #10 |
Connoisseur
Posts: 50
Karma: 10
Join Date: Oct 2018
Device: kindle
|
Could calibre support Selenium for fetching web pages, so that I can work around the JavaScript?
|
10-21-2018, 02:22 AM | #11 |
Connoisseur
Posts: 50
Karma: 10
Join Date: Oct 2018
Device: kindle
|
I think it would be great if get_browser supported Selenium. Is this possible?
|
10-21-2018, 02:22 AM | #12 |
creator of calibre
Posts: 43,851
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
No, I'm afraid not.
|