#1
Enthusiast
Posts: 45
Karma: 10
Join Date: Dec 2010
Device: Kindle 3 Wifi only
Private recipe repeatedly fails with BeautifulSoup find (calibre 3.6)
Hi all,
First of all, thanks for your help, and thanks to Kovid for creating calibre. I created a private recipe for a relatively small Hungarian newspaper site (behind a paywall). The recipe worked perfectly two weeks ago, but I just got back from vacation, updated calibre, and now it fails with:
```
Traceback (most recent call last):
  File "site-packages/calibre/web/fetch/simple.py", line 553, in process_links
  File "site-packages/calibre/web/feeds/news.py", line 992, in _postprocess_html
  File "<string>", line 53, in postprocess_html
AttributeError: 'NoneType' object has no attribute 'replace'
```
The failing code is:
```python
def postprocess_html(self, soup, first):
    html_title = soup.find('title').string
    # html_title is None here, so .replace() raises the AttributeError above
    new_html_title = html_title.replace(u" | ÉLET ÉS IRODALOM", "")
```
Can you help me fix it? Thanks in advance,
Zsolt
#2
creator of calibre
Posts: 45,656
Karma: 28549046
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Check the soup again. As of calibre 3.5, HTML is parsed using the HTML5 parsing algorithm instead of BeautifulSoup's own parser, so the different parsing algorithm is probably producing a different result.
#3
Enthusiast
Thanks for the quick reply, Kovid!
I modified my recipe to log the title and the parsed page:
```python
def postprocess_html(self, soup, first):
    html_title = soup.find('title').string
    if not html_title:
        # Log the (missing) title text and dump the whole soup for inspection
        self.log("BINGO: ", html_title)
        self.log(soup)
```
The output is:
```
BINGO: None
<html><head><title>Csakis a közjó | ÉLET ÉS IRODALOM</title><style type="text/css" title="override_css">
.article_date {
color: gray; font-family: monospace;
}
### SNIPPED...
Traceback (most recent call last):
  File "site-packages/calibre/web/fetch/simple.py", line 553, in process_links
  File "site-packages/calibre/web/feeds/news.py", line 992, in _postprocess_html
  File "<string>", line 51, in postprocess_html
AttributeError: 'NoneType' object has no attribute 'replace'
```
#4
creator of calibre
So what is line 51 in your recipe?
#5
Enthusiast
It is `new_html_title = html_title.replace(u" | ÉLET ÉS IRODALOM", "")`.
In my original post, `html_title` is `soup.find('title').string`, and as the debug output shows, the `<title>` tag is there.
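One plausible explanation for `.string` being None even though `<title>` is present: in BeautifulSoup, `.string` only returns a value when the tag has exactly one child and that child is a text node; otherwise it is None. The sketch below is a simplified, hypothetical model (the `Node` class and `all_text` helper are invented for illustration and are not calibre or BeautifulSoup code). It assumes the new parser split the title text into more than one text node; the thread does not confirm what the new tree actually looks like.

```python
# Hypothetical, simplified model of a parsed tag; NOT calibre or BeautifulSoup code.
class Node:
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)  # mix of str (text nodes) and Node

    @property
    def string(self):
        # Mimics BeautifulSoup's .string: a value only when the tag has
        # exactly one child and that child is a text node, otherwise None.
        if len(self.children) == 1 and isinstance(self.children[0], str):
            return self.children[0]
        return None


def all_text(node):
    """Concatenate all descendant text, roughly like tag_to_string()."""
    parts = []
    for child in node.children:
        parts.append(child if isinstance(child, str) else all_text(child))
    return "".join(parts)


# <title> as a single text node (what the recipe originally relied on)
old_title = Node("title", ["Csakis a közjó | ÉLET ÉS IRODALOM"])
# Same text split into two text nodes (an assumed shape for the new parse tree)
new_title = Node("title", ["Csakis a közjó", " | ÉLET ÉS IRODALOM"])

print(old_title.string)     # Csakis a közjó | ÉLET ÉS IRODALOM
print(new_title.string)     # None
print(all_text(new_title))  # Csakis a közjó | ÉLET ÉS IRODALOM
```

In the second case, calling `.replace()` on the None returned by `.string` raises exactly the AttributeError from the traceback, while concatenating all the text still recovers the full title.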
#6
creator of calibre
Use `self.tag_to_string(soup.find('title'))` instead of `soup.find('title').string`.
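Applied to the original recipe, the fix is to take the title text from `self.tag_to_string(...)` and then strip the site name. The sketch below factors the string handling into a hypothetical helper, `clean_title` (not part of calibre), which also returns an empty string for a missing or empty title instead of raising. It is plain Python, so it can be tried outside calibre.

```python
# Site-name suffix the original recipe removes from page titles.
SITE_SUFFIX = u" | ÉLET ÉS IRODALOM"


def clean_title(raw_title, suffix=SITE_SUFFIX):
    """Hypothetical helper: drop the site suffix from a <title> text.

    raw_title is meant to be the full title text, e.g. the value of
    self.tag_to_string(soup.find('title')) inside postprocess_html.
    A None or empty value yields "" rather than an AttributeError.
    """
    if not raw_title:
        return u""
    return raw_title.replace(suffix, u"").strip()


print(clean_title(u"Csakis a közjó | ÉLET ÉS IRODALOM"))  # Csakis a közjó
print(repr(clean_title(None)))                              # ''
```

Inside the recipe, the line that failed would then read `new_html_title = clean_title(self.tag_to_string(soup.find('title')))`.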
#7
Enthusiast
That just works. Thanks for your support, Kovid!
Tags: recipe
Similar Threads
| Thread | Thread Starter | Forum | Replies | Last Post |
|---|---|---|---|---|
| NY Times fails repeatedly | NSILMike | Recipes | 5 | 03-02-2017 03:46 PM |
| Linking fails on fedora17: cannot find boost::system::system_category | Dyspeptica | Sigil | 4 | 10-14-2012 11:27 PM |
| Recipe works when mocked up as Python file, fails when converted to Recipe | ode | Recipes | 7 | 09-04-2011 05:57 AM |
| Calibre epub from recipe fails in Sigil and FBReader on Android | siebert | Calibre | 15 | 12-04-2010 12:18 PM |
| NY Times Recipe in Calibre 6.36 Fails | keyrunner | Calibre | 1 | 01-28-2010 12:56 PM |