08-04-2017, 10:49 AM | #1 |
Enthusiast
Posts: 45
Karma: 10
Join Date: Dec 2010
Device: Kindle 3 Wifi only
|
Private recipe repeatedly fails with BeautifulSoup find (calibre 3.6)
Hi All,
First of all, thanks for your help, and thanks to Kovid for creating calibre. I created a private recipe for a relatively small Hungarian newspaper site (behind a paywall). The recipe worked perfectly two weeks ago, but after getting back from vacation I updated calibre, and now it fails with:
Code:
Traceback (most recent call last):
  File "site-packages/calibre/web/fetch/simple.py", line 553, in process_links
  File "site-packages/calibre/web/feeds/news.py", line 992, in _postprocess_html
  File "<string>", line 53, in postprocess_html
AttributeError: 'NoneType' object has no attribute 'replace'
Code:
def postprocess_html(self, soup, first):
    html_title = soup.find('title').string
    new_html_title = html_title.replace(u" | ÉLET ÉS IRODALOM","")
Can you help me fix it?
Thanks in advance,
Zsolt
08-04-2017, 10:54 AM | #2 |
creator of calibre
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Check the soup again. As of calibre 3.5, HTML is parsed using the HTML 5 parsing algorithm, not BeautifulSoup's parser, so the different parsing algorithm is probably yielding different results.
|
08-04-2017, 11:00 AM | #3 |
Enthusiast
Posts: 45
Karma: 10
Join Date: Dec 2010
Device: Kindle 3 Wifi only
|
Thanks for the quick reply Kovid!
I just modified my recipe to have:
Code:
def postprocess_html(self, soup, first):
    html_title = soup.find('title').string
    if not html_title:
        self.log("BINGO: ", html_title)
        self.log(soup)
Code:
BINGO: None
<html><head><title>Csakis a közjó | ÉLET ÉS IRODALOM</title><style type="text/css" title="override_css"> .article_date { color: gray; font-family: monospace; } ### SNIPPED...
Traceback (most recent call last):
  File "site-packages/calibre/web/fetch/simple.py", line 553, in process_links
  File "site-packages/calibre/web/feeds/news.py", line 992, in _postprocess_html
  File "<string>", line 51, in postprocess_html
AttributeError: 'NoneType' object has no attribute 'replace'
08-04-2017, 12:05 PM | #4 |
creator of calibre
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
So what is line 51 in your recipe?
|
08-04-2017, 12:37 PM | #5 |
Enthusiast
Posts: 45
Karma: 10
Join Date: Dec 2010
Device: Kindle 3 Wifi only
|
It is: new_html_title = html_title.replace(u" | ÉLET ÉS IRODALOM","")
As in my original post, html_title is soup.find('title').string. And as the debug output shows, the <title> element is there.
08-04-2017, 12:47 PM | #6 |
creator of calibre
Posts: 43,850
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Use
self.tag_to_string(soup.find('title'))
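For anyone landing here later, this is roughly how the suggestion could be applied to the method from the first post. It is only a sketch: `strip_site_suffix` and `TitleCleanupMixin` are hypothetical names made up so the snippet runs outside calibre, while `tag_to_string` and `log` are assumed to come from the recipe's `BasicNewsRecipe` base class, as in the thread.

```python
# -*- coding: utf-8 -*-
SITE_SUFFIX = u" | ÉLET ÉS IRODALOM"


def strip_site_suffix(title_text, suffix=SITE_SUFFIX):
    """Return the title without the site suffix; '' if there is no text."""
    if not title_text:
        return u""
    return title_text.replace(suffix, u"").strip()


class TitleCleanupMixin(object):
    """Hypothetical holder for the method. In a real recipe the method would
    sit directly in the BasicNewsRecipe subclass, which supplies
    tag_to_string and log."""

    def postprocess_html(self, soup, first):
        title_tag = soup.find('title')
        if title_tag is not None:
            # tag_to_string gathers the tag's text, unlike the .string
            # attribute, which was None in the debug output above.
            new_html_title = strip_site_suffix(self.tag_to_string(title_tag))
            self.log("Cleaned title:", new_html_title)
        # calibre expects postprocess_html to hand the soup back.
        return soup
```

The thread does not show what the recipe does with `new_html_title` afterwards, so the sketch only logs it. The explicit `None` checks are there so that a page without a `<title>` does not raise the same AttributeError again.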
08-06-2017, 07:56 AM | #7 |
Enthusiast
Posts: 45
Karma: 10
Join Date: Dec 2010
Device: Kindle 3 Wifi only
|
That just works. Thanks for your support, Kovid!
|