Old 08-04-2017, 11:49 AM   #1
hiperlink
Enthusiast
hiperlink began at the beginning.
 
Posts: 41
Karma: 10
Join Date: Dec 2010
Device: Kindle 3 Wifi only
Private recipe repeatedly fails with BeautifulSoup find (calibre 3.6)

Hi all,

First of all, thanks for your help, and thanks to Kovid for creating calibre.

I created a private recipe for a relatively small Hungarian newspaper/site (behind a paywall).

The recipe was working perfectly two weeks ago, but I have just come back from vacation and updated calibre, and it now fails with:

Code:
Traceback (most recent call last):
  File "site-packages/calibre/web/fetch/simple.py", line 553, in process_links
  File "site-packages/calibre/web/feeds/news.py", line 992, in _postprocess_html
  File "<string>", line 53, in postprocess_html
AttributeError: 'NoneType' object has no attribute 'replace'
My postprocess_html begins with:

Code:
    def postprocess_html(self, soup, first):
        html_title     = soup.find('title').string
        new_html_title = html_title.replace(u" | ÉLET ÉS IRODALOM","")
So replace() is being called on None. However, I logged the soup and the HTML <title> tag is there.

Can you help me fix it? Thanks in advance,
Zsolt
Old 08-04-2017, 11:54 AM   #2
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
Posts: 32,222
Karma: 9820640
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Check the soup again. As of calibre 3.5, HTML is parsed using the HTML 5 parsing algorithm, not BeautifulSoup's parser, so the different parsing algorithm is probably yielding different results.
Old 08-04-2017, 12:00 PM   #3
hiperlink
Thanks for the quick reply, Kovid!

I just modified my recipe to read:

Code:
    def postprocess_html(self, soup, first):
        html_title     = soup.find('title').string
        if not html_title:
            self.log("BINGO: ", html_title)
            self.log(soup)
And the relevant output is:
Code:
BINGO:  None
<html><head><title>Csakis a közjó | ÉLET ÉS IRODALOM</title><style type="text/css" title="override_css">
            .article_date {
                color: gray; font-family: monospace;
            }
### SNIPPED...
Traceback (most recent call last):
  File "site-packages/calibre/web/fetch/simple.py", line 553, in process_links
  File "site-packages/calibre/web/feeds/news.py", line 992, in _postprocess_html
  File "<string>", line 51, in postprocess_html
AttributeError: 'NoneType' object has no attribute 'replace'
So the title is there...
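
The log above narrows things down: soup.find('title') did return a tag (otherwise the .string lookup itself would have raised), so it is the .string attribute that comes back as None. A minimal sketch of a check that tells the two cases apart is below. FakeTag and diagnose are hypothetical names, not part of calibre or BeautifulSoup; FakeTag only stands in for a real soup tag so the snippet runs without calibre.

```python
class FakeTag:
    """Hypothetical stand-in for a soup tag; only .string matters here."""
    def __init__(self, string=None):
        self.string = string


def diagnose(tag):
    """Report which part of soup.find('title').string came back empty."""
    if tag is None:
        # find() matched nothing, so there is no tag at all
        return "tag missing"
    if tag.string is None:
        # the tag exists, but its .string attribute is not populated
        return "tag present, .string is None"
    return "ok"
```

In a recipe, something like self.log(diagnose(soup.find('title'))) would show which case applies before any replace() call is attempted.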
Old 08-04-2017, 01:05 PM   #4
kovidgoyal
So what is line 51 in your recipe?
Old 08-04-2017, 01:37 PM   #5
hiperlink
It is: new_html_title = html_title.replace(u" | ÉLET ÉS IRODALOM","").

As in my original post, html_title is soup.find('title').string. And as the debug output shows, the <title> tag is there.
Old 08-04-2017, 01:47 PM   #6
kovidgoyal
Use

self.tag_to_string(soup.find('title'))
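
A rough sketch of how that suggestion might slot into the title clean-up from the first post. It is written to run outside calibre, so Node, collect_text and clean_title are hypothetical stand-ins: collect_text only imitates the general idea of gathering all the text under a tag and is not calibre's implementation. In an actual recipe the call would be self.tag_to_string(soup.find('title')), as above.

```python
class Node:
    """Hypothetical minimal tag: a name plus child Nodes or plain strings."""
    def __init__(self, name, contents=None):
        self.name = name
        self.contents = list(contents or [])
        self.string = None  # mimics the failing case: text exists, .string unset


def collect_text(tag):
    """Hypothetical stand-in for tag_to_string: join all text under tag."""
    if tag is None:
        return ""  # a missing tag yields an empty string instead of an error
    if isinstance(tag, str):
        return tag
    return "".join(collect_text(child) for child in tag.contents)


SUFFIX = " | ÉLET ÉS IRODALOM"


def clean_title(title_tag):
    """Strip the site suffix from the title text without touching .string."""
    return collect_text(title_tag).replace(SUFFIX, "")


# Example tag modelled on the <title> from the logged soup
title = Node("title", ["Csakis a közjó | ÉLET ÉS IRODALOM"])
```

Because the text is collected from the tag's children rather than read from .string, clean_title(title) still has a string to call replace() on even when .string is None.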
Old 08-06-2017, 08:56 AM   #7
hiperlink
That "just works". Thanks for your support, Kovid!