Quote:
Originally Posted by bthoven
Hi oneilpt,
I tried to fetch the news using your script. Here is the error on my side; I'm not sure what to do next:
calibre, version 0.7.48
ERROR: Conversion Error: <b>Failed</b>: Fetch news from Jermsak_Naewna
Fetch news from Jermsak_Naewna
Resolved conversion options
calibre version: 0.7.48
...
--> class: style4 style15
Python function terminated unexpectedly
'class' (Error Code: 1)
Traceback (most recent call last):
File "site.py", line 103, in main
File "site.py", line 85, in run_entry_point
File "site-packages\calibre\utils\ipc\worker.py", line 110, in main
File "site-packages\calibre\gui2\convert\gui_conversion.py", line 25, in gui_convert
File "site-packages\calibre\ebooks\conversion\plumber.py", line 904, in run
File "site-packages\calibre\customize\conversion.py", line 204, in __call__
File "site-packages\calibre\web\feeds\input.py", line 105, in convert
File "site-packages\calibre\web\feeds\news.py", line 734, in download
File "site-packages\calibre\web\feeds\news.py", line 871, in build_index
File "c:\users\chotec~1\appdata\local\temp\calibre_0.7.48_tmp_bm8qsi\calibre_0.7.48_spw2ws_recipes\recipe0.py", line 55, in parse_index
klass = post['class']
File "site-packages\calibre\ebooks\BeautifulSoup.py", line 518, in __getitem__
KeyError: 'class'
I found that a change to the source page caused a similar problem for me today; the revised recipe below fixed it. Looking at your log, though, I see the same "diamond" invalid characters that I get, whereas the log from the built-in Thai recipes shows proper Thai characters. Try the revised recipe anyway and see whether the book looks right apart from the corrupted text. If it does, the next step is to report the character encoding problem. The resulting e-book still crashes the Calibre reader, but it can be viewed in MobiPocket Reader.
I also built the e-book under Ubuntu Linux to see whether the problem was specific to Windows. The same "diamond" invalid characters appeared, but in this case the e-book did not crash the Calibre reader. The images, however, were not visible in the Calibre reader, whereas they were visible in MobiPocket Reader under Windows.
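To illustrate how such "diamond" characters can arise, here is a minimal sketch. It assumes, without my having confirmed it, that the page is served in a legacy Thai encoding such as cp874 and then decoded as UTF-8. The sample word is made up and is not taken from the site.

```python
# Hypothetical sample text; not taken from the news site.
text = u'ข่าว'

# Bytes as a server using the cp874 (Thai) code page would send them.
raw = text.encode('cp874')

# Decoding with the wrong codec: each invalid byte becomes U+FFFD,
# which most viewers draw as a diamond with a question mark.
wrong = raw.decode('utf-8', errors='replace')

# Decoding with the matching codec recovers the original text.
right = raw.decode('cp874')

print(repr(wrong))
print(right == text)
```

If this does turn out to be the cause, setting the recipe's `encoding` attribute to the page's real encoding might be worth trying before filing a report.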
Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Tag, NavigableString

class thai(BasicNewsRecipe):
    title = u'thai'
    __author__ = u'oneillpt'
    #masthead_url = 'http://www.elpais.com/im/tit_logo_int.gif'
    INDEX = 'http://www.naewna.com/allnews.asp?ID=79'
    language = 'th'
    #remove_tags_before = dict(name='div', attrs={'class':'estructura_2col'})
    #keep_tags = [dict(name='div', attrs={'class':'estructura_2col'})]
    #remove_tags = [dict(name='div', attrs={'class':'votos estirar'}),
    #dict(name='div', attrs={'id':'utilidades'}),
    #dict(name='div', attrs={'class':'info_relacionada'}),
    #dict(name='div', attrs={'class':'mod_apoyo'}),
    #dict(name='div', attrs={'class':'contorno_f'}),
    #dict(name='div', attrs={'class':'pestanias'}),
    #dict(name='div', attrs={'class':'otros_webs'}),
    #dict(name='div', attrs={'id':'pie'})
    #]
    no_stylesheets = True
    remove_javascript = True

    def parse_index(self):
        articles = []
        soup = self.index_to_soup(self.INDEX)
        cover = None
        feeds = []
        for section in soup.findAll('body'):
            section_title = self.tag_to_string(section.find('h1'))
            # The section title sits in td > font > strong
            z = section.find('td', attrs={'background':'images/fa04.gif'})
            self.log('z', z)
            x = z.find('font')
            self.log('x', x)
            y = x.find('strong')
            self.log('y', y)
            section_title = self.tag_to_string(y)
            self.log('section_title(1): ', section_title)
            if section_title == "":
                section_title = u'Thai Feed'
            self.log('section_title(2): ', section_title)
            articles = []
            for post in section.findAll('a', href=True):
                self.log('--> p: ', post)
                url = post['href']
                self.log('--> u: ', url)
                # Relative links need the site prefix
                if url.startswith('n'):
                    url = 'http://www.naewna.com/'+url
                    self.log('--> u: ', url)
                title = self.tag_to_string(post)
                self.log('--> t: ', title)
                # Only read post['class'] when the attribute is present;
                # links without it raised the KeyError in the log above
                if str(post).find('class="style4 style15"') > 0:
                    klass = post['class']
                    self.log('--> k: ', klass)
                    if klass == "style4 style15":
                        self.log()
                        self.log('--> post: ', post)
                        self.log('--> url: ', url)
                        self.log('--> title: ', title)
                        self.log('--> class: ', klass)
                        articles.append({'title':title, 'url':url})
            if articles:
                feeds.append((section_title, articles))
        return feeds
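
The KeyError in the quoted log comes from reading a 'class' attribute on a link that has none. As a rough, standalone illustration of the safer pattern, the sketch below uses Python's standard html.parser instead of calibre's BeautifulSoup, so it can be run outside calibre. The `LinkCollector` class and the sample HTML are hypothetical and only mimic the shape of the index page.

```python
from html.parser import HTMLParser

BASE = 'http://www.naewna.com/'


class LinkCollector(HTMLParser):
    """Collect relative links whose class attribute matches exactly."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        attr_map = dict(attrs)
        href = attr_map.get('href')
        # dict.get returns None for a missing attribute instead of
        # raising KeyError, so links without a class are simply skipped.
        klass = attr_map.get('class')
        if href and href.startswith('n') and klass == self.wanted_class:
            self.links.append(BASE + href)


# Made-up markup: one matching link, one without a class, one external.
sample = '''
<a href="news.asp?ID=1" class="style4 style15">one</a>
<a href="news.asp?ID=2">no class</a>
<a href="http://example.com/" class="style4 style15">external</a>
'''

collector = LinkCollector('style4 style15')
collector.feed(sample)
links = collector.links
print(links)
```

In the recipe itself the string test before `post['class']` plays the same role as `attr_map.get('class')` here: the attribute is only read once it is known to exist.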