Quote:
Originally Posted by bthoven
Hi oneillpt,
I just tried your latest script, and Calibre got the content without error. As you said, Calibre will crash when trying to open the content.
I don't know why they are still using the 874 codepage instead of one of the more popular ones.
The thai.mobi and thai.epub files display Thai correctly.
Thanks again for your kind help.
And now the solution: the recipe just needs one added line specifying the encoding:
encoding = 'cp874'
So the recipe now starts with:
Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Tag, NavigableString

class thai(BasicNewsRecipe):
    title = u'thai'
    __author__ = u'oneillpt'
    #masthead_url = 'http://www.elpais.com/im/tit_logo_int.gif'
    # (you may want to select a masthead image from your source here)
    INDEX = 'http://www.naewna.com/allnews.asp?ID=79'
    encoding = 'cp874'  # the site serves Thai text in Windows codepage 874
    language = 'th_TH'
    oldest_article = 7
    max_articles_per_feed = 2
and then continues as before. The MOBI output now shows proper Thai text. The formatting may need further work, but this is as far as I can take it. Since I cannot read Thai, all I can add is that the text looks centred when it probably should not be.
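For anyone wondering why that one line matters, here is a small sketch in plain Python (not Calibre itself) of what goes wrong when cp874 bytes are decoded with the wrong codec. The sample string and the helper name decode_page are my own assumptions for illustration; they are not taken from the site or from Calibre's internals.

```python
# Illustrative only: the sample text and the helper name are assumptions,
# not taken from the naewna.com pages or from Calibre's code.

def decode_page(raw_bytes, encoding):
    """Decode raw page bytes with the given codec, replacing undecodable bytes."""
    return raw_bytes.decode(encoding, errors='replace')

sample = u'ภาษาไทย'            # "Thai language"
raw = sample.encode('cp874')   # bytes as a cp874 server would send them

right = decode_page(raw, 'cp874')    # round-trips to the original text
wrong = decode_page(raw, 'latin-1')  # each byte becomes some Latin-1 character

print(right == sample)
print(wrong == sample)
```

With the correct codec the Thai text comes back intact; with a wrong one the same bytes decode without error but into unrelated characters, which is presumably the kind of garbling the encoding line avoids.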