03-24-2011, 06:24 PM   #19
oneillpt
Connoisseur
Join Date: Feb 2011
Device: Kindle 3 (cracked screen!); PW1; Oasis
Quote:
Originally Posted by bthoven
Hi oneillpt,

I just tried your latest script, and Calibre fetched the content without error. As you said, though, Calibre crashes when trying to open it.

I don't know why they still use code page 874 instead of a more widely used encoding.

The thai.mobi/epub files display Thai correctly.

Thanks again for your kind help.

And now the solution: the recipe needs just one added line, specifying the encoding: encoding = 'cp874'

So the recipe now starts with:
Code:
from calibre.web.feeds.news import BasicNewsRecipe
from calibre.ebooks.BeautifulSoup import Tag, NavigableString

class thai(BasicNewsRecipe):

    title      = u'thai'
    __author__ = u'oneillpt'
    #masthead_url = 'http://www.elpais.com/im/tit_logo_int.gif'
    #  (you may want to select a masthead image from your source here)
    INDEX = 'http://www.naewna.com/allnews.asp?ID=79'
    encoding = 'cp874'  # the site serves Thai text in Windows code page 874
    language = 'th_TH'
    oldest_article = 7
    max_articles_per_feed = 2
and then continues as before. The MOBI output now shows proper Thai text. The formatting may still need work, but this is as far as I can take it. Since I can't read Thai, all I can add is that the text appears centred, though it probably should not be.
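To illustrate why a one-line `encoding` setting can make the difference, here is a small standalone Python 3 sketch. It does not use calibre. It only shows the decoding step with Python's built-in `cp874` codec. The Thai sample word is an arbitrary illustration, not text taken from naewna.com. The sketch encodes the word in code page 874 and then decodes those bytes three ways.

```python
# Simulate the bytes a cp874-encoded Thai page would contain.
# The sample word is an arbitrary illustration ("hello" in Thai).
thai_text = "สวัสดี"
raw_bytes = thai_text.encode("cp874")

# cp874 is a single-byte code page: one byte per Thai character.
print(len(raw_bytes) == len(thai_text))  # True

# Decoding with the correct code page restores the original text.
decoded_ok = raw_bytes.decode("cp874")

# Decoding as Latin-1 never raises, but yields the wrong characters.
decoded_latin1 = raw_bytes.decode("latin-1")

# Decoding as UTF-8 fails, because these bytes are not valid UTF-8.
try:
    raw_bytes.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

print(decoded_ok == thai_text)      # True
print(decoded_latin1 == thai_text)  # False
print(utf8_failed)                  # True
```

The wrong guesses fail in two different ways. A Latin-1 decode silently produces the wrong characters, while a UTF-8 decode raises an error. It seems plausible that this is related to the garbled or crashing output seen before the fix, but that is an assumption and has not been confirmed in this thread.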