Old 01-03-2005, 03:52 AM   #7
Laurens
Jah Blessed
Posts: 1,295
Karma: 1373
Join Date: Apr 2003
Location: The Netherlands
Device: iPod Touch
Quote:
Originally Posted by ENIX
Sorry to bother you again. When I try to capture some Chinese web pages, I sometimes get garbled text. After checking the source code of the web page, I found that this problem occurs when there is no "<meta http-equiv="Content-Type" content="text/html; charset=gb2312">" in the source code. So could you please improve Sunrise's ability to detect the charset of the web page? Thanks again....
Well, detecting the correct charset just by looking at the raw bytes, without a <meta> tag or Content-Type HTTP header, is far from easy. I tried using the Java port of the Mozilla charset detector way back in Sunrise 0.1 and found that it fails to detect the correct charset in the majority of cases. The only way to resolve this is to tell Sunrise explicitly which charset the page is using. I will add this option in the next prerelease.
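To illustrate the idea, here is a rough sketch of how an explicit charset option could sit in front of the declared charsets. This is not Sunrise's actual code: it is written in Python for brevity, and the function names (`charset_from_header`, `charset_from_meta`, `decode_page`), the 4096-byte scan limit, and the ISO-8859-1 fallback are all assumptions made for the example. The order tried is: user override, then the Content-Type header, then the <meta> tag, then the fallback.

```python
import codecs
import re

# Hypothetical helpers for illustration only; not taken from Sunrise.
_CHARSET_RE = re.compile(rb"charset\s*=\s*[\"']?\s*([A-Za-z0-9._:-]+)", re.IGNORECASE)
_META_RE = re.compile(rb"<meta[^>]*>", re.IGNORECASE)


def charset_from_header(content_type):
    """Return the charset named in a Content-Type header value, or None."""
    if not content_type:
        return None
    match = _CHARSET_RE.search(content_type.encode("ascii", "ignore"))
    return match.group(1).decode("ascii") if match else None


def charset_from_meta(raw, scan_limit=4096):
    """Return the charset named in a <meta> tag near the start of the page, or None."""
    for tag in _META_RE.findall(raw[:scan_limit]):
        match = _CHARSET_RE.search(tag)
        if match:
            return match.group(1).decode("ascii")
    return None


def _is_known(name):
    """True if Python has a codec registered under this charset name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


def decode_page(raw, content_type=None, override=None, fallback="iso-8859-1"):
    """Decode raw page bytes; return (text, charset actually used)."""
    candidates = (override, charset_from_header(content_type), charset_from_meta(raw))
    for name in candidates:
        if name and _is_known(name):
            return raw.decode(name, errors="replace"), name
    # Nothing declared and no override: fall back, which may produce garbage.
    return raw.decode(fallback, errors="replace"), fallback
```

For a page without any charset declaration, something like `decode_page(raw_bytes, override="gb2312")` would decode it as GB2312, while the same call without `override` would fall back to ISO-8859-1 and show exactly the kind of messy text described in the quote.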