Old 08-24-2011, 03:19 AM   #111
kiwidude
Calibre Plugins Developer
It only blows up on certain combinations of input. Here is more of the code I originally posted, slightly adapted so it makes sense as a snippet; the logic is unchanged from the zip downloadable from this thread:
Code:
encoding = 'utf-8'  # or 'latin-1', depending on the configured website URL
vals = mi.all_non_none_fields()
fixed_vals = {}
for k in vals:
    fixed_vals[k] = unicode(vals[k]) # convert non-string types
    fixed_vals[k] = self.convert_to_search_text(fixed_vals[k], encoding)
    # self.convert_to_search_text() effectively does this:
    # text = quote_plus(text.encode(encoding, 'ignore'))

# Substitute our quoted, encoded values into our tokenised_url (which "might" contain quoted chars)
url = template_formatter.safe_format(tokenised_url, fixed_vals, 'STI template error', mi)

# Next line blows up if Author was Nieznany wyjątek
# url at this point is: http://www.google.com/#sclient=psy&q=%22Nieznany wyjÄ…tek%22+%22Unknown%22
open_url(QUrl.fromEncoded(url))
If I add .encode('ascii', 'ignore') to the end of that template_formatter.safe_format line then, as I mentioned above, QUrl.fromEncoded will not blow up, but it also destroys certain characters.
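
To make the "destroys certain characters" part concrete, here is a minimal sketch. It is written for Python 3 (the plugin code above is Python 2), and only the author name comes from the failing case; the variable names are purely illustrative. It compares stripping to ASCII with percent-encoding the encoded bytes, which is roughly what convert_to_search_text() is described as doing:

```python
from urllib.parse import quote_plus

author = "Nieznany wyjątek"

# Stripping to ASCII with 'ignore' silently drops the 'ą'
stripped = author.encode("ascii", "ignore").decode("ascii")

# Percent-encoding the UTF-8 bytes keeps the character as an escape sequence
quoted_utf8 = quote_plus(author.encode("utf-8", "ignore"))

# 'ą' has no latin-1 representation, so 'ignore' drops it here as well
quoted_latin1 = quote_plus(author.encode("latin-1", "ignore"))

print(stripped)       # Nieznany wyjtek
print(quoted_utf8)    # Nieznany+wyj%C4%85tek
print(quoted_latin1)  # Nieznany+wyjtek
```

So with the 'ignore' error handler the character is lost whenever the target encoding cannot represent it, which is the case for ascii and, for this particular name, latin-1 too.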

So in answer to your question - no, the URL is not pure ascii by the time it reaches QUrl.fromEncoded, and it cannot be, or else the original content is corrupted.
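
A quick way to check what is actually being handed over is something like the sketch below (Python 3; is_pure_ascii is a hypothetical helper, not part of the plugin, and both URLs are hand-written for illustration: the first carries the raw 'ą', the second has it percent-escaped as UTF-8):

```python
def is_pure_ascii(text):
    """Return True if text contains only ASCII characters."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


# Illustrative URLs only, modelled on the failing case above
raw_url = "http://www.google.com/#sclient=psy&q=%22Nieznany wyjątek%22+%22Unknown%22"
escaped_url = "http://www.google.com/#sclient=psy&q=%22Nieznany+wyj%C4%85tek%22+%22Unknown%22"

print(is_pure_ascii(raw_url))      # False
print(is_pure_ascii(escaped_url))  # True
```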

Note that if I avoid QUrl completely and replace the last line above with webbrowser.open(url), it works perfectly fine for EVERY combination of inputs. The URL is already exactly how I want it passed to the web browser. It is QUrl which is "mangling it". I can't use QUrl(url) because QUrl tries to be too clever and does its own quote substitutions on characters it spots, so I can't give it the "raw" URL. And .fromEncoded() is useless because the latin-1 decoding it does internally blows up for non-ascii. I also note that in some cases, even though it does not blow up, it still doesn't give the right result in the web browser. QUrl sucks.

So unless you have any bright ideas I am going back to webbrowser.open(). That won't please the Linux users (not that I care personally, not being one of them, but there are obviously a few out there). The only thing I could do is have an if is_linux statement that invokes the existing open_url(), with the understanding that it doesn't work for any non-ascii names. Which is a crappy workaround.
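
Roughly, that workaround would look like the sketch below. It is Python 3 and deliberately standalone: qt_open is a hypothetical stand-in for the existing QUrl-based open_url(), the platform check uses sys.platform rather than calibre's is_linux constant, and open_search_url is a made-up name. The openers are passed in as arguments so the branching can be exercised without actually launching a browser:

```python
import sys
import webbrowser


def open_search_url(url, qt_open, default_open=webbrowser.open,
                    platform=sys.platform):
    """Open url, returning the name of the route that was taken.

    qt_open is a hypothetical stand-in for the QUrl-based open_url();
    it is only used on Linux, where non-ascii names are expected to fail.
    """
    if platform.startswith("linux"):
        qt_open(url)
        return "qt"
    default_open(url)
    return "webbrowser"
```

Everywhere except Linux the URL goes straight to webbrowser.open() untouched; on Linux it still goes through the QUrl path with all the limitations described above.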

Last edited by kiwidude; 08-24-2011 at 04:14 AM.