@kovid: I am looking at it now and I am confident that it does what you intend, but the approach makes me nervous for several reasons:
- From what I see via Google, the "smart quote" substitution happens only while you are typing, not when you paste, so there is no guarantee that the quotes are balanced. Example: if I paste comments:" and then type foo", I get comments:"foo”, and the parser fails.
- Because the change is in the parser it affects all searches even if the source doesn't use smart quotes, such as the calibre app search bar and OPDS clients. Similar things might happen with machine-generated searches.
- It changes the meaning of some searches. For example, today I can search for comments:“foo” and find “foo”. With this change I must search for comments:"“foo”".
- iOS apparently also messes around with dashes, automatically converting double hyphens (--) to dashes (—). Some people say that just for fun iOS tosses in 0x00 bytes after them.
- Apparently Apple is doing the same thing with other locales, for example changing "foo" to « foo » in a French locale or „foo“ in a German locale.
Conclusion: changing the search/query parser introduces unexpected behavior and doesn't solve the problem for non-English locales. Instead it creates a never-ending headache as Apple continues to mess with it.
I think it would be better to deal with the problem directly in the content server, replacing "smart" quotes if the user agent is an iOS (or perhaps any Apple) device. That way you can:
- limit the damage to one input source (content server searching)
- limit the damage to one user agent, if the user-agent string can be trusted.
- use a brute-force replacement algorithm instead of trying to balance the quotes. By this I mean replace all “ and ” with " and so on.
- deal with converting em dashes back to -- (or not)
- take locale into consideration (or not).
- do the dirty work in the browser or in the content server itself, whichever is better.
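To make the brute-force idea concrete, here is a rough sketch of what I have in mind. It is not existing calibre code: normalize_query, the replacement table, and the user-agent pattern are all names and choices I made up for illustration, and the set of characters to replace is an assumption that would need checking against what iOS actually sends.

```python
import re

# Hypothetical replacement table (an assumption, not calibre code).
# Escapes are used so the exact code points are unambiguous.
REPLACEMENTS = {
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
    "\u201e": '"',   # double low-9 quotation mark (German opening quote)
    "\u00ab": '"',   # left-pointing guillemet (French opening quote)
    "\u00bb": '"',   # right-pointing guillemet (French closing quote)
    "\u2014": "--",  # em dash back to a double hyphen
    "\x00": None,    # drop stray NUL characters, if they really occur
}
_TABLE = str.maketrans(REPLACEMENTS)

# Assumed pattern for iOS user agents; widen or narrow as needed.
_IOS_UA = re.compile(r"iPhone|iPad|iPod")


def normalize_query(query, user_agent):
    """Replace "smart" punctuation only for requests that look like iOS."""
    if not _IOS_UA.search(user_agent or ""):
        # Every other input source is left untouched, so a search for
        # literal curly quotes keeps its current meaning.
        return query
    return query.translate(_TABLE)
```

Because nothing is balanced or paired, the pasted-then-typed case (comments:"foo”) comes out as comments:"foo" without any special handling. Spaces inside French guillemets are left as they are in this sketch.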
BTW:
According to this thread, it is possible to turn off smart quotes on a field-by-field basis. I don't know whether that level of control is exposed to JavaScript in Safari.