Old 01-08-2012, 09:34 AM   #86
eureka
but forgot what it's like
Posts: 741
Karma: 2345678
Join Date: Dec 2011
Location: north (by northwest)
Device: Kindle Touch
Quote:
Originally Posted by ixtab
... don't import anything yet, until we have cleared the UTF-8 issue

I did some further testing, and these are the results:

- to change a resource on transifex from PROPERTIES to MOZILLAPROPERTIES, it seems that it has to be removed and re-created. Removing a resource loses all associated translations, so we MUST make a backup of everything first.
- as expected, Java does not support UTF-8 properties out-of-the-box, but it should be possible to integrate this into the tool.

So, if we want to go for this, the workflow would be:
1. update the tool to assume everything is UTF-8. (ixtab) [this would not affect the extract part, but only the compile part -- or am I wrong?]
2. once the tool is ready, make a backup of current translation state, wipe all resources, upload result of extraction as new MOZILLAPROPERTIES resources, convert existing translations, re-upload existing translations. (eureka)

Is this correct, and should we go for it? I have "assigned" 2. to you, but I'm fine to help with the conversion part (i.e., to write some kind of tool to convert a .properties file from PROPERTIES to MOZILLAPROPERTIES format, aka from ISO-8859-1 to UTF-8).

Let me know...
OK, good plan. I'm fine with the assigned task, and converting Unicode escape sequences to UTF-8 doesn't look hard in Python, so I'll do it.
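
A rough sketch of what I have in mind for the conversion, assuming Python 3 and assuming the only thing that has to change is turning \uXXXX escapes into real characters (other escapes such as \n or \: are left untouched). The names unescape_line and convert_file are made up for illustration, and whether Transifex needs anything more than this for MOZILLAPROPERTIES is something I have not verified.

```python
import re

# A run of backslashes followed by 'u' and four hex digits.
# Only an odd-length run means the last backslash really starts a \uXXXX escape.
_ESCAPE = re.compile(r'(\\+)u([0-9a-fA-F]{4})')


def unescape_line(line):
    """Replace \\uXXXX escapes in one .properties line with real characters."""
    def repl(match):
        slashes, hexcode = match.group(1), match.group(2)
        if len(slashes) % 2 == 0:
            # Even run: the backslashes escape each other, 'u' is a plain letter.
            return match.group(0)
        return slashes[:-1] + chr(int(hexcode, 16))

    text = _ESCAPE.sub(repl, line)
    # Java writes characters outside the BMP as two escapes (a surrogate pair).
    # Round-tripping through UTF-16 joins such pairs into one code point.
    # A lone, unpaired surrogate escape will raise UnicodeDecodeError here.
    return text.encode('utf-16', 'surrogatepass').decode('utf-16')


def convert_file(src, dst):
    """Read an ISO-8859-1 .properties file and write it back out as UTF-8."""
    with open(src, encoding='iso-8859-1', newline='') as fin, \
            open(dst, 'w', encoding='utf-8', newline='') as fout:
        for line in fin:
            fout.write(unescape_line(line))
```

For example, unescape_line("title=caf\\u00e9") gives "title=café", while a doubled backslash before the u is left alone.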

It would be better if the extract part also produced UTF-8 output. In practice it is superfluous (only the en_US resources are taken from the extract result, and these are identical in their ISO-8859-1 and UTF-8 variants), but it would be more consistent and less error-prone if, through my (or someone else's) mistake, a localized resource leaks from the extract result into the Git repo and on to Transifex.
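
To illustrate the "same in both variants" point: text encodes to identical bytes in ISO-8859-1 and UTF-8 only when it is pure ASCII. This is a small sketch, assuming Python 3 and assuming the en_US values really are ASCII-only; the helper name is hypothetical.

```python
def same_bytes_in_both_encodings(text):
    """Return True if text encodes identically as ISO-8859-1 and as UTF-8."""
    try:
        latin1 = text.encode('iso-8859-1')
    except UnicodeEncodeError:
        # Characters outside Latin-1 cannot be written as ISO-8859-1 at all.
        return False
    return latin1 == text.encode('utf-8')
```

A plain English string passes this check, but anything with an accented letter does not, and that is exactly the case where a leaked localized resource would end up garbled.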