01-08-2012, 04:57 AM | #76
but forgot what it's like
Posts: 741
Karma: 2345678
Join Date: Dec 2011
Location: north (by northwest)
Device: Kindle Touch
The tool's output, however, makes it hard to check for differences with 'diff -r'. Firstly, line endings in my output are '\r\n', while in yours they are '\n' (I'm on Windows, as you can guess). I think it would be useful if line endings were uniform: '\n'. Could you implement that? Secondly, there is a timestamp in the header comment. Could you remove it from the output? Also, could you output UTF-8 without \uXXXX escape sequences?
Overall, it looks like I'm far behind you in the progress of my JS tool. I'll try to speed up the work. I'm now refactoring it to fit the parsed JS into nice internal data structures, and I'm planning to use the XLIFF format for output. XLIFF is XML, so I can attach metadata (like 'this should be wrapped in a MessageFormat constructor call') to translatable strings.
About uploading to Transifex: I did it with some throw-away scripts. One scanned src/5.0.0/framework for .properties files and generated .tx/config. Then I manually changed all the resource slugs in this config (a slug can't be longer than 50 characters, and I wanted to fit a sensible abbreviation of 'com.amazon.kindle.<...>' into those 50 characters). Then 'tx push'. To display nice resource names in the Transifex web UI, I wrote another script which changed the resources' names using the Transifex API. I've deleted these scripts, but I can certainly write them again.
I'll wait for your response about line endings in the tool's output before importing it into the repository and Transifex.
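The 50-character slug limit can also be handled mechanically. Below is a minimal Java sketch, not the original throw-away script: the helper name toSlug, the example path, and the abbreviation rule (the 'com.amazon.kindle.' prefix becomes 'kindle-', and anything other than lowercase letters, digits, '_' or '-' becomes '-') are all illustrative assumptions.

```java
import java.util.Locale;

// Hypothetical helper: derives a Transifex resource slug from a .properties path.
// The abbreviation rule below is an illustrative assumption, not the thread's scripts.
final class SlugSketch {
    static final int MAX_SLUG_LENGTH = 50; // limit mentioned in the thread

    static String toSlug(String resourcePath) {
        // "com/amazon/kindle/foo/Bar.properties" -> "com.amazon.kindle.foo.Bar"
        String name = resourcePath.replace('/', '.');
        if (name.endsWith(".properties")) {
            name = name.substring(0, name.length() - ".properties".length());
        }
        // Assumed abbreviation of the common package prefix
        if (name.startsWith("com.amazon.kindle.")) {
            name = "kindle-" + name.substring("com.amazon.kindle.".length());
        }
        // Assumed slug alphabet: lowercase letters, digits, '_' and '-'
        String slug = name.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9_-]", "-");
        return slug.length() <= MAX_SLUG_LENGTH ? slug : slug.substring(0, MAX_SLUG_LENGTH);
    }

    public static void main(String[] args) {
        // Hypothetical resource path, for illustration only
        System.out.println(toSlug("com/amazon/kindle/content/catalog/CatalogResources.properties"));
    }
}
```

Plain truncation can make two long names collide, so some slugs may still need a hand-picked abbreviation like the ones described above.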
01-08-2012, 05:32 AM | #77
Posts: 2,907
Karma: 6736092
Join Date: Dec 2011
Device: K3, K4, K5, KPW, KPW2
Thanks for checking it!
- The line endings and the timestamp are produced by Properties.store(). There certainly *could* be a way of changing the line endings or removing the timestamp comment, but honestly: is it really a problem?
- I'm not outputting escape sequences myself. Again, it's what Java's Properties.store() produces. Properties files are encoded in ISO-8859-1, so escaping characters as \uXXXX is actually the proper way. Some of the originally localized files use yet another kind of escaping (octal escapes, if I'm not mistaken). These escapes are hard-coded in the original translations.
As I said, none of this *should* matter. The source files only serve as input to Transifex anyway. The translated files are the only ones actually being "packed" (as .properties or as .class, depending on what is required or makes more sense). In any case, the only things that really matter IMO are whether Transifex can make sense of the inputs (as produced by "extract"), and whether the locale files (as produced by "compile") display properly on the Kindle. Both of these should be the case.
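To see what Properties.store() actually writes, here is a small Java sketch (StoreDemo and storeToString are hypothetical names). The OutputStream variant always writes ISO-8859-1, puts a '#' timestamp comment first, escapes non-ASCII characters as \uXXXX, and ends lines with the platform line separator, which explains the '\r\n' on Windows:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Shows what Properties.store(OutputStream, String) emits for a Cyrillic value.
final class StoreDemo {
    static String storeToString(Properties props) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            props.store(out, null); // null comment: only the "#<date>" line precedes the entries
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream variant always writes ISO-8859-1 bytes
        return new String(out.toByteArray(), StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        Properties props = new Properties();
        props.setProperty("count.one", "\u0440\u0430\u0437"); // Cyrillic "raz" ("one")
        // Prints the timestamp comment, then count.one= followed by three backslash-u escapes
        System.out.print(storeToString(props));
    }
}
```

The date in the first line changes on every run, which is what makes otherwise identical files show up as modified.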
01-08-2012, 06:38 AM | #78
but forgot what it's like
Posts: 741
Karma: 2345678
Join Date: Dec 2011
Location: north (by northwest)
Device: Kindle Touch
Then Git will show that all .properties files were changed because, well, all line endings were changed. Every line will appear in the 'git diff' output, so it will be of no use. OK, line endings may be no problem with 'git config --global core.autocrlf input' (as I've just found out), but timestamp changes will pollute the 'git diff' output even for a small change in a single .properties file.
Also, using UTF-8 instead of \uXXXX would help anyone reading translated .properties files taken straight from the Git repo. I'm saying this as a Russian-speaking user: all Russian strings (in the Cyrillic alphabet) will be represented as long sequences of \uXXXX. It will be hard to fix translation errors with a plain old text editor (while using the Transifex command-line client).
But then, I had thought that you control the output of your tool, and also that .properties is only an intermediate format for translation, which is compiled back into .class in the end. That's why I made some suggestions on how to improve the tool's output. But it seems you're using Java-provided functions to write data to .properties files, and you also put some of these .properties files back into the bundle when compiling, so not being able to change the .properties format makes sense to me now. So I'm not insisting on changing it, and will proceed to commit it to the Git repo and Transifex (in a day or so).
01-08-2012, 06:49 AM | #79
Posts: 2,907
Karma: 6736092
Join Date: Dec 2011
Device: K3, K4, K5, KPW, KPW2
After giving this a second thought, you're right about the line endings and the timestamp, so I'll see if there is anything I can do about them. As for UTF-8, I fear there is no way to support it: .properties files MUST be in ISO-8859-1. Storing them directly in UTF-8 will result in garbage on the reader.
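One possible way to get diff-friendly output without giving up Properties.store() is to post-process its result. The Java sketch below (StableStore is a hypothetical name, not part of the tool) drops the timestamp line and always uses '\n'. It also sorts the entries, which nobody asked for above, but which keeps the order stable because Properties makes no ordering guarantee:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Properties;

// Sketch: stable .properties output (no timestamp, LF line endings, sorted entries),
// still ISO-8859-1 with backslash-u escapes as produced by Properties.store().
final class StableStore {
    static byte[] storeStable(Properties props) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try {
            props.store(buffer, null); // first line is the "#<date>" comment
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String text = new String(buffer.toByteArray(), StandardCharsets.ISO_8859_1);
        List<String> entries = new ArrayList<>();
        for (String line : text.split("\r?\n")) { // accepts CRLF (Windows) or LF
            // With a null comment, the only line starting with '#' is the timestamp
            if (!line.isEmpty() && !line.startsWith("#")) {
                entries.add(line);
            }
        }
        Collections.sort(entries); // Hashtable-backed Properties has no defined order
        StringBuilder out = new StringBuilder();
        for (String entry : entries) {
            out.append(entry).append('\n'); // uniform line ending on every OS
        }
        return out.toString().getBytes(StandardCharsets.ISO_8859_1);
    }
}
```

This relies on store() escaping line breaks inside keys and values and escaping '#' in keys, so that every entry is exactly one line and cannot be mistaken for the timestamp comment.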
01-08-2012, 07:10 AM | #80
but forgot what it's like
Posts: 741
Karma: 2345678
Join Date: Dec 2011
Location: north (by northwest)
Device: Kindle Touch
Am I right in principle? Could you take a look at this?
01-08-2012, 07:43 AM | #81
Posts: 2,907
Karma: 6736092
Join Date: Dec 2011
Device: K3, K4, K5, KPW, KPW2
Just committed the changes. Please test-drive the new version and let me know if it works (I don't have Windows around...).
- Localized property files come from Transifex (tx pull), not from the extraction tool.
- I am not converting every single properties file into a class, but only some. There is a very simple heuristic behind the selection: i) if a properties file contains arrays, then a class is always created from it; ii) otherwise, a class is created temporarily, and whichever file is smaller (the original properties file or the class file) is bundled. This could of course be changed to turn everything into a class, but that would normally bloat the size of the .jar (by a factor of about 2 when testing with the de locale). In addition, the classes are not terribly efficient (internally they use a Base64-encoded version of the serialized String of the contents, which must be decoded and deserialized when the class is loaded). Of course there are no benchmarks around, but I guess that this may be slower than using .properties directly.
So to summarize, the resulting jar contains a mix of .properties and .class files.
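The selection heuristic described above is small enough to write down. Here is a Java sketch with hypothetical names. How the tool detects arrays and measures file sizes isn't described in the thread, so both are simply taken as inputs, and a tie going to the .properties file is an arbitrary choice of this sketch:

```java
// Hypothetical model of the packaging heuristic; names are illustrative only.
final class BundleChoice {
    enum Artifact { PROPERTIES, CLASS }

    static Artifact choose(boolean containsArrays, long propertiesSize, long classSize) {
        if (containsArrays) {
            return Artifact.CLASS; // rule i): arrays always need a class
        }
        // rule ii): bundle whichever representation is smaller
        return classSize < propertiesSize ? Artifact.CLASS : Artifact.PROPERTIES;
    }
}
```

Arrays force a class because a PropertyResourceBundle can only hold String values, whereas a class-based bundle such as a ListResourceBundle subclass can hand out a String[] through getStringArray().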
01-08-2012, 08:03 AM | #82
Posts: 2,907
Karma: 6736092
Join Date: Dec 2011
Device: K3, K4, K5, KPW, KPW2
One more addition -- just thinking aloud about what would happen if we switched to UTF-8 properties:
- The advantage would be that editing files locally becomes easier.
- The tool would need to be rewritten to initially produce UTF-8 files, and to cope with UTF-8 in translated files (rewriting them into ISO-8859-1 if necessary).
- Most importantly, though: what would happen to *already existing* translations at Transifex? We have steady progress for quite a few languages there, and I guess translators will be pissed off if everything suddenly goes back to zero. This is the worst-case scenario, and I hope it wouldn't happen, but I'm not sure how to properly test this.
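If translated files arrive in UTF-8 but something downstream still expects ISO-8859-1, the rewrite can be as simple as escaping every character above 0x7E. Here is a hypothetical Java helper in the spirit of the native2ascii tool that shipped with older JDKs (not code from kt-l10n):

```java
// Hypothetical helper: makes decoded UTF-8 text safe for an ISO-8859-1 .properties file
// by replacing every character above 0x7E with a backslash-u escape.
final class AsciiEscaper {
    static String escapeNonAscii(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c <= 0x7E) {
                out.append(c); // ASCII, including line breaks, is kept as-is
            } else {
                // Supplementary characters arrive as two UTF-16 chars (a surrogate pair),
                // so each half gets its own escape, as Properties.store() also does.
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }
}
```

Since line breaks pass through unchanged, the helper can be applied to a whole file's text, and Properties.load() turns the escapes back into the original characters.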
01-08-2012, 08:13 AM | #83
but forgot what it's like
Posts: 741
Karma: 2345678
Join Date: Dec 2011
Location: north (by northwest)
Device: Kindle Touch
(BTW, just for reference: resources with type=MOZILLAPROPERTIES are exported from Transifex in UTF-8.)
01-08-2012, 08:25 AM | #84
but forgot what it's like
Posts: 741
Karma: 2345678
Join Date: Dec 2011
Location: north (by northwest)
Device: Kindle Touch
But, of course, there will be some rough edges in the transition from PROPERTIES to MOZILLAPROPERTIES (where values are stored in UTF-8), so your point is completely valid here. Still, I think these rough edges are manageable.
Last edited by eureka; 01-08-2012 at 08:27 AM. Reason: clarify choices for recovery from script error
01-08-2012, 09:00 AM | #85
Posts: 2,907
Karma: 6736092
Join Date: Dec 2011
Device: K3, K4, K5, KPW, KPW2
... don't import anything yet, until we have cleared the UTF-8 issue
I did some further testing, and these are the results:
- To change a resource on Transifex from PROPERTIES to MOZILLAPROPERTIES, it apparently has to be removed and re-created. Removing a resource loses all associated translations, so we MUST make a backup of everything first.
- As expected, Java does not support UTF-8 properties out of the box, but it should be possible to integrate this into the tool.
So, if we want to go for this, the workflow would be:
1. Update the tool to assume everything is UTF-8. (ixtab) [This would not affect the extract part, but only the compile part -- or am I wrong?]
2. Once the tool is ready, make a backup of the current translation state, wipe all resources, upload the result of extraction as new MOZILLAPROPERTIES resources, convert the existing translations, and re-upload them. (eureka)
Is this correct, and should we go for it? I have "assigned" 2. to you, but I'm happy to help with the conversion part (i.e., to write some kind of tool to convert a .properties file from PROPERTIES to MOZILLAPROPERTIES format, aka from ISO-8859-1 to UTF-8). Let me know...
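On a desktop JDK, reading UTF-8 .properties in the compile step mostly comes down to choosing the right constructor. In the Java versions of that time, PropertyResourceBundle(InputStream) and Properties.load(InputStream) always decode ISO-8859-1 (since Java 9 the bundle constructor tries UTF-8 first), while the Reader-based variants added in Java 6 accept whatever decoding the caller sets up. Here is a minimal sketch (Utf8Bundles is a hypothetical name):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.PropertyResourceBundle;
import java.util.ResourceBundle;

// Sketch: load a UTF-8 encoded .properties stream as a ResourceBundle.
final class Utf8Bundles {
    static ResourceBundle loadUtf8(InputStream in) {
        // Decode explicitly as UTF-8 instead of relying on the InputStream
        // constructor's default decoding.
        try (Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            return new PropertyResourceBundle(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

This only concerns the tool's side. What the Kindle finally loads is still whatever "compile" packs (.properties or .class).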
01-08-2012, 09:34 AM | #86
but forgot what it's like
Posts: 741
Karma: 2345678
Join Date: Dec 2011
Location: north (by northwest)
Device: Kindle Touch
It would be better if the extract part also produced UTF-8 output. While in practice this is superfluous (only en_US resources will be taken from the extraction result, and these resources are the same in the ISO-8859-1 and UTF-8 variants), it would be more consistent, and less error-prone in case I (or someone else) make a mistake and a localized resource leaks from the extraction result into the Git repo and further into Transifex.
01-08-2012, 09:57 AM | #87
Posts: 2,907
Karma: 6736092
Join Date: Dec 2011
Device: K3, K4, K5, KPW, KPW2
01-08-2012, 01:55 PM | #88
Posts: 2,907
Karma: 6736092
Join Date: Dec 2011
Device: K3, K4, K5, KPW, KPW2
Current status:
- "extract" now produces UTF-8 files.
- "compile" now expects UTF-8 files.
- A new mode named "iso2utf" has been added to the tool. This was quickly written up after the previous discussions (well, it's 50 LOC, not 20, but still...). This mode can be used for converting existing localizations. Sample usage is "java -jar kt-l10n.jar iso2utf -f -s com/ -t com/". This would convert all .properties files inside com/ from ISO-8859-1 to UTF-8. Source and target directories must be the same; in other words, files are updated in place. DO NOT run this more than once on the same directory, or the output WILL become bullshit. (It's converted from ISO-8859-1 to UTF-8 on the first run; it would be converted to something meaningless on the second run.)
The current version has been checked in. I've tested the entire round-trip with the German translation, and it works for me. I also tested with a dummy Russian file, and it seems to work as well (try it... "pаз, два, тpи, четыpе" [one, two, three, four], "блядь... не делает" [roughly "f***... doesn't do it"] ;-) ). So we should be OK to move to UTF-8.
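The warning about running iso2utf only once follows from how such a conversion works. Here is a minimal Java sketch of a byte-level ISO-8859-1 to UTF-8 rewrite. IsoToUtf and its methods are hypothetical and are not the tool's actual 50 lines. Because every byte is a valid ISO-8859-1 character, a second run happily re-decodes the UTF-8 bytes and encodes them again, so 'ä' goes from C3 A4 to C3 83 C2 A4:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical in-place ISO-8859-1 -> UTF-8 conversion of a directory tree.
final class IsoToUtf {
    // Decoding as ISO-8859-1 never fails, which is exactly why a second run
    // silently double-encodes every non-ASCII character.
    static byte[] transcode(byte[] isoBytes) {
        String text = new String(isoBytes, StandardCharsets.ISO_8859_1);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Rewrites every .properties file below root in place (source == target).
    static void convertTree(Path root) throws IOException {
        List<Path> files;
        try (Stream<Path> paths = Files.walk(root)) {
            files = paths.filter(Files::isRegularFile)
                    .filter(p -> p.toString().endsWith(".properties"))
                    .collect(Collectors.toList());
        }
        for (Path file : files) {
            Files.write(file, transcode(Files.readAllBytes(file)));
        }
    }
}
```

Existing \uXXXX escapes are plain ASCII, so a byte-level rewrite like this leaves them untouched. They stay valid for a UTF-8 reader, just not human-readable.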
01-08-2012, 02:45 PM | #89
but forgot what it's like
Posts: 741
Karma: 2345678
Join Date: Dec 2011
Location: north (by northwest)
Device: Kindle Touch
(Your last Russian phrase is very funny because one of the words is obscene [it's generally used to express extreme frustration] and unexpected here, and the phrase doesn't fit the context of the rest of your message. I'm wondering what it was originally, before translation...)
01-08-2012, 03:12 PM | #90
Posts: 2,907
Karma: 6736092
Join Date: Dec 2011
Device: K3, K4, K5, KPW, KPW2