Old 01-08-2012, 04:57 AM   #76
eureka
but forgot what it's like
Posts: 741
Karma: 2345678
Join Date: Dec 2011
Location: north (by northwest)
Device: Kindle Touch
Quote:
Originally Posted by ixtab View Post
I have not taken the final step of updating the source .properties on transifex. eureka, Could you please verify that the output of "extract" makes sense before we go anywhere else? (BTW, this would also be a possibility to check for differences between 5.0.0 and 5.0.1. I tried the md5sum thing, but it's pretty useless because md5sums of almost all files have changed. I'm attaching the output of "extract" for 5.0.1.) Then the only thing left to do is to actually update the source files for transifex... Could you please do that? (I don't know how you managed to automatically update the .tx/config and have the files appear on transifex with the correct name etc.)
I've run the tool on the 5.0.0 resources; its output is attached. As far as I can see, no resources changed between v5.0.0 and v5.0.1. Feel free to recheck.

The tool's output, however, makes it difficult to check for differences with 'diff -r'.

First, the line endings in my output are '\r\n', while in yours they are '\n'. I'm on Windows, as you can guess. I think it would be useful for the line endings to be uniform: '\n'. Could you implement that?

Second, the timestamp in the header comment. Could you remove it from the output?

Also, could you write the output in UTF-8, without \uXXXX escape sequences?

Overall, it looks like my JS tool is far behind yours. I'll try to speed up the work. I'm now refactoring it to fit the parsed JS into nice internal data structures, and I'm planning to use the XLIFF format for output. XLIFF is XML, so I can attach metadata (such as 'it should be wrapped in a MessageFormat constructor call') to translatable strings.

About uploading to Transifex: I did it with some throw-away scripts. One scanned src/5.0.0/framework for .properties files and generated .tx/config. Then I manually changed the slugs of all resources in this config (a slug must not exceed 50 characters, and I wanted to fit a sensible abbreviation of 'com.amazon.kindle.<...>' into those 50 characters). Then 'tx push'. To display nice resource names in the Transifex web UI, I wrote another script that changed the resource names through the Transifex API.
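
The deleted scripts are not available, so the following is only a rough sketch of what the slug and config generation could look like. Everything here is an assumption: the class and method names, the "k-" abbreviation, the tail-keeping truncation and the translations/<lang>/ layout are invented for illustration, and the section keys follow the tx client's .tx/config layout as commonly documented. The real slugs were edited by hand.

```java
// Hypothetical sketch; the original throw-away scripts were deleted.
final class TxConfigSketch {
    static final int MAX_SLUG_LENGTH = 50;                 // limit mentioned in the post
    static final String LONG_PREFIX = "com/amazon/kindle/";
    static final String SHORT_PREFIX = "k-";               // invented abbreviation

    // Derives a resource slug from a path relative to the source root.
    static String slugFor(String relativePath) {
        String s = relativePath.replace('\\', '/');
        if (s.endsWith(".properties")) {
            s = s.substring(0, s.length() - ".properties".length());
        }
        if (s.startsWith(LONG_PREFIX)) {
            s = SHORT_PREFIX + s.substring(LONG_PREFIX.length());
        }
        // Keep only letters, digits, '_' and '-'.
        s = s.replaceAll("[^A-Za-z0-9_-]", "-");
        if (s.length() > MAX_SLUG_LENGTH) {
            // Keep the tail, which is usually the most specific part.
            s = s.substring(s.length() - MAX_SLUG_LENGTH);
        }
        return s;
    }

    // Builds one resource section for .tx/config.
    static String configSection(String projectSlug, String sourceRoot, String relativePath) {
        return "[" + projectSlug + "." + slugFor(relativePath) + "]\n"
                + "source_file = " + sourceRoot + "/" + relativePath + "\n"
                + "file_filter = translations/<lang>/" + relativePath + "\n" // invented layout
                + "source_lang = en_US\n"
                + "type = PROPERTIES\n";
    }
}
```

Truncation can make two long paths collide on the same slug, so generated slugs would still need a manual check, much like the hand editing described above.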

I've deleted these scripts, but I can certainly write them again. I'll wait for your response about the line endings in the tool's output before importing this output into the repository and Transifex.
Attached Files
File Type: zip extract_500.zip (119.0 KB, 167 views)
Old 01-08-2012, 05:32 AM   #77
ixtab
(offline)
Posts: 2,907
Karma: 6736092
Join Date: Dec 2011
Device: K3, K4, K5, KPW, KPW2
Thanks for checking it!
Quote:
Originally Posted by eureka View Post
I've run the tool on the 5.0.0 resources; its output is attached. As far as I can see, no resources changed between v5.0.0 and v5.0.1. Feel free to recheck.

The tool's output, however, makes it difficult to check for differences with 'diff -r'.

First, the line endings in my output are '\r\n', while in yours they are '\n'. I'm on Windows, as you can guess. I think it would be useful for the line endings to be uniform: '\n'. Could you implement that?

Second, the timestamp in the header comment. Could you remove it from the output?

Also, could you write the output in UTF-8, without \uXXXX escape sequences?
All of these things come from Java itself, and none of them should really matter.
- the line endings and timestamp are produced by Properties.store(). There certainly *could* be a way of changing the line endings or removing the timestamp comment, but honestly: is it really a problem?
- I'm not producing the escape sequences myself. Again, it's what Java's Properties.store() produces. .properties files are encoded in ISO-8859-1, so escaping non-ASCII characters as \uXXXX is actually the proper way. Some of the originally localized files include yet another way of escaping (using octal escapes, if I'm not mistaken). These escapes are hard-coded in the original translations.
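
For reference, a minimal sketch of one way the timestamp comment and the platform line endings could be stripped after Properties.store() has run. The class and method names are made up and this is not necessarily how the tool handles it. It relies on store() escaping '#' in keys with a backslash, so any output line that starts with '#' is a comment.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical helper, not part of the actual tool.
final class NormalizedPropertiesWriter {

    // Serializes props like Properties.store(OutputStream, ...), but without
    // the date comment and with '\n' line endings on every platform.
    static String storeNormalized(Properties props) {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        try {
            // null means no custom header; store() still writes a "#<date>" line.
            props.store(raw, null);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The stream variant of store() always writes ISO-8859-1.
        String text = new String(raw.toByteArray(), StandardCharsets.ISO_8859_1);

        StringBuilder out = new StringBuilder();
        for (String line : text.split("\\r?\\n")) {
            if (line.startsWith("#")) {
                continue; // comment line, i.e. the timestamp
            }
            out.append(line).append('\n');
        }
        return out.toString();
    }
}
```

The entry order is still whatever Properties yields; sorting the keys before writing would make diffs more stable, but that is left out of this sketch.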

As said, none of this *should* matter. The source files themselves only serve as input to transifex anyway. The translated files are the only ones actually being "packed" (as .properties or as .class, depending on what is required or makes more sense). In any case, the only thing that really matters IMO is whether transifex can make sense of the inputs (as produced by "extract"), and whether locale files (as produced by "compile") display properly on the Kindle. Both of these should be the case.


Quote:
Originally Posted by eureka View Post
Overall, it looks like my JS tool is far behind yours. I'll try to speed up the work. I'm now refactoring it to fit the parsed JS into nice internal data structures, and I'm planning to use the XLIFF format for output. XLIFF is XML, so I can attach metadata (such as 'it should be wrapped in a MessageFormat constructor call') to translatable strings.
No problem :-) -- this is volunteer work after all. And if we get the Java part onto transifex, there will again be enough work for translators anyway. But it's good to see we're progressing...

Quote:
Originally Posted by eureka View Post
About uploading to Transifex: I did it with some throw-away scripts. One scanned src/5.0.0/framework for .properties files and generated .tx/config. Then I manually changed the slugs of all resources in this config (a slug must not exceed 50 characters, and I wanted to fit a sensible abbreviation of 'com.amazon.kindle.<...>' into those 50 characters). Then 'tx push'. To display nice resource names in the Transifex web UI, I wrote another script that changed the resource names through the Transifex API.

I've deleted these scripts, but I can certainly write them again. I'll wait for your response about the line endings in the tool's output before importing this output into the repository and Transifex.
That'd be really nice, yeah. Please make sure to also check in the tools then ;-) (about the line endings, see above. I don't think they're a problem, but if you disagree, let me know and I'll see what I can do).
Old 01-08-2012, 06:38 AM   #78
eureka
but forgot what it's like
Quote:
Originally Posted by ixtab View Post
Thanks for checking it!

All of these things come from Java itself, and none of them should really matter.
- the line endings and timestamp are produced by Properties.store(). There certainly *could* be a way of changing the line endings or removing the timestamp comment, but honestly: is it really a problem?
- I'm not outputting escape sequences. Again, it's what Java Properties.store() produces. properties files are encoded in ISO-8859-1, so escaping UTF-8 is actually the proper way. Some of the originally localized files include yet another way of escaping (using octal escapes, if I'm not mistaken). These escapes are hard-coded in the original translations.

As said, none of this *should* matter. The source files themselves only serve as input to transifex anyway. The translated files are the only ones actually being "packed" (as .properties or as .class, depending on what is required or makes more sense). In any case, the only thing that really matters IMO is whether transifex can make sense of the inputs (as produced by "extract"), and whether locale files (as produced by "compile") display properly on the Kindle. Both of these should be the case.
Line endings and the timestamp in the header comment do matter in a Git workflow. If
  • I commit these files as-is with '\r\n',
  • and then you find out that some translatable properties are missing from just one existing .properties file (or anything else that requires changing the content of committed .properties files),
  • and you run your tool again,
  • and you add its whole output with '\n' (instead of adding only the changed .properties file),

then Git will show that all .properties files have changed because, well, all line endings have changed. Every line will appear in the 'git diff' output, making it useless.

OK, line endings might not be a problem with 'git config --global core.autocrlf input' (as I've just found out), but timestamp changes will pollute the 'git diff' output even for small changes in a single .properties file.

Also, using UTF-8 instead of \uXXXX would help anyone reading translated .properties files taken straight from the Git repo. I'm speaking as a Russian-speaking user: all Russian strings (which use the Cyrillic alphabet) will be represented as long sequences of \uXXXX. It will be hard to fix translation errors in a plain old text editor (while using the Transifex command-line client).
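
To make the readability point concrete, here is a small sketch that mimics the escaping. The class and method names are made up, and the real output comes from Java's Properties.store(), which also treats a few special characters differently; this only shows what non-ASCII text turns into.

```java
// Hypothetical demo class, not part of the tool.
final class EscapeDemo {

    // Replaces every character outside printable ASCII with a backslash-u
    // escape made of four uppercase hex digits.
    static String toAsciiEscapes(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x20 || c > 0x7e) {
                out.append(String.format("\\u%04X", (int) c));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

With this, the three-letter Russian word "раз" becomes the 18-character sequence \u0440\u0430\u0437, which is what a translator would have to read and edit.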

But then, I thought that you control the output of your tool, and also that .properties is only an intermediate format for translation, which is compiled back into .class in the end. So I made some suggestions on how to improve the tool's output. But it seems you're using Java-provided functions to write data into .properties files, and you also put some of these .properties files back into the bundle when compiling, so the inability to change the .properties format makes sense to me now. So I'm not insisting on changing it, and I'll go ahead and commit it to the Git repo and Transifex (in a day or so).
Old 01-08-2012, 06:49 AM   #79
ixtab
(offline)
After giving this a second thought, you're right about the line endings and timestamp, so I'll see if there is anything I can do about it. About UTF-8, I fear that there is no way to support it. properties files MUST be in ISO-8859-1. Storing them in UTF-8 directly will result in garbage on the reader.
Old 01-08-2012, 07:10 AM   #80
eureka
but forgot what it's like
Quote:
Originally Posted by ixtab View Post
After giving this a second thought, you're right about the line endings and timestamp, so I'll see if there is anything I can do about it.
Thanks a lot. I appreciate it.

Quote:
Originally Posted by ixtab View Post
About UTF-8, I fear that there is no way to support it. properties files MUST be in ISO-8859-1. Storing them in UTF-8 directly will result in garbage on the reader.
I know that .properties files must be in ISO-8859-1 when they are meant to be read by Java. But if you convert all .properties files into .class files when compiling the locale bundle, then the format of the .properties files won't matter to Java; it's the contents of the .class files that will matter. And strings in .class files could be UTF-8, AFAIK. (And Transifex supports UTF-8 in .properties files with 'type=MOZILLAPROPERTIES'.)

Am I right in principle? Could you take a look into this?
Old 01-08-2012, 07:43 AM   #81
ixtab
(offline)
Quote:
Originally Posted by eureka View Post
Thanks a lot. I appreciate it.
Just committed the changes. Please test-drive the new version and let me know if it works (I don't have Windows around...).

Quote:
Originally Posted by eureka View Post
I know that .properties files must be in ISO-8859-1 when they are meant to be read by Java. But if you convert all .properties files into .class files when compiling the locale bundle, then the format of the .properties files won't matter to Java; it's the contents of the .class files that will matter. And strings in .class files could be UTF-8, AFAIK. (And Transifex supports UTF-8 in .properties files with 'type=MOZILLAPROPERTIES'.)

Am I right in principle? Could you take a look into this?
I guess you're right in principle. But (there are two "but"s):
- localized property files come from transifex (tx pull), not from the extraction tool.
- I am not converting every single properties file into a class, but only some. There is a very simple heuristic behind the selection: i) if a properties file contains arrays, then necessarily create a class from it; ii) otherwise, temporarily create a class, then bundle whichever file is smaller (the original properties file, or the class file). This could of course be changed to make everything into a class, but this will normally bloat the size of the .jar (factor ~ 2 while testing the de locale). In addition, the classes are not terribly efficient (they internally use a Base64-encoded version of the serialized String of the contents, which must be decoded and deserialized when the class is loaded). Of course there are no benchmarks around, but I guess that this may be slower than directly using .properties. So to summarize, the resulting jar contains a mix of .properties and .class files.
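
The selection heuristic described in the second point can be sketched roughly as follows. This is an illustration only, not the tool's code: the names are invented, the sizes are assumed to be measured by the caller, and keeping the .properties file on a tie is an assumption.

```java
// Hypothetical sketch of the selection heuristic described above.
final class BundleFormatChooser {

    enum Format { PROPERTIES, CLASS }

    // containsArrays: whether the resource holds array values
    // propertiesSize: size in bytes of the .properties file
    // classSize:      size in bytes of the temporarily generated .class file
    static Format choose(boolean containsArrays, long propertiesSize, long classSize) {
        if (containsArrays) {
            return Format.CLASS; // rule i): arrays always go into a class
        }
        // rule ii): bundle whichever file is smaller (tie keeps .properties)
        return classSize < propertiesSize ? Format.CLASS : Format.PROPERTIES;
    }
}
```

For example, choose(false, 1200, 2400) keeps the .properties file, while choose(true, 1200, 2400) still yields a class because of the arrays.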
Old 01-08-2012, 08:03 AM   #82
ixtab
(offline)
One more addition -- just thinking aloud about what would happen if we switched to UTF-8 properties:
- the advantage would be that editing files locally becomes easier.
- the tool would need to be rewritten to initially produce UTF-8 files, and to cope with UTF-8 in translated files. (rewriting into ISO-8859-1 if necessary)
- most importantly though: What would happen to *already existing* translations at transifex? We have steady progress for quite a few languages there, and I guess translators will be pissed off in case everything suddenly goes back to zero. This is the worst-case scenario, and I hope it wouldn't happen, but I'm not sure how to properly test this.
Old 01-08-2012, 08:13 AM   #83
eureka
but forgot what it's like
Quote:
Originally Posted by ixtab View Post
Just committed the changes. Please test-drive the new version and let me know if it works (I don't have Windows around...).
It works fine (output has '\n' line endings and no timestamp), thanks again.

Quote:
Originally Posted by ixtab View Post
I guess you're right in principle. But (there are two "but"s):
- localized property files come from transifex (tx pull), not from the extraction tool.
Yeah, you are right here. I was looking at the tool's output and saw some _de resources with \uXXXX escapes in it (at least under com/amazon/kindle/kindlet/internal/developer/install). But now I think that these localized resources (extracted from the Kindle) are of no value and shouldn't be imported into Transifex, so the presence of \uXXXX escapes in them doesn't matter.

(BTW, just for reference: resources with type=MOZILLAPROPERTIES are exported from Transifex in UTF-8.)

Quote:
Originally Posted by ixtab View Post
- I am not converting every single properties file into a class, but only some. There is a very simple heuristic behind the selection: i) if a properties file contains arrays, then necessarily create a class from it; ii) otherwise, temporarily create a class, then bundle whichever file is smaller (the original properties file, or the class file). This could of course be changed to make everything into a class, but this will normally bloat the size of the .jar (factor ~ 2 while testing the de locale). In addition, the classes are not terribly efficient (they internally use a Base64-encoded version of the serialized String of the contents, which must be decoded and deserialized when the class is loaded). Of course there are no benchmarks around, but I guess that this may be slower than directly using .properties. So to summarize, the resulting jar contains a mix of .properties and .class files.
OK, then I'm fine with escaped Unicode characters. As I said, expect the resources to be imported into Transifex in a day or so.
Old 01-08-2012, 08:25 AM   #84
eureka
but forgot what it's like
Quote:
Originally Posted by ixtab View Post
One more addition -- just thinking aloud about what would happen if we switched to UTF-8 properties:
- the advantage would be that editing files locally becomes easier.
- the tool would need to be rewritten to initially produce UTF-8 files, and to cope with UTF-8 in translated files. (rewriting into ISO-8859-1 if necessary)
- most importantly though: What would happen to *already existing* translations at transifex? We have steady progress for quite a few languages there, and I guess translators will be pissed off in case everything suddenly goes back to zero. This is the worst-case scenario, and I hope it wouldn't happen, but I'm not sure how to properly test this.
It should be possible to pull all translations from Transifex, commit the current state, replace the \uXXXX escapes in the localized .properties files with UTF-8 characters (with some script), and push the updated translations back. Even if the script misbehaves, no translation would be lost, as they would all be stored in the Git repo, so we could retry or even roll back.

But, of course, there will be some rough edges in the transition from PROPERTIES to MOZILLAPROPERTIES (where values are stored in UTF-8), so your point is completely valid here. Still, I think these rough edges are manageable.
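
A minimal sketch of the replacement step such a script would perform, operating on already-decoded text. The names are made up. As a deliberate simplification it leaves escapes for ASCII characters alone, since decoding something like an escaped '=' or newline could change how the file parses, and it skips anything after an escaped backslash.

```java
// Hypothetical sketch of the escape-to-character replacement.
final class UnicodeUnescaper {

    // Replaces backslash-u escapes for non-ASCII characters with the
    // characters themselves; everything else is copied unchanged.
    static String unescape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c != '\\' || i + 1 >= text.length()) {
                out.append(c);
                i++;
                continue;
            }
            char next = text.charAt(i + 1);
            if (next == 'u' && hasFourHexDigits(text, i + 2)) {
                int code = Integer.parseInt(text.substring(i + 2, i + 6), 16);
                if (code >= 0x80) {
                    out.append((char) code);         // decode non-ASCII
                } else {
                    out.append(text, i, i + 6);      // keep ASCII escapes as they are
                }
                i += 6;
            } else {
                // Some other escape (including an escaped backslash): copy both chars.
                out.append(c).append(next);
                i += 2;
            }
        }
        return out.toString();
    }

    private static boolean hasFourHexDigits(String text, int start) {
        if (start + 4 > text.length()) {
            return false;
        }
        for (int i = start; i < start + 4; i++) {
            if (Character.digit(text.charAt(i), 16) < 0) {
                return false;
            }
        }
        return true;
    }
}
```

The result would then have to be written out as UTF-8; committing the pulled state to Git first, as described above, keeps a way back if the script gets something wrong.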

Last edited by eureka; 01-08-2012 at 08:27 AM. Reason: clarify choices for recovery from script error
Old 01-08-2012, 09:00 AM   #85
ixtab
(offline)
... don't import anything yet, until we have cleared the UTF-8 issue

I did some further testing, and these are the results:

- to change a resource on transifex from PROPERTIES to MOZILLAPROPERTIES, it seems that it has to be removed and re-created. Removing a resource loses all associated translations, so we MUST make a backup of everything first.
- as expected, Java does not support UTF-8 properties out-of-the-box, but it should be possible to integrate this into the tool.

So, if we want to go for this, the workflow would be:
1. update the tool to assume everything is UTF-8. (ixtab) [this would not affect the extract part, but only the compile part -- or am I wrong?]
2. once the tool is ready, make a backup of current translation state, wipe all resources, upload result of extraction as new MOZILLAPROPERTIES resources, convert existing translations, re-upload existing translations. (eureka)

Is this correct, and should we go for it? I have "assigned" 2. to you, but I'm fine to help with the conversion part (i.e., to write some kind of tool to convert a .properties file from PROPERTIES to MOZILLAPROPERTIES format, aka from ISO-8859-1 to UTF-8).

Let me know...
Old 01-08-2012, 09:34 AM   #86
eureka
but forgot what it's like
Quote:
Originally Posted by ixtab View Post
... don't import anything yet, until we have cleared the UTF-8 issue

I did some further testing, and these are the results:

- to change a resource on transifex from PROPERTIES to MOZILLAPROPERTIES, it seems that it has to be removed and re-created. Removing a resource loses all associated translations, so we MUST make a backup of everything first.
- as expected, Java does not support UTF-8 properties out-of-the-box, but it should be possible to integrate this into the tool.

So, if we want to go for this, the workflow would be:
1. update the tool to assume everything is UTF-8. (ixtab) [this would not affect the extract part, but only the compile part -- or am I wrong?]
2. once the tool is ready, make a backup of current translation state, wipe all resources, upload result of extraction as new MOZILLAPROPERTIES resources, convert existing translations, re-upload existing translations. (eureka)

Is this correct, and should we go for it? I have "assigned" 2. to you, but I'm fine to help with the conversion part (i.e., to write some kind of tool to convert a .properties file from PROPERTIES to MOZILLAPROPERTIES format, aka from ISO-8859-1 to UTF-8).

Let me know...
OK, good plan. I'm fine with the assigned task, and converting Unicode escape sequences to UTF-8 doesn't look too hard in Python, so I'll do it.

It would be better if the extract part also produced UTF-8 output. In practice it is superfluous (only the en_US resources will be taken from the extract result, and these are identical in their ISO-8859-1 and UTF-8 variants), but it would be more consistent and less error-prone if, through my (or someone else's) mistake, a localized resource leaks from the extract result into the Git repo and on to Transifex.
Old 01-08-2012, 09:57 AM   #87
ixtab
(offline)
Quote:
Originally Posted by eureka View Post
OK, good plan. I'm fine with the assigned task, and converting Unicode escape sequences to UTF-8 doesn't look too hard in Python, so I'll do it.

It would be better if the extract part also produced UTF-8 output. In practice it is superfluous (only the en_US resources will be taken from the extract result, and these are identical in their ISO-8859-1 and UTF-8 variants), but it would be more consistent and less error-prone if, through my (or someone else's) mistake, a localized resource leaks from the extract result into the Git repo and on to Transifex.
I was experimenting with the .properties stuff in Java and found out something really nice: with the stream-based approach (load(InputStream)/store(OutputStream)), things are ISO-8859-1. With the reader/writer-based approach (load(Reader)/store(Writer)), however, the encoding is whatever the Reader or Writer was created with, so things can be UTF-8. This means that I only have to change a few lines in the code. More importantly, it is almost trivial (20 lines or so) to write something which converts files. So I'll integrate that into the program, and you don't have to waste time doing it.
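
A minimal sketch of such a conversion, assuming the stream/reader behaviour described above. The class name is made up and this is not the tool's actual code; it works on byte arrays rather than files to stay self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

// Hypothetical sketch, not the tool's implementation.
final class IsoToUtf8Sketch {

    // Takes the bytes of an ISO-8859-1 .properties file (possibly containing
    // Unicode escapes) and returns the same entries serialized as UTF-8.
    static byte[] convert(byte[] isoBytes) {
        Properties props = new Properties();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            // Stream-based load: bytes are read as ISO-8859-1, escapes are decoded.
            props.load(new ByteArrayInputStream(isoBytes));
            // Writer-based store: characters are written as-is in the writer's charset.
            Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
            props.store(writer, null);
            writer.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }
}
```

Going through a Properties object drops the comments and key order of the input, and store() writes a fresh date comment, so a real converter may want to handle those separately.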
Old 01-08-2012, 01:55 PM   #88
ixtab
(offline)
Current status:
- "extract" now produces UTF-8 files.
- "compile" now expects UTF-8 files.
- A new mode named "iso2utf" has been added to the tool. It was quickly written up after the previous discussion (well, it's 50 LOC, not 20, but still...). This mode can be used for converting existing localizations. Sample usage is "java -jar kt-l10n.jar iso2utf -f -s com/ -t com/". This would convert all .properties files inside com/ from ISO-8859-1 to UTF-8. Source and target directories must be the same; in other words, files are updated in place. DO NOT run this more than once on the same directory, or the output WILL become bullshit. (It's converted from ISO-8859-1 to UTF-8 on the first run; on a second run, the already converted UTF-8 would be misread as ISO-8859-1 and converted again into something meaningless.)
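
Why a second run corrupts the files can be shown with a few lines of plain charset handling. This is an illustration with made-up names, not the tool's code; it only models "read the bytes as ISO-8859-1, write them as UTF-8".

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo of what a repeated ISO-8859-1 to UTF-8 conversion does.
final class DoubleConversionDemo {

    // One conversion pass: interpret the bytes as ISO-8859-1, emit UTF-8.
    static byte[] isoToUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1)
                .getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] iso = { (byte) 0xFC };    // u-umlaut in ISO-8859-1
        byte[] once = isoToUtf8(iso);    // bytes C3 BC: valid UTF-8 for u-umlaut
        byte[] twice = isoToUtf8(once);  // bytes C3 83 C2 BC: two unrelated characters
        System.out.println(new String(once, StandardCharsets.UTF_8));
        System.out.println(new String(twice, StandardCharsets.UTF_8));
    }
}
```

After the second pass each original character has turned into two different ones, and nothing in the file marks it as already converted, hence the warning above.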

The current version has been checked in. I've tested the entire round-trip with the German translation, and it works for me. I also tested with some dummy Russian file, and it seems to work as well. (try it... "pаз, два, тpи, четыpе", "блядь... не делает" ;-) ). So we should be ok to move to UTF-8.
Old 01-08-2012, 02:45 PM   #89
eureka
Quote:
Originally Posted by ixtab View Post
Current status:
- "extract" now produces UTF-8 files.
- "compile" now expects UTF-8 files.
- A new mode named "iso2utf" has been added to the tool. This was quickly written up after the previous discussions (well, it's 50 LOC, not 20, but still...). This mode can be used for converting existing localizations. Sample usage is "java -jar kt-l10n.jar iso2utf -f -s com/ -t com/". This converts all .properties files inside com/ from ISO-8859-1 to UTF-8. Source and target directories must be the same; in other words, files are updated in place. DO NOT run this more than once on the same directory, or the output WILL become bullshit. (On the first run, files are converted from ISO-8859-1 to UTF-8; on a second run, the already-converted UTF-8 bytes would be treated as ISO-8859-1 and encoded again, which produces something meaningless.)

The current version has been checked in. I've tested the entire round-trip with the German translation, and it works for me. I also tested with a dummy Russian file, and it seems to work as well. (try it... "pаз, два, тpи, четыpе", "блядь... не делает" ;-) ). So we should be OK to move to UTF-8.
Thanks, I'll make use of it.

(Your last Russian phrase is very funny: one of the words is obscene [it's generally used to express extreme frustration] and unexpected here, and the phrase doesn't fit the context of your whole message. I wonder what it was originally, before translation...)
Old 01-08-2012, 03:12 PM   #90
ixtab
Quote:
Originally Posted by eureka View Post
Thanks, I'll make use of it.

(Your last Russian phrase is very funny: one of the words is obscene [it's generally used to express extreme frustration] and unexpected here, and the phrase doesn't fit the context of your whole message. I wonder what it was originally, before translation...)
Hehe... glad you enjoyed it, I inserted it just for you! There was no original; I made this one up myself. It's too much effort to get out the Russian keyboard, but I guess you'll understand it anyway: "Ya nyemnoshko ponyemayu russkiy yazik. Govoryu ploxo, nu mogu tshitach i ponyemach" [roughly: "I understand a little Russian. I speak it badly, but I can read and understand it."] :-)