Force transcription for Non-ASCII characters?

Kay_2 · 11-19-2015, 04:08 PM

Hey there!
I was wondering if there's any way to force calibre to transcribe titles, authors and such in a certain way. I'm not sure about other languages, but with Chinese and Japanese the result is pretty sad right now, because calibre ignores any common transcription principles, which results in nonsense.

An example:

The author 仲村佳樹 (read "Nakamura Yoshiki" or, as Asian names are always "the other way around", "Yoshiki Nakamura") becomes Zhong Cun Jia Shu
The title スキップ・ビート第30巻 (read as "Skip Beat dai30wa") becomes sukitupubito Di 01Juan

While in this case one might be tempted to overlook the faux pas because it's a "Jenglish" word, the issue becomes more pronounced with "normal" Japanese titles:

The title コイバナ！恋せよ花火第08巻 ("Koibana Koiseyo Hanabi") becomes "koibana! Lian seyoHua Huo Di 08Juan"
The author ななじ眺 (read "Nanaji Nagamu" or written in "Western Style" "Nagamu Nanaji") becomes "Nanazi Tiao"

While I was going to praise the system for actually transcribing Japanese Hiragana and Katakana the right way at least, that "zi" of "Nanazi" made me change my mind, because the transcription for じ is "ji" and not "zi".

But let's get to the really messed up part: Japanese Kanji or Chinese Hanzi.

Whenever there is a Kanji in a word, calibre will (apparently) use the first Chinese transcription on a list, with no regard for any kind of language rules. I'd rather have it stored in the original script right away if that means I won't get some BS-named files out of it. Or - one might always hope - find a way to use a correct transcription.

Why, if it ignores the language metadata either way, doesn't calibre at least realize that, if the title uses one Japanese character, the rest must be Japanese too? That would at least fix the matter for any words that contain Hiragana, Katakana or Japanese-only Kanji... The tougher part would probably be those characters that have both Japanese and Chinese readings.
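
The heuristic described above can be sketched in a few lines of Python. This is only an illustration, not calibre's code: the function name `guess_cjk_language` is made up for this example, and it relies solely on the Unicode block ranges for Hiragana (U+3040 to U+309F), Katakana (U+30A0 to U+30FF) and the main CJK Unified Ideographs block (U+4E00 to U+9FFF).

```python
def guess_cjk_language(text):
    """Return 'ja' if the text contains any kana, 'unknown' if it has
    only CJK ideographs, and None if it has no CJK characters at all."""
    has_ideograph = False
    for ch in text:
        cp = ord(ch)
        if 0x3040 <= cp <= 0x30FF:    # Hiragana and Katakana blocks
            return "ja"
        if 0x4E00 <= cp <= 0x9FFF:    # CJK Unified Ideographs (main block)
            has_ideograph = True
    return "unknown" if has_ideograph else None


print(guess_cjk_language("ななじ眺"))
print(guess_cjk_language("仲村佳樹"))
print(guess_cjk_language("Skip Beat"))
```

A kana check like this settles mixed strings such as ななじ眺, but it cannot decide anything for kanji-only names such as 仲村佳樹, which is exactly the "tougher part" mentioned above.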

While it probably is a lot harder to make a script that reverses Chinese and Japanese characters into alphabet letters, there certainly must be some way because it works the other way around, if you install the IME.

When you type "wangzi" in Chinese or "ouji" in Japanese, both times you will automatically get "王子". I'm no expert, but can't calibre use that IME "intelligence" together with the language-field metadata to create transcriptions automatically? I'm aware that it won't always work, because for some characters several readings are possible even within a single language, but it'd be a lot better than the bogus coming out right now.

If that is too complicated, is it possible to force the issue "by hand"?

Right now I always write the titles in their "native" language and put the transcription in brackets behind it. Otherwise I have no chance in hell of finding anything in the folders when I cannot access the calibre interface and/or need to copy certain files. (I use calibre to store scanned comic-rar-archives as well, and because calibre doesn't always store the whole title when it is too long, I end up with files whose names have nothing to do with any language known to mankind.)

So right now I always write entries like this:
Title: コイバナ！恋せよ花火第08巻 [Koibana Koiseyo Hanabi] Author: ななじ眺 & Nagamu Nanaji

I thought about creating a new sorting system with an additional "transcription" column, then making calibre put stuff under author_transcription -> series_transcription -> title_transcription or something like that, but halfway through the idea I remembered that that would f*** with the rest of my "alphabet-friendly" library entries as well. Oh, and the system language doesn't change anything there either. Same problems.

Does anyone have an idea how to solve this issue? I know it's not exactly a grave problem, but if there's a solution-possibility, I'd like to try it.

theducks · 11-19-2015, 06:48 PM

I would assume that you would use the book's Language metadata to choose the translation, not the character set.

eschwartz · 11-19-2015, 09:00 PM

calibre automatically stores files within its own library using ASCII, for maximum compatibility across filesystems/OSes.
I don't think that will change.

But see:
Preferences ==> Import/Export ==> Saving books to disk
Uncheck "Convert non-English characters to English equivalents"

As for getting a better chinese/japanese-to-ascii mapping, I have no idea, sorry.

Kay_2 · 11-23-2015, 03:34 AM

Quote:

Originally Posted by theducks

I would assume that you would use the book's Language metadata to choose the translation, not the character set.

I'm sorry, but I have no idea what you're talking about. Language metadata so far doesn't do anything for the way calibre stores files, nor am I trying to translate anything. Transcription ≠ translation to begin with. In this case it would probably make even more sense to call it "romanization" instead of simply "transcription". It is a "problem" unique to non-Latin writing systems like Chinese, Japanese or Arabic (any language that uses a different script) when trying to convert them to the Roman (Latin) script.

Quote:

Originally Posted by eschwartz

calibre automatically stores files within its own library using ASCII, for maximum compatibility across filesystems/OSes.
I don't think that will change.

But see:
Preferences ==> Import/Export ==> Saving books to disk
Uncheck "Convert non-English characters to English equivalents"

As for getting a better chinese/japanese-to-ascii mapping, I have no idea, sorry.

Thank you for that. Unfortunately it doesn't work for me (or maybe it does, but only for certain languages). The point is that you don't need transcription for, let's say, French, Spanish or German, because you can find anything you're looking for by swapping characters like 'é', 'è' or 'ß' for a plain 'e' or 'ss'. Plus, the main parts of titles stay the same even if a few letters aren't correct. Unfortunately this does not hold for languages with non-Latin writing systems. While some languages just have a handful of extra characters or diacritics, most Asian languages and Arabic, for example, require a more advanced form of transcription.
For now, I guess I'll have to stick to the brackets-transcription I'm doing at the moment. Maybe in time someone else might have a useful idea on how to proceed as this is a problem many people who work with different languages come across. Thank you for trying to help me though!

eschwartz · 11-23-2015, 09:32 AM

Well, I would think that that setting would cause the save-to-disk filename to use the unicode value of the metadata fields directly. That shouldn't be language dependent.

But I freely admit that I have never tried putting calibre through its tricks with another language. Sorry I wasn't able to help.

eschwartz · 11-23-2015, 09:38 AM

You could use English for everything in the main metadata columns, including transcriptions. calibre would use that to build its in-library filenames, and for save-to-disk as well.

But for display purposes in calibre, use an "Original Japanese/Chinese" custom column with Just What It Says On The Tin. And use a custom column built from other columns to switch between them and display e.g. the "Real Title".
You will probably also need a metadata plugboard to replace the title with the "Real Title" in the metadata of exported books.

NeraSnow · 11-26-2015, 12:19 AM

Hmm, as a Chinese speaker, I find this interesting. No matter what form the Japanese name takes (kanji, or even hiragana or katakana), we will always convert it to Chinese hanzi. (Believe it or not, almost every Japanese word has an equivalent in kanji, and there are not many differences between kanji and hanzi.)
So that's why 仲村佳樹 becomes Zhong Cun Jia Shu. And actually it's quite convenient for us to read.
Hanzi and kanji (they are actually the same; Japanese borrowed these characters) are so complicated that it's hard for a developer to build a perfect transcription system. As you said, both "ouji" and "wangzi" would be converted into 王子. How can a computer tell in which context 王子 should be read as ouji or as wangzi?
But still, I can't offer any solution to your question. I am sorry about that. It seems that you know some Japanese, so maybe you can learn some Chinese so that you can recognize the titles. (Just kidding, but it's a chance to get to know a new language.)
I am a little nervous typing so many words in English. It's like a small essay.
If any awkward wording confused you, I'm sorry.

kovidgoyal · 11-26-2015, 12:31 AM

Unicode represents chinese/japanese/vietnamese/korean using the same range of codepoints, therefore there is no robust way to transliterate those codepoints for those languages. What calibre does is use your calibre interface language, defaulting to chinese for all languages other than japanese/vietnamese/korean. The transliteration algorithm used depends on what language you have set in Preferences->Look & Feel->Choose language.

And it is important to note that changing that language will not cause existing folders in the library to be renamed, it will only affect future books added to calibre. You can always force a rename of existing folders by using the search and replace tool in the bulk metadata edit dialog to add some temporary suffix to all titles and then remove it.
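
To make the point about shared codepoints concrete, here is a small Python sketch. It is not calibre's implementation: the `READINGS` table and the `romanize` function are hypothetical, hand-written for the two characters discussed in this thread, and the fallback to Chinese only mirrors the default described above. The standard-library `unicodedata` module shows that each character has a single "CJK UNIFIED IDEOGRAPH" codepoint, whichever language it is read in.

```python
import unicodedata

# Hypothetical, hand-written readings for two characters only; a real
# transliterator needs a full dictionary and word-level context.
READINGS = {
    "zh": {"王": "wang", "子": "zi"},
    "ja": {"王": "ou", "子": "ji"},
}


def romanize(text, lang):
    """Romanize text with the table for lang, falling back to Chinese."""
    table = READINGS.get(lang, READINGS["zh"])
    return "".join(table.get(ch, ch) for ch in text)


for ch in "王子":
    # The same codepoint is used for the Chinese and the Japanese word.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

print(romanize("王子", "zh"))
print(romanize("王子", "ja"))
```

The text alone carries no language information, so the caller has to supply `lang` from somewhere else; in calibre's case that is the interface language.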

AColobus · 12-21-2015, 04:25 PM

Is there a way to rename existing folders for books already added to Calibre?

I only discovered this issue after adding a large number of Japanese books. It's now impossible to tell what any of them are. Renaming the folders in Windows just means Calibre can't find the book any more, but I couldn't see any options within Calibre to rename an existing folder, nor to change the path after renaming manually.

In case anyone suggests it - I can't simply remove the books and reimport them either, because I'm only able to access my Japanese ebook account when I'm actually in Japan. When I'm in Europe, I use my European account, and my Japanese books are wiped from the system. This is exactly why I use Calibre.

eschwartz · 12-21-2015, 05:07 PM

If you change the author/title metadata, e.g. via Bulk Edit Metadata, then the folder names will be recalculated. This sometimes fixes certain issues.

But you cannot tell calibre to use some other naming scheme, as per:
Sticky: Want to change the folder structure of the Calibre library?

AColobus · 12-21-2015, 07:11 PM

Thanks for the suggestion eschwartz. I just did a quick test.

My test author was Banana Yoshimoto (吉本ばなな) who is filed as [Ji Ben banana]. I tried just adding an s, closing the metadata screen to save the change, and then removing it. Unfortunately, even though my import preferences are now set to "do not transliterate", this simply renamed the folder to [Ji Ben bananas], then returned the contents to the original folder when the s was removed.

I then tried giving her a different name entirely, and then restoring her name. This had the same effect.

Finally I tried a weird hack to test this: I changed her name to Yoshimoto Banana (not currently a folder name used by any other book) [吉本 banana]. This put the book in [Yoshimoto Banana]. When I switched to the half-kanji half-romaji name, Calibre nevertheless decided this was equivalent to [Ji Ben banana] and put it back in the original folder.

So my observations:
* setting transliteration preferences has no effect when changing metadata of existing books, presumably because they work a different way? Or is this behaviour built in? What happens if you import a book with transliteration preferences set to retain original characters, then change the metadata - will it suddenly romanize it? I didn't test that because it's past midnight, and also (as noted) I don't want anything weird happening to my books because I can't restore them.
* romanization is liable to conflate authors who are actually different people. No idea how likely it is to actually happen, or whether this would present a problem for normal operation of Calibre.
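
The second observation can be illustrated with a toy example in Python. Everything here is an assumption made for the illustration: the `PINYIN` table is a hand-written toneless mapping for three characters, `ascii_folder` is a made-up stand-in for whatever calibre actually does, and 季本 is an invented name used only to force a clash with 吉本.

```python
from collections import defaultdict

# Toy toneless-pinyin table, written by hand for this example only.
PINYIN = {"吉": "Ji", "季": "Ji", "本": "Ben"}


def ascii_folder(name):
    """Build an ASCII folder name by romanizing character by character."""
    return " ".join(PINYIN.get(ch, ch) for ch in name)


def find_collisions(names):
    """Group names by folder and keep only folders shared by several names."""
    by_folder = defaultdict(list)
    for name in names:
        by_folder[ascii_folder(name)].append(name)
    return {folder: group for folder, group in by_folder.items() if len(group) > 1}


print(find_collisions(["吉本", "季本", "本"]))
```

Because the mapping from characters to toneless ASCII is many-to-one, distinct names can end up in the same folder. How often that happens in a real library is a separate question that this sketch does not answer.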

eschwartz · 12-21-2015, 07:18 PM

See post #3.

And recalculating folder names will usually result in exactly the same thing again.
As Kovid said in post #8, the transliteration depends on your interface language. That is an example of a situation in which recalculating folder names is helpful (because the folder name was valid, but the rules for deriving it have changed).

AColobus · 12-22-2015, 01:51 PM

Quote:

Originally Posted by eschwartz

See post #3.

And recalculating folder names will usually result in exactly the same thing again.
As Kovid said in post #8, the transliteration depends on your interface language. That is an example of a situation in which recalculating folder names is helpful (because the folder name was valid, but the rules for deriving it have changed).

I'm sure you meant to be helpful, but the first sentence needs more context, sorry. I've read post #3 and now reread it, but still have no idea what you're getting at there.

I was a bit confused because my interface language is Japanese, but I've now realised that the calibre interface language is separate from my computer's interface language. Having changed that setting, I now get [Yoshimoto banana] when I re-save the name, so thanks, I think this is as solved as it can get.

eschwartz · 12-22-2015, 02:16 PM

Ah, merely that there is no way to avoid romanization, since the setting to disable it is only for Save-to-disk exports. I understood this:

Quote:

Originally Posted by AColobus

My test author was Banana Yoshimoto (吉本ばなな) who is filed as [Ji Ben banana]. I tried just adding an s, closing the metadata screen to save the change, and then removing it. Unfortunately, even though my import preferences are now set to "do not transliterate", this simply renamed the folder to [Ji Ben bananas], then returned the contents to the original folder when the s was removed.

to be trying to get native Japanese characters in the folder names in calibre.

And I guess since you managed to find an interface language which results in an acceptable romanization, all is good.

Make sure to keep calibre set to that interface language, at least whenever you add new books or change author/title metadata.

AColobus · 01-01-2016, 07:24 AM

Ah, thank you for the explanation.

I was in fact originally hoping to get that - I didn't quite grasp that it might not be possible, because I use plenty of non-ASCII folder names so it's not something I'd thought about. I didn't realise "within its own library" meant folder names, I assumed it was referring to something in the Calibre software. But I worked it out as I went along, and correct romanizations are a definite step up.

On a conceptual level I feel it's a shame transcription is forced in this way, because it does lead to some problems. There's this bit of confusion, it'll still fall down on names with variant pronunciations, and it's going to be a bit of a pain combining Chinese and Japanese ebooks. It's also slightly awkward if you want/have to deal with folders rather than only touching them via Calibre. Ah well.

I'd probably have aimed to include a "transcription" field that lets you auto-generate a transcription or create your own. Those of us who are quite happy having non-ASCII folder names could keep them. But that might be a lot harder than I realise, and I appreciate the difficulties with trying to build robust software with maximum compatibility. Just a bit of musing, really.
