#1 |
Junior Member
Posts: 3
Karma: 10
Join Date: Feb 2015
Device: Win 8.1 / 7Pro, Android
|
Force transcription for Non-ASCII characters?
Hey there!
I was wondering if there's any way to force calibre to transcribe titles, authors and such a certain way. I'm not sure about other languages, but with Chinese and Japanese the result is pretty sad right now, because calibre ignores all common transcription principles, which results in nonsense.

An example:
The author 仲村佳樹 (read "Nakamura Yoshiki" or, as Asian names are always "the other way around", "Yoshiki Nakamura") becomes Zhong Cun Jia Shu
The title スキップ・ビート 第30巻 (read as "Skip Beat dai 30 kan") becomes sukitupubito Di 01Juan

While in this case one might be tempted to overlook the faux pas because it's a "Jenglish" word, the issue becomes more pronounced with "normal" Japanese titles:
The title コイバナ! 恋せよ花火 第08巻 ("Koibana Koiseyo Hanabi") becomes "koibana! Lian seyoHua Huo Di 08Juan"
The author ななじ眺 (read "Nanaji Nagamu" or, written in "Western style", "Nagamu Nanaji") becomes "Nanazi Tiao"

While I was going to praise the system for at least transcribing Japanese Hiragana and Katakana the right way, that "zi" of "Nanazi" made me change my mind, because the transcription for じ is "ji" and not "zi".

But let's get to the really messed up part: Japanese Kanji or Chinese Hanzi. Whenever there is a Kanji in a word, calibre will (apparently) use the first Chinese transcription on a list, with no regard for any kind of language rules. I'd rather have it stored in the original language right away, if that means I won't get some BS-named files out of it. Or, one might always hope, find a way to use a correct transcription.

Why, if it ignores the language metadata either way, doesn't calibre at least realize that if the title uses one Japanese character, the rest must be Japanese too? That would at least fix the matter for any words that contain Hiragana, Katakana or Japanese-only Kanji... The tougher part would probably be those characters that have both Japanese and Chinese readings.
While it is probably a lot harder to write a script that turns Chinese and Japanese characters back into alphabet letters, there certainly must be some way, because it works the other way around if you install the IME. When you type "wangzi" in Chinese or "ouji" in Japanese, both times you will automatically get "王子". I'm no expert, but can't calibre use that IME "intelligence" together with the language-field metadata to automatically create transcriptions? (I'm aware that it won't always work, as for some symbols more than one option is possible even if you stay within one language instead of mixing in others, but it'd be a lot better than the bogus output right now...)

If that is too complicated, is it possible to force the issue "by hand"? Right now I always write the titles in their "native" language and put the transcription in brackets behind it; otherwise I'll have no chance in hell of finding anything in the folders if I cannot access the calibre interface and/or need to copy certain files. (I use calibre to store scanned comic RAR archives as well, and because calibre doesn't always store the whole title when it is too long, I end up with files that have nothing to do with any language known to mankind.)

So right now I always write entries like this:
Title: コイバナ! 恋せよ花火 第08巻 [Koibana Koiseyo Hanabi]
Author: ななじ眺 & Nagamu Nanaji

I thought about creating a new sorting system with an additional column "transcription", then making calibre put stuff under author_transcription -> series_transcription -> title_transcription or something like that, but halfway through the idea I remembered that that would f*** with the rest of my "alphabet-friendly" library entries as well. Oh, and the system language doesn't change anything there either. Same problems.

Does anyone have an idea how to solve this issue? I know it's not exactly a grave problem, but if there's a possible solution, I'd like to try it. |
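For readers wondering where the "zi" comes from: Kunrei-shiki romanization writes じ as "zi", while Hepburn writes it as "ji". The sketch below only illustrates that difference with two tiny hand-written tables; the table names and the `romanize` helper are made up for this example, and the thread does not say which scheme, if any, calibre follows.

```python
# Tiny hand-written kana tables, just enough for the example.
# They are illustrative only and are not calibre's transliteration data.
HEPBURN = {"な": "na", "じ": "ji", "し": "shi", "つ": "tsu"}
KUNREI = {"な": "na", "じ": "zi", "し": "si", "つ": "tu"}


def romanize(text, table):
    """Replace each kana found in the table; leave any other character as-is."""
    return "".join(table.get(ch, ch) for ch in text)


print(romanize("ななじ", HEPBURN))  # nanaji
print(romanize("ななじ", KUNREI))   # nanazi
```

Both outputs are valid romanizations under their respective systems, so "Nanazi" is a convention mismatch rather than a misreading of the kana.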
#2 |
Well trained by Cats
Posts: 30,883
Karma: 59840450
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
|
I would assume
#3 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
calibre automatically stores files within its own library using ASCII, for maximum compatibility across filesystems/OSes.
I don't think that will change.
But see: Preferences ==> Import/Export ==> Saving books to disk
Uncheck "Convert non-English characters to English equivalents"
As for getting a better chinese/japanese-to-ascii mapping, I have no idea, sorry.
Last edited by eschwartz; 11-19-2015 at 09:02 PM. |
#4 | ||
Junior Member
Posts: 3
Karma: 10
Join Date: Feb 2015
Device: Win 8.1 / 7Pro, Android
|
Quote:
Quote:
For now, I guess I'll have to stick to the brackets-transcription I'm doing at the moment. Maybe in time someone else might have a useful idea on how to proceed as this is a problem many people who work with different languages come across. Thank you for trying to help me though! |
#5 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Well, I would think that that setting would cause the save-to-disk filename to use the Unicode value of the metadata fields directly. That shouldn't be language dependent.
But I freely admit that I have never tried putting calibre through its tricks with another language. Sorry I wasn't able to help. |
#6 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
You could use the English everything in the main metadata columns, including transcriptions. calibre would use that to build its in-library filenames, and save-to-disk as well.
But for display purposes in calibre, use an "Original Japanese/Chinese" custom column with Just What It Says On The Tin. And use a custom column built from other columns to switch between them and display e.g. the "Real Title". You will probably also need a metadata plugboard to replace the title with the "Real Title" in the metadata of exported books. |
#7 |
Junior Member
Posts: 1
Karma: 10
Join Date: Nov 2015
Device: Kindle Paperwhite
|
Hmm, as a Chinese reader, I find this interesting. No matter what form the Japanese name takes (names in Kanji or even in Hiragana or Katakana), we will convert it all to Chinese Hanzi. (Believe it or not, almost every Japanese word has an equivalent in Kanji, and there are not many differences between Kanji and Hanzi.)
So that's why 仲村佳樹 becomes Zhong Cun Jia Shu. And actually it's quite convenient for us to read. Hanzi or Kanji (actually they are the same; they borrowed these characters) are so complicated that it's hard for a developer to build a perfect transcription system. As you said, "ouji" and "wangzi" would both be converted into 王子. How can a computer tell in which context 王子 equals ouji or wangzi?

But still, I can't provide any solution to your question. I am sorry about that. It seems that you know some Japanese, so maybe you can learn some Chinese.

I am a little nervous typing so many words in English. It's like a small essay. If some awkward wording confused you, I apologize. |
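The 王子 question can be made concrete with a small sketch. It assumes the language is supplied from outside the text (for instance a language field), because the characters alone do not carry it. The `READINGS` table and the `transliterate` function are hypothetical names with a single entry taken from this thread, not part of calibre.

```python
# Hypothetical reading table: one word, two languages, using the thread's example.
READINGS = {
    "王子": {"zh": "wangzi", "ja": "ouji"},
}


def transliterate(word, lang):
    """Return the reading of `word` for `lang`, or the word itself if unknown."""
    return READINGS.get(word, {}).get(lang, word)


print(transliterate("王子", "zh"))  # wangzi
print(transliterate("王子", "ja"))  # ouji
```

The same input string produces two different outputs, and only the `lang` argument decides which one is right.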
#8 |
creator of calibre
Posts: 45,149
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Unicode represents Chinese/Japanese/Vietnamese/Korean using the same range of codepoints; therefore there is no robust way to transliterate those codepoints for those languages. What calibre does is use your calibre interface language, defaulting to Chinese for all languages other than Japanese/Vietnamese/Korean. The transliteration algorithm used depends on what language you have set in Preferences->Look & Feel->Choose language
And it is important to note that changing that language will not cause existing folders in the library to be renamed, it will only affect future books added to calibre. You can always force a rename of existing folders by using the search and replace tool in the bulk metadata edit dialog to add some temporary suffix to all titles and then remove it. |
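The point about shared codepoints can be checked with Python's standard `unicodedata` module. This is a minimal sketch, not calibre code, and `describe` and `has_kana` are names invented for the example. It shows that the Kanji in an author name are plain "CJK UNIFIED IDEOGRAPH" codepoints with no language attached, while Hiragana and Katakana are identifiable as Japanese.

```python
import unicodedata


def describe(text):
    """List each character with its codepoint and official Unicode name."""
    return [(ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED")) for ch in text]


def has_kana(text):
    """True if any character is Hiragana or Katakana, a strong hint of Japanese."""
    return any(
        unicodedata.name(ch, "").startswith(("HIRAGANA", "KATAKANA"))
        for ch in text
    )


for row in describe("仲村佳樹"):
    print(row)  # every name starts with "CJK UNIFIED IDEOGRAPH"

print(has_kana("仲村佳樹"))  # False: nothing in the text marks it as Japanese
print(has_kana("ななじ眺"))  # True: the Hiragana gives the language away
```

A kana check like this would help with titles that mix scripts, but a name written only in Kanji still needs the language from somewhere else.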
#9 |
Enthusiast
Posts: 25
Karma: 71334
Join Date: Dec 2014
Device: Kobo Touch
|
Is there a way to rename existing folders for books already added to Calibre?
I only discovered this issue after adding a large number of Japanese books. It's now impossible to tell what any of them are. Renaming the folders in Windows just means Calibre can't find the book any more, but I couldn't see any options within Calibre to rename an existing folder, nor to change the path after renaming manually. In case anyone suggests it - I can't simply remove the books and reimport them either, because I'm only able to access my Japanese ebook account when I'm actually in Japan. When I'm in Europe, I use my European account, and my Japanese books are wiped from the system. This is exactly why I use Calibre. |
#10 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
If you change the author/title metadata, e.g. via Bulk Edit Metadata, then the folder names will be recalculated. This sometimes fixes certain issues.
But you cannot tell calibre to use some other naming scheme, as per: Sticky: Want to change the folder structure of the Calibre library? |
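The temporary-suffix trick described in post #8 amounts to two search-and-replace passes over the titles. The sketch below writes them with Python's `re` module purely as an illustration; the marker string and function names are hypothetical, and it is an assumption that the regular-expression mode of the bulk edit dialog accepts equivalent patterns.

```python
import re

# Hypothetical temporary suffix; any string that no real title ends with would do.
MARKER = " [tmp]"


def add_marker(title):
    """Pass 1: append the marker so the title (and thus the folder name) changes."""
    return re.sub(r"^(.*)$", r"\1" + MARKER, title)


def remove_marker(title):
    """Pass 2: strip the marker again, restoring the original title."""
    return re.sub(re.escape(MARKER) + r"$", "", title)


changed = add_marker("Kitchen")
print(changed)                 # Kitchen [tmp]
print(remove_marker(changed))  # Kitchen
```

Each pass changes the metadata, which is what causes the folder name to be recalculated under the current rules.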
#11 |
Enthusiast
Posts: 25
Karma: 71334
Join Date: Dec 2014
Device: Kobo Touch
|
Thanks for the suggestion eschwartz. I just did a quick test.
My test author was Banana Yoshimoto (吉本 ばなな) who is filed as [Ji Ben banana].

I tried just adding an s, closing the metadata screen to save the change, and then removing it. Unfortunately, even though my import preferences are now set to "do not transliterate", this simply renamed the folder to [Ji Ben bananas], then returned the contents to the original folder when the s was removed.

I then tried giving her a different name entirely, and then restoring her name. This had the same effect.

Finally I tried a weird hack to test this: I changed her name to Yoshimoto Banana (not currently a folder name used by any other book), and then to [吉本 banana]. The first change put the book in [Yoshimoto Banana]. When I switched to the half-kanji half-romaji name, Calibre nevertheless decided this was equivalent to [Ji Ben banana] and put it back in the original folder.

So my observations:
* setting transliteration preferences has no effect when changing metadata of existing books, presumably because they work a different way? Or is this behaviour built in? What happens if you import a book with transliteration preferences set to retain original characters, then change the metadata - will it suddenly romanize it? I didn't test that because it's past midnight, and also (as noted) I don't want anything weird happening to my books because I can't restore them.
* romanization is liable to conflate authors who are actually different people. No idea how likely it is to actually happen, or whether this would present a problem for normal operation of Calibre. |
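The conflation worry in the second observation is easy to reproduce on paper. The sketch below uses a toy toneless-pinyin table with four characters (王 and 汪 both come out as "Wang", 伟 and 薇 both as "Wei") and groups names by their ASCII form. The table and function names are invented for illustration, and whether calibre would actually place such authors in the same folder is not established in this thread.

```python
from collections import defaultdict

# Toy toneless pinyin table; real transliteration data is far larger.
PINYIN = {"王": "Wang", "汪": "Wang", "伟": "Wei", "薇": "Wei"}


def ascii_key(name):
    """Transliterate character by character, keeping unknown characters as-is."""
    return " ".join(PINYIN.get(ch, ch) for ch in name)


def find_collisions(names):
    """Group names by ASCII key and return only the keys shared by several names."""
    groups = defaultdict(list)
    for name in names:
        groups[ascii_key(name)].append(name)
    return {key: group for key, group in groups.items() if len(group) > 1}


print(find_collisions(["王伟", "汪薇", "伟王"]))
# {'Wang Wei': ['王伟', '汪薇']}
```

Running a check like this over an author list before importing would at least show which names become indistinguishable once reduced to ASCII.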
#12 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
See post #3.
And recalculating folder names will usually result in exactly the same thing again. As Kovid said in post #8, the transliteration depends on your interface language; changing it is an example of a situation in which recalculating folder names is helpful (because the folder name was valid, but the rules for deriving it have changed). |
#13 | |
Enthusiast
Posts: 25
Karma: 71334
Join Date: Dec 2014
Device: Kobo Touch
|
Quote:
I was a bit confused because my interface language is Japanese, but I've now realised that the Calibre interface language is separate from my computer's interface language. Having changed that setting, I now get [Yoshimoto banana] when I re-save the name, so thanks, I think this is as solved as it can get. |
#14 | |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Ah, merely that there is no way to avoid romanization, since the setting to disable it is only for Save-to-disk exports. I understood this:
Quote:
And I guess since you managed to find an interface language which results in an acceptable romanization, all is good. Make sure to keep calibre set to that interface language, at least whenever you add new books or change author/title metadata. |
#15 |
Enthusiast
Posts: 25
Karma: 71334
Join Date: Dec 2014
Device: Kobo Touch
|
Ah, thank you for the explanation
I was in fact originally hoping to get that; I didn't quite grasp that it might not be possible, because I use plenty of non-ASCII folder names so it's not something I'd thought about. I didn't realise "within its own library" meant folder names, I assumed it was referring to something in the Calibre software. But I worked it out as I went along, and correct romanizations are a definite step up.

On a conceptual level I feel it's a shame transcription is forced in this way, because it does lead to some problems. There's this bit of confusion, it'll still fall down on names with variant pronunciations, and it's going to be a bit of a pain combining Chinese and Japanese ebooks. It's also slightly awkward if you want/have to deal with folders rather than only touching them via Calibre. Ah well.

I'd probably have aimed to include a "transcription" field that lets you auto-generate a transcription or create your own. Those of us who are quite happy having non-ASCII folder names could keep them. But that might be a lot harder than I realise, and I appreciate the difficulties with trying to build robust software with maximum compatibility. Just a bit of musing, really. |
Tags |
chinese characters, japanese characters, language metadata, transcription |