Old 06-24-2016, 08:42 PM   #1
Hopkins
[Editor Plugin] Traditional<->Simplified Chinese Convertor

Chinese is currently written with two standardized character sets. Mainland China and Singapore officially use the simplified set, while other areas (such as Taiwan and Hong Kong) largely continue to use the traditional set. This plugin lets users convert EPUB and AZW3 files between the two character sets.

If only text format changes are desired (such as flow direction or quotation mark types), the character set conversion can be omitted. This makes the plugin usable on non-Chinese texts as well, such as Japanese.
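
As an illustration of a format-only change, the sketch below swaps Western curly quotation marks for East Asian corner brackets without touching any other character. The mapping table and the function name are assumptions made for this example; the plugin's own quotation mark handling may differ.

```python
# Assumed mapping for illustration only; not taken from the plugin source.
WESTERN_TO_EAST_ASIAN = str.maketrans({
    "\u201c": "\u300c",  # left double quote  -> left corner bracket
    "\u201d": "\u300d",  # right double quote -> right corner bracket
    "\u2018": "\u300e",  # left single quote  -> left white corner bracket
    "\u2019": "\u300f",  # right single quote -> right white corner bracket
})


def to_east_asian_quotes(text):
    """Replace Western curly quotes with corner brackets, leaving other text as is."""
    return text.translate(WESTERN_TO_EAST_ASIAN)


sample = "他说：\u201c你好。\u201d"
converted = to_east_asian_quotes(sample)
print(converted)
```

Because only the quotation marks are replaced, the same kind of change can be applied to Japanese or other text that needs no character set conversion.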

Main Features
  • Convert eBooks written in traditional characters into simplified characters
  • Convert eBooks written in simplified characters into traditional characters
  • Convert regional words and idioms used in the source material to those words and idioms used in the destination material
  • Convert individual sections or the entire book
  • Update metadata and table of contents
  • Convert text direction to vertical or horizontal
  • Run batch operations from the command line
  • Because this is an editor plugin, users can correct the result when a conversion is not perfect. Conversions from simplified to traditional should always be proofread.
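
The difference between plain character conversion and regional word conversion can be sketched with a toy example. The two small tables and the function names below are hypothetical and exist only for this illustration; the plugin itself relies on much larger conversion dictionaries. The example uses the word for "software", which is commonly written 软件 on the mainland and 軟體 in Taiwan.

```python
# Toy tables for illustration only; real dictionaries are far larger.
S2T_CHARS = {"软": "軟", "件": "件"}   # character-level simplified -> traditional
TW_PHRASES = {"軟件": "軟體"}          # regional wording preferred in Taiwan


def to_traditional(text):
    """Convert character by character, leaving unknown characters unchanged."""
    return "".join(S2T_CHARS.get(ch, ch) for ch in text)


def localize_for_taiwan(text):
    """Replace regional phrases, trying longer phrases first."""
    for src in sorted(TW_PHRASES, key=len, reverse=True):
        text = text.replace(src, TW_PHRASES[src])
    return text


chars_only = to_traditional("软件")
with_phrases = localize_for_taiwan(chars_only)
print(chars_only, with_phrases)
```

Character conversion alone yields 軟件, which is written in traditional characters but keeps the mainland wording. The phrase step then changes it to the Taiwanese term.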

Testing Platforms
  • Windows 10 (64 bit) - Calibre version 6.10

Note:
Github repository link

Command Line Interface (CLI)
Details:

Example: convert all EPUB and AZW3 files in the current directory from Mainland simplified to Taiwan traditional, switching to East Asian quotation marks and vertical text orientation, appending a "V" suffix to each file name, writing the results to the "out" directory, and optimizing for display on the Chrome Readium reader:
calibre-debug --run-plugin "Chinese Text Conversion" -- -ol tw -il cn -d s2t -od out -qt e -td v -up -a V *.epub *.azw3
Example: convert all EPUB files in a directory from Taiwan traditional to Mainland simplified in test mode (-t), which does not write anything and only prints what would happen:
calibre-debug --run-plugin "Chinese Text Conversion" -- -ol cn -il tw -d t2s -t my_chinese_epub_dir/*.epub
Code:
usage: calibre-debug.exe [-h] [-il {cn,hk,tw,jp}] [-ol {cn,hk,tw,jp}] [-d {t2s,s2t,t2t,none}] [-p]
                         [-qt {w,e,no_change}] [-td {h,v,no_change}] [-up] [-v] [-t] [-q] [-od OUTDIR_OPT]
                         [-a APPEND_SUFFIX_OPT] [-f] [-s]
                         ebook-filepath [ebook-filepath ...]

Convert Chinese characters between traditional/simplified types and/or change text style. Generally run as:
calibre-debug --run-plugin "Chinese Text Conversion" -- [options] ebook-filepath
Plugin Version: 3.0.0

positional arguments:
  ebook-filepath        One or more epub and/or azw3 ebook filepaths - UNIX style wildcards accepted

options:
  -h, --help            show this help message and exit
  -il {cn,hk,tw,jp}, --input-locale {cn,hk,tw,jp}
                        Set to the ebook origin locale if known (Default: cn)
  -ol {cn,hk,tw,jp}, --output-locale {cn,hk,tw,jp}
                        Set to the ebook target locale (Default: cn)
  -d {t2s,s2t,t2t,none}, --direction {t2s,s2t,t2t,none}
                        Set to the ebook conversion direction (Default: none)
  -p, --phrase_convert  Convert phrases to target locale versions (Default: False)
  -qt {w,e,no_change}, --quotation-type {w,e,no_change}
                        Set to Western or East Asian (Default: no_change)
  -td {h,v,no_change}, --text-direction {h,v,no_change}
                        Set to horizontal or vertical text (Default: no_change)
  -up, --update_punctuation
                        Update punctuation to match direction change (Default: False)
  -v, --verbose         Print out details as the conversion progresses (Default: False)
  -t, --test            Run conversion operations without saving results (Default: False)
  -q, --quiet           Do not print anything, ignore warnings - this option overrides the -s option (Default: False)
  -od OUTDIR_OPT, --output-dir OUTDIR_OPT
                        Set to the ebook output file directory (Default: overwrite existing ebook file)
  -a APPEND_SUFFIX_OPT, --append_suffix APPEND_SUFFIX_OPT
                        Append a suffix to the output file basename (Default: )
  -f, --force           Force processing by ignoring warnings (e.g. allow overwriting files with no prompt)
  -s, --show            Show the settings based on user cmdline options and exit (Default: False)
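
For scripted batch runs, the command can also be assembled in Python. The following is a minimal sketch that uses only options listed in the help text above. The helper names `build_command` and `run` are hypothetical, and the sketch assumes `calibre-debug` is on the PATH. It defaults to test mode (-t) so that nothing is written until the output has been checked.

```python
import shutil
import subprocess

PLUGIN_NAME = "Chinese Text Conversion"


def build_command(files, input_locale="cn", output_locale="tw",
                  direction="s2t", out_dir=None, test=True):
    """Assemble the calibre-debug argument list from the documented options."""
    cmd = ["calibre-debug", "--run-plugin", PLUGIN_NAME, "--",
           "-il", input_locale, "-ol", output_locale, "-d", direction]
    if out_dir is not None:
        cmd += ["-od", out_dir]   # write results to a separate directory
    if test:
        cmd.append("-t")          # run the conversion without saving results
    return cmd + list(files)


def run(cmd):
    """Run the command if calibre-debug is available; otherwise just report it."""
    if shutil.which(cmd[0]) is None:
        print("calibre-debug not found on PATH; would run:", cmd)
        return None
    return subprocess.run(cmd, check=False).returncode


command = build_command(["book1.epub", "book2.azw3"], out_dir="out")
print(command)
```

Passing the arguments as a list avoids shell quoting problems with the plugin name, which contains spaces. Call `run(command)` once the printed command looks right, and pass `test=False` to save the results.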


Installation Steps:
Download the attached zip file, install the plugin, add it to a context menu or toolbar, and restart Calibre, as described in the Introduction to plugins.

Operation:
From the main Calibre window, select a book and click the "Edit book" icon on the toolbar to open the editor. Then click "Plugins" on the editor toolbar and select the plugin.

Special Notes:
  • Requires calibre v6.0 or higher
  • No testing has been done on OS X systems
  • Keep a copy of the original file. Round trip conversions (e.g. traditional->simplified->traditional) will probably not recover the original version. Also, since characters are being replaced, the font in your eBook reader may not have all the necessary glyphs.
  • Metadata changes made via the GUI do not update the main Calibre database. They will be overwritten once the editor is re-opened. Consider using the 'Save a copy' option.
  • Calibre 5.0 and later can display vertical text. Earlier versions cannot.
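
The round trip problem comes from several traditional characters sharing one simplified form. The sketch below shows this with a tiny hand-made table, which is illustrative only and not the plugin's dictionary. Both 髮 (hair) and 發 (to emit, to develop) simplify to 发, so a reverse table has to pick one of them.

```python
# Toy tables for illustration only.
T2S = {"頭": "头", "髮": "发", "發": "发"}
# The reverse table can hold only one traditional form per simplified character.
S2T = {"头": "頭", "发": "發"}


def convert(text, table):
    """Convert character by character, leaving unknown characters unchanged."""
    return "".join(table.get(ch, ch) for ch in text)


original = "頭髮"                      # "hair" in traditional characters
simplified = convert(original, T2S)
round_trip = convert(simplified, S2T)
print(simplified, round_trip, round_trip == original)
```

Here the round trip produces 頭發 rather than 頭髮. Phrase-level dictionaries reduce such errors but cannot remove every ambiguity, which is why proofreading and keeping the original file are recommended.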

Version History:
  • Version 1.0.0 - 24 Jun 2016 - Initial release.
  • Version 1.1.0 - 27 Jun 2016 - Improved speed.
  • Version 1.2.0 - 29 Jun 2016 - Corrected conversion, turned on compression for the plugin zip file.
  • Version 2.0.0 - 10 Nov 2016 - Added command line processing, now also updates TOC and metadata, updated conversion dictionaries.
  • Version 2.0.1 - 24 Jan 2017 - Updated conversion dictionaries to latest at OpenCC project. Modified using chihchun's changes to allow plugin to work with more Calibre versions. Corrected minimum version.
  • Version 2.1.0 - 19 Feb 2017 - Added option to also convert quotation mark style to match target. Not yet added to the command line version.
  • Version 2.1.1 - 5 Aug 2017 - Corrected exception that occurred when processing an entire book. See GitHub issue #3 for details.
  • Version 2.2.1 - 22 Aug 2017 - Added vertical text orientation and epub quotation mark optimization for Readium and Kindle viewers. Kindle Previewer 3 must be used to convert epub files into Kindle mobi files.
  • Version 2.2.2 - 30 Aug 2017 - Corrected CSS file for EPUB->AZW3 conversion. See GitHub issue #5 for details.
  • Version 2.2.3 - 31 Aug 2017 - Improved speed for vertical text conversion optimization. See GitHub issue #5 for details.
  • Version 2.2.4 - 15 Sep 2017 - Fixed some CSS issues with vertical text. See GitHub issue #5 for details.
  • Version 2.3.0 - 4 Oct 2017 - Added full support for AZW3 files. See GitHub issue #6 for details.
  • Version 2.3.1 - 13 Nov 2017 - Allow the settings dialog to resize by adding scroll bars.
  • Version 2.3.2 - 24 Nov 2018 - Improve conversion speed. Default to convert entire book
  • Version 2.3.3 - 12 Jan 2019 - Switched from cssutils to css-parser to match Calibre 3.37 and later releases. Users will need Calibre 3.37 or later.
  • Version 2.3.4 - 17 Apr 2019 - Bug fix to avoid error when an item does not have a title.
  • Version 2.4.0 - 25 Sep 2020 - Add Python 3 operation. Warning - Command line is not fully tested.
  • Version 3.0.0 - 30 Dec 2022 - Updated conversion dictionaries. Added hanzi to kanji conversion. Added ability to only convert a small section. Uses a completely new HTML parser.
  • Version 3.0.1 - 4 Jan 2023 - Fixed conversion error from mainland simplified to Taiwan traditional.
  • Version 3.0.2 - 27 Mar 2023 - Fixed processing of character references.
Attached Thumbnails
PluginDialogPicture.png
PluginConversionChinese.png
PluginConversionJapanese2.png
Attached Files
File Type: zip TradSimpChinese_3_0_2.zip (471.6 KB, 19520 views)

Last edited by Hopkins; 03-27-2023 at 12:59 PM. Reason: Fix Version: 3.0.1