03-18-2013, 03:29 AM #1
SauliusP.
Plugin developer
Posts: 108
Karma: 24394
Join Date: Feb 2012
Location: Lithuania
Device: Kindle
[GUI Plugin] Hyphenate This!

Hyphenate This! adds soft hyphens to your ebook, giving it an even better feel of a real book!

Supports EPUB and AZW3 formats (not MOBI, even with KF8 inside; convert such books instead).

If you have a Kindle (with AZW3/KF8 support), this plugin will explode the book, add soft hyphens and rebuild it.

Use hyphenation dictionaries from Apache OpenOffice Extensions: download one and add it to the plugin via its settings.

This plugin is primarily targeted at Kindle users reading AZW3/KF8 books, as the Kindle does not hyphenate text itself. However, recent firmware versions added support for soft hyphens, so if a book is pre-hyphenated, the Kindle will display it correctly, and text search and other features keep working. Note that if you hyphenate an EPUB or AZW3 and then convert it to "old" MOBI, hyphenation won't work.
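To make "pre-hyphenated" concrete: as far as I know, the OpenOffice hyph_*.dic files hold Liang-style hyphenation patterns (the format used by the Hyphen library). The sketch below is not the plugin's code. It applies two made-up toy patterns with Liang's algorithm (ignoring the non-standard extensions the real format supports) and joins the pieces with U+00AD, the invisible soft hyphen that a reader turns into a visible dash only at a line break. The function names and the left/right minimums (default 2) are illustrative assumptions.

```python
import re

SOFT_HYPHEN = "\u00ad"  # invisible unless the reader breaks the line here


def parse_pattern(pattern):
    """Split a Liang pattern such as 'm1pu' into its letters ('mpu') and the
    digit value found before each letter position ([0, 1, 0, 0])."""
    letters = re.sub(r"\d", "", pattern)
    values = [0] * (len(letters) + 1)
    pos = 0
    for ch in pattern:
        if ch.isdigit():
            values[pos] = int(ch)
        else:
            pos += 1
    return letters, values


def hyphen_positions(word, patterns, left_min=2, right_min=2):
    """Return indices in `word` where a soft hyphen may be inserted."""
    padded = "." + word.lower() + "."          # '.' marks the word boundaries
    points = [0] * (len(padded) + 1)           # points[k] sits before padded[k]
    for letters, values in patterns:
        start = padded.find(letters)
        while start != -1:                     # every (possibly overlapping) match
            for offset, value in enumerate(values):
                points[start + offset] = max(points[start + offset], value)
            start = padded.find(letters, start + 1)
    # Odd values allow a break. The cut between word[i-1] and word[i]
    # corresponds to points[i + 1] because of the leading '.'.
    return [i for i in range(left_min, len(word) - right_min + 1)
            if points[i + 1] % 2 == 1]


def soft_hyphenate(word, patterns, left_min=2, right_min=2):
    """Insert U+00AD at every allowed position of `word`."""
    cuts = hyphen_positions(word, patterns, left_min, right_min)
    bounds = [0] + cuts + [len(word)]
    return SOFT_HYPHEN.join(word[a:b] for a, b in zip(bounds, bounds[1:]))


# Toy patterns for illustration only; a real hyph_*.dic holds thousands.
PATTERNS = [parse_pattern(p) for p in ("m1pu", "u1te")]
print(soft_hyphenate("computer", PATTERNS).replace(SOFT_HYPHEN, "-"))  # com-pu-ter
```

The text keeps all of its letters; only invisible break hints are added, which is why a reader that understands U+00AD can still show and search it normally.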

Some EPUB readers have native hyphenation, but if you read some exotic language (as I do), hyphenation support might be poor or absent. Luckily, LibreOffice/OpenOffice dictionaries are available for quite a lot of languages.

Note. Not all EPUB readers handle soft hyphens as expected. Some break lines at them but do not show the dashes; some display them correctly but break text search. So try it yourself and decide whether it works for you. From the discussion further in this thread:
  • Sony devices seem to split text on soft-hyphens, but do not display dashes. Not acceptable for Sony users.
  • Kobo seems to display hyphenation correctly, but text search is ruined.
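One plausible explanation for broken search (an assumption on my part, not confirmed in this thread) is that the reader compares the query against the text literally, soft hyphens included. This small sketch shows the difference between a literal match and one that strips U+00AD first; both function names are hypothetical.

```python
SOFT_HYPHEN = "\u00ad"


def naive_find(text, query):
    """Literal substring match: soft hyphens in the text get in the way."""
    return query in text


def soft_hyphen_aware_find(text, query):
    """Strip soft hyphens from the text before matching."""
    return query in text.replace(SOFT_HYPHEN, "")


page = "a well-known hy\u00adphen\u00adation problem"
print(naive_find(page, "hyphenation"))              # False
print(soft_hyphen_aware_find(page, "hyphenation"))  # True
```

Note that the ordinary hyphen in "well-known" is left alone; only the invisible U+00AD characters are ignored.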

CAUTION! In Calibre versions up to and including 0.9.26, there is a flaw in AZW3 exploding/rebuilding: you might lose pictures. So please back up your AZW3 if it is the only, original copy of the book you have!

CAUTION! In Calibre versions before 0.9.24, there is a flaw in the AZW3 explode-and-rebuild workflow: the TOC might be corrupted, as well as quick jumping through chapters. This might not affect you, but be warned!

Illustrations. I have added screenshots from my Kindle with an English book. However, English is quite compact, so hyphenation does not show all its beauty. I've therefore also added two screenshots with Lithuanian text, where the effect is more obvious. Of course, the text will look like Wingdings to most of you, but just try to see the difference :-)
[Screenshot: English text, original]
[Screenshot: English text, soft-hyphenated]
[Screenshot: Lithuanian text, original]
[Screenshot: Lithuanian text, soft-hyphenated]



User Guide

Install the plugin and download "OXT" dictionaries from the link above. Open the plugin's settings via the menu and add those dictionaries. After the dictionaries are added, the downloaded files are no longer needed: the plugin stores the hyphenation data inside its own settings directory.

NOTE. You may also add a hyphenation dictionary directly, i.e. the appropriate "DIC" file extracted from the "OXT" (an OXT is simply a ZIP file). The "DIC" file must be named "hyph_<language code>.dic", e.g. "hyph_en_US.dic" or "hyph_ru.dic".
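If you want to check a file name before adding it, here is a minimal sketch of the naming rule. The exact check the plugin performs is not stated in this post; the regular expression (a 2-3 letter language code plus optional region parts) and the function name are my assumptions.

```python
import os
import re

# "hyph_" + language code (e.g. "en_US", "ru") + ".dic"
DIC_NAME = re.compile(r"^hyph_([A-Za-z]{2,3}(?:[_-][A-Za-z0-9]+)*)\.dic$")


def language_from_dic_name(path):
    """Return the language code from a name like 'hyph_en_US.dic', or None."""
    match = DIC_NAME.match(os.path.basename(path))
    return match.group(1) if match else None


print(language_from_dic_name("hyph_en_US.dic"))  # en_US
print(language_from_dic_name("hyph_ru.dic"))     # ru
print(language_from_dic_name("en_US.dic"))       # None
```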

NOTE 2. I have tested lots of "OXT" dictionaries. Surprisingly, some of them include a hyphenation file that is not listed in the descriptor (the plugin uses the descriptor to locate the hyphenation dictionary inside the "OXT" archive). So if you add an "OXT" dictionary but no new dictionary appears in the list, open the "OXT" file with an archive manager and look for a "hyph*.dic" file. If it is present, extract it and add it directly. If not, you're out of luck.
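Since an OXT is just a ZIP file, the manual search can also be scripted with Python's standard zipfile module. A minimal sketch; the function names are hypothetical, and the in-memory archive only stands in for a real downloaded OXT.

```python
import fnmatch
import io
import posixpath
import zipfile


def find_hyphenation_dics(oxt_file):
    """List archive members whose file name matches hyph*.dic (case-insensitive)."""
    with zipfile.ZipFile(oxt_file) as oxt:
        return [name for name in oxt.namelist()
                if fnmatch.fnmatch(posixpath.basename(name).lower(), "hyph*.dic")]


def extract_member(oxt_file, member, target_dir):
    """Extract one member so it can be added to the plugin directly."""
    with zipfile.ZipFile(oxt_file) as oxt:
        return oxt.extract(member, target_dir)


# Build a tiny stand-in archive in memory to show the call.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as z:
    z.writestr("META-INF/manifest.xml", "<manifest/>")
    z.writestr("dictionaries/hyph_lt_LT.dic", "UTF-8\n")
print(find_hyphenation_dics(buf))  # ['dictionaries/hyph_lt_LT.dic']
```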

[Screenshot: Settings window]

Simple part:
Install or remove dictionaries here and specify the minimum length a word must have to be hyphenated.
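A minimal sketch of how such a minimum word length could be applied, assuming "length" means at least N letters and that words are runs of Unicode letters (so Lithuanian letters count too). `hyphenate_long_words` and its default of 6 are hypothetical, and `str.upper` stands in for a real hyphenation function.

```python
import re

WORD = re.compile(r"[^\W\d_]+")  # runs of Unicode letters, no digits or underscores


def hyphenate_long_words(text, hyphenate_word, min_length=6):
    """Apply `hyphenate_word` only to words of at least `min_length` letters."""
    def repl(match):
        word = match.group(0)
        return hyphenate_word(word) if len(word) >= min_length else word
    return WORD.sub(repl, text)


print(hyphenate_long_words("a tiny example", str.upper, min_length=5))  # a tiny EXAMPLE
```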

Advanced part:

Hyphenation limits

Some hyphenation dictionaries contain special directives: LEFTHYPHENMIN and RIGHTHYPHENMIN. They set the minimum number of characters that must remain to the left or right of a hyphen, i.e. how short the first and last fragments of a split word can be. The example in the picture shows 2 characters on the left (overridden with 3) and 3 characters on the right for the English dictionary. If a dictionary does not contain these directives, the default limit is 2. If you don't like the default or included limits, you can edit them for each dictionary separately by ticking the "Override" checkbox.
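The rule above (dictionary value if present, otherwise 2, and a user override wins) can be sketched as follows. This assumes, as I understand the Hyphen .dic format, that the first line names the character set and directives appear on their own lines as "NAME value"; `read_limits` and the sample lines are made up for illustration and mirror the values described above.

```python
DEFAULT_LIMIT = 2  # used when the dictionary does not say otherwise


def read_limits(lines, override_left=None, override_right=None):
    """Read LEFTHYPHENMIN/RIGHTHYPHENMIN from the lines of a hyph_*.dic file.

    The first line is skipped as the character set (e.g. 'UTF-8').
    Overrides, when given, replace whatever the dictionary says.
    """
    limits = {"LEFTHYPHENMIN": DEFAULT_LIMIT, "RIGHTHYPHENMIN": DEFAULT_LIMIT}
    for line in lines[1:]:
        parts = line.split()
        if len(parts) == 2 and parts[0] in limits and parts[1].isdigit():
            limits[parts[0]] = int(parts[1])
    left = override_left if override_left is not None else limits["LEFTHYPHENMIN"]
    right = override_right if override_right is not None else limits["RIGHTHYPHENMIN"]
    return left, right


sample = ["UTF-8", "LEFTHYPHENMIN 2", "RIGHTHYPHENMIN 3", ".ach4", ".ad4der"]
print(read_limits(sample))                   # (2, 3)
print(read_limits(sample, override_left=3))  # (3, 3), as in the screenshot
print(read_limits(["ISO8859-1", "1ba"]))     # (2, 2), defaults
```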

Tags to ignore/parse

Some people pointed out that there is no real (or aesthetic) need to hyphenate chapter names. Those are usually enclosed in heading tags: h1, h2 etc. I have added the possibility to ignore any tags. The defaults are three heading tags.
You might also want to hyphenate only the content of particular tags. In the example these are p and td (paragraphs and table cells).
Special note. If you enter p in the "parse" tags, all paragraphs will be parsed and hyphenated, including their child tags, like span, em, strong etc. If you want some tags inside p to be skipped, em for example, add them to the "ignore" list.
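The parse/ignore behaviour described above can be sketched with Python's standard ElementTree. This is not the plugin's code: `hyphenate_tree` is hypothetical, `str.upper` stands in for a real hyphenation function, and the ignore list assumes the three default headings are h1, h2 and h3. The subtle part is the "tail" text after an ignored element: it belongs to the parent, so it is still processed.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop an '{namespace}' prefix, e.g. the XHTML one, from a tag name."""
    return tag.rsplit("}", 1)[-1]


def hyphenate_tree(elem, transform, parse_tags, ignore_tags, inside=False):
    """Apply `transform` to text inside `parse_tags`, skipping `ignore_tags`.

    An ignored element is skipped together with all its children, but the
    text that follows it (its tail) is handled by the parent and processed
    if the parent is being parsed.
    """
    if local_name(elem.tag) in ignore_tags:
        return
    active = inside or local_name(elem.tag) in parse_tags
    if active and elem.text:
        elem.text = transform(elem.text)
    for child in elem:
        hyphenate_tree(child, transform, parse_tags, ignore_tags, active)
        if active and child.tail:
            child.tail = transform(child.tail)


html = "<body><h1>title</h1><p>one <em>two</em> three</p><div>four</div></body>"
root = ET.fromstring(html)
hyphenate_tree(root, str.upper, {"p", "td"}, {"h1", "h2", "h3", "em"})
print(ET.tostring(root, encoding="unicode"))
# <body><h1>title</h1><p>ONE <em>two</em> THREE</p><div>four</div></body>
```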

Custom column

Hyphenation status can be saved to a custom column of type "Text, column shown in the tag browser". You can also define what to write to that column when hyphenation is performed and when hyphens are removed. If the column name is empty, the status is not written anywhere.
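A minimal sketch of the status logic, assuming that inside a real calibre plugin `db_api` would be `self.gui.current_db.new_api`, whose `set_field(name, {book_id: value})` method is, as far as I know, the usual way to update a field. The column name `#hyphenated`, the default texts and `record_status` are hypothetical; a small fake object replaces calibre so the sketch runs on its own.

```python
def record_status(db_api, book_id, column, hyphenated,
                  text_hyphenated="Yes", text_removed="No"):
    """Write the hyphenation status to a custom column, if one is configured."""
    if not column:
        return False  # empty column name: the status is not written anywhere
    value = text_hyphenated if hyphenated else text_removed
    db_api.set_field(column, {book_id: value})
    return True


class FakeApi:
    """Stand-in for calibre's database API that just records calls."""
    def __init__(self):
        self.calls = []

    def set_field(self, name, book_id_to_val_map):
        self.calls.append((name, book_id_to_val_map))


api = FakeApi()
record_status(api, 42, "#hyphenated", hyphenated=True)
record_status(api, 42, "", hyphenated=False)  # ignored: no column configured
print(api.calls)  # [('#hyphenated', {42: 'Yes'})]
```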

After that, everything is simple: choose a book with EPUB and/or AZW3 formats, click the plugin's icon, choose one of the formats and click OK. The book will be hyphenated. There is also a handy menu action to remove soft hyphens from a book.
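Removing soft hyphens amounts to deleting every U+00AD from the book's text. A minimal sketch on raw (X)HTML source, assuming the hyphens may appear either as the literal character or as one of the usual entity spellings; `remove_soft_hyphens` is a hypothetical name, not the plugin's function.

```python
import re

# A literal U+00AD or the usual (X)HTML spellings: &shy;, &#173;, &#xAD;
SOFT_HYPHEN_RE = re.compile(r"\u00ad|&shy;|&#0*173;|&#[xX]0*[aA][dD];")


def remove_soft_hyphens(markup):
    """Strip soft hyphens from raw (X)HTML source text."""
    return SOFT_HYPHEN_RE.sub("", markup)


print(remove_soft_hyphens("<p>hy&shy;phen\u00adation &#173;test&#xAD;s</p>"))
# <p>hyphenation tests</p>
```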


Version history:

Version 0.1.3 2020-10-01
Compatibility upgrade for Python 3 and Calibre 5

NB: Versions 0.1.0 to 0.1.2 were defunct betas.

Version 0.0.9 2019-11-14
Fix for uppercase dictionary description
(it prevented adding new Russian and Swedish dictionaries).

Version 0.0.8 2014-08-08
Get ready for Calibre 2 with Qt5!

Version 0.0.7 2013-04-22
Fixed Unicode support for custom text in the hyphenation custom column.

Version 0.0.6 2013-04-09
Added custom column to save hyphenation status.

Version 0.0.5 2013-03-29
Community requests and other enhancements
  • Added limits of syllable splits on left and right sides.
  • Added override of the syllable limits via settings.
  • Added "tags to ignore" and "tags to parse" lists via settings.
  • Added a nice icon and generally beautified the settings dialogue (it is getting complex).
Inspired by active interest and donations (of course).

Version 0.0.4 2013-03-26
Shortened toolbar button label as per community requests.

Version 0.0.3 2013-03-19
Fixed some issues based on user feedback.
Now uses Calibre's internal HTML parser to avoid encoding problems.
Changed text parsing to XML parsing, which is much faster and more efficient.

Version 0.0.2 2013-03-18
Fixes for the FAIL of the first release.

Version 0.0.1 2013-03-18
The very first version of the plugin.
Soft-hyphenation of EPUB and AZW3.
Attached Files
File Type: zip HyphenateThis.zip (33.4 KB, 65250 views)

Last edited by BetterRed; 10-02-2020 at 06:21 PM. Reason: Fix Dictionaries link