MobileRead Forums - View Single Post

CalibUser · 11-01-2015, 03:34 PM

Update for the ePub Tidy Tool

A new version, v0.1.1.6, has been attached to the first article in this thread and the manual has been updated.
This plugin has been tested on Windows 7 and requires that Python 3 is installed on your computer.

The following features have been added:

Has a new customised word list. Some words pass the spell checker because they are spelt correctly, even though they are the wrong word in context. For example, an OCR package sometimes reads the word "modern" as "modem". Words like this need to be checked manually. You can provide a list of these words for the plugin to process. It will find each word and present the paragraph that contains it, together with an alternative word. You can then select the alternative word or retain the original word.
Has an option tick box for processing Greek text
Fixes incorrect Greek words that have Π, ώ, ω and έ missing (fix provided by gipsy)
Has an extra option for processing span tags: Change to small uppercase italics. This changes text that has the style "font-variant:small-caps" to upper case italics in a smaller font than normal and puts the span tag <uCaseSmallItalics> around the capitalised text. You can then define a class uCaseSmallItalics in a CSS file so that a smaller font size is applied to the italicised, capitalised text.
Has an option for importing a CSS file so that you can use your preferred format for text.
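
To illustrate the small uppercase italics option, here is a minimal sketch in Python of that kind of transformation. It is not the plugin's own code. The regular expression, the function name small_caps_to_upper_italics and the choice to write the result as a span with class="uCaseSmallItalics" are all assumptions made for the example, and it only handles spans that contain plain text.

```python
import re

# Assumed input shape: a span with an inline style containing
# font-variant:small-caps and no nested tags inside it.
SMALL_CAPS_SPAN = re.compile(
    r'<span\s+style="[^"]*font-variant:\s*small-caps[^"]*">([^<]*)</span>',
    re.IGNORECASE,
)


def small_caps_to_upper_italics(html_text):
    """Upper-case the text of small-caps spans and give the span a class.

    The class name uCaseSmallItalics comes from the post; the exact
    markup the plugin writes is an assumption here.
    """
    def _swap(match):
        inner = match.group(1).upper()
        return '<span class="uCaseSmallItalics">' + inner + '</span>'

    return SMALL_CAPS_SPAN.sub(_swap, html_text)


sample = '<p><span style="font-variant:small-caps">Chapter One</span> begins.</p>'
print(small_caps_to_upper_italics(sample))
```

A CSS rule for the uCaseSmallItalics class, for example one that sets font-style to italic and font-size to smaller, would then control how the text is displayed. Note that this sketch would also upper-case any HTML entity inside the span, so entities should be converted to characters first.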

To use the customised word list you need to install Beautiful Soup. Instructions for this are given in the manual for Windows 7; for other systems (Mac, Linux) please search the web.
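
The idea behind the customised word list can be sketched in a few lines of Python. This is only an illustration using the standard library, not the plugin's implementation (which relies on Beautiful Soup). The dictionary SUSPECT_WORDS and the functions find_suspects and apply_choice are hypothetical names.

```python
import re

# Hypothetical word list: suspect word -> suggested alternative.
SUSPECT_WORDS = {"modem": "modern"}


def find_suspects(paragraphs, suspects):
    """Yield (paragraph index, word as found, alternative) for each hit."""
    for index, text in enumerate(paragraphs):
        for word, alternative in suspects.items():
            pattern = r"\b" + re.escape(word) + r"\b"
            for match in re.finditer(pattern, text, re.IGNORECASE):
                yield index, match.group(0), alternative


def apply_choice(text, word, alternative):
    """Replace whole-word occurrences once the user accepts the alternative."""
    return re.sub(r"\b" + re.escape(word) + r"\b", alternative, text)


paragraphs = [
    "The modem world is noisy.",   # OCR error: should be "modern"
    "She plugged in the modem.",   # correct as it stands
]

for index, found, alternative in find_suspects(paragraphs, SUSPECT_WORDS):
    print(index, found, "->", alternative)
```

Both paragraphs are reported, which is why each hit has to be confirmed by hand: the first should be changed with apply_choice, while the second should keep the original word.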

Important: Beautiful Soup will change all HTML entities (e.g. &lsquo;) to a single character (in this case, a left single quote mark) when it processes text. To ensure that the text processed by Beautiful Soup matches the HTML file exactly, you need to tick the box "Replace HTML code eg &msdash;" to find all suspect words. This changes the HTML entities in the ePub to the single characters that are used in the search.
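
The mismatch between entities and single characters can be seen with a short Python example. It uses html.unescape from the standard library as a stand-in for the conversion Beautiful Soup performs; it is not the plugin's code, and the sample text is invented.

```python
import html

# Raw text as it might appear in an ePub's HTML file (invented sample).
raw = "&lsquo;Tis a modem age&rsquo; &mdash; or so it seems."

# Entities become single characters, as they do after parsing.
decoded = html.unescape(raw)
print(decoded)

# A search for the left single quote character only matches the decoded text.
left_quote = "\u2018"
print(left_quote in raw)      # False: the file still holds the entity
print(left_quote in decoded)  # True: the entity is now one character
```

This is why the file itself has to be converted before the search: otherwise the paragraph found in the parsed text may not match the text stored in the HTML file.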

The code that implements the manual word check is slow compared to the automatic word search. When you press a button to accept or reject changing a word, there may be a brief pause while the plugin finds the next paragraph that contains a suspect word. Despite this, it is faster to use the plugin than the normal Find/Search facility built into Sigil, where you would need to enter each suspect word manually and would risk leaving some out!
