MobileRead Forums


DiapDealer 05-11-2015 09:19 AM

Shady Characters plugin
 
*Note: Requires Sigil v0.8.3 or later*
*Note: Do not rename any Sigil plugin zip files before installing*

This plugin will work with either Python 3.4+ or Python 2.7+ (defaults to 3.x if both are present).

The Shady Characters plugin exposes many invisible Unicode characters that can elude the eye and make editing and searching (X)HTML markup confusing.

It converts those "invisible" characters to their numeric entity equivalents (I chose numeric entities because they're less likely to cause errors in conjunction with DTDs, or the lack thereof). You can then do whatever you want with those entities.

Currently it looks for:

thin space
soft hyphen
zero-width joiner
word joiner
zero-width non-breaking space
narrow non-breaking space
non-breaking space (only for completeness ... the Sigil environment is pretty toxic to the non-breaking space character anyway)
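The conversion described above can be sketched in a few lines of Python 3. This is a minimal illustration, not the plugin's actual code: the `SHADY` table and the `expose_shady()` function are hypothetical names. The code points are the standard Unicode values for the characters listed (U+2009, U+00AD, U+200D, U+2060, U+FEFF, U+202F, U+00A0).

```python
# Hypothetical table: label -> Unicode code point of an "invisible" character.
# The labels are illustrative; the code points are the standard Unicode values.
SHADY = {
    "thinsp": 0x2009,   # thin space
    "shy": 0x00AD,      # soft hyphen
    "zwj": 0x200D,      # zero-width joiner
    "wj": 0x2060,       # word joiner
    "zwnbsp": 0xFEFF,   # zero-width no-break space
    "nnbsp": 0x202F,    # narrow no-break space
    "nbsp": 0x00A0,     # no-break space
}


def expose_shady(text):
    """Replace each listed character with its decimal numeric entity.

    Returns the new text and a dict of counts for the labels that occurred.
    """
    counts = {}
    for label, cp in SHADY.items():
        ch = chr(cp)
        n = text.count(ch)
        if n:
            counts[label] = n
            # Decimal entity, e.g. U+200D -> "&#8205;"
            text = text.replace(ch, "&#%d;" % cp)
    return text, counts


if __name__ == "__main__":
    sample = "<p>a\u00a0b\u200dc</p>"
    print(expose_shady(sample))
```

Decimal entities are used here to match the "numeric entity" choice described in the post; hexadecimal forms such as `&#x200D;` would be equally valid XHTML.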

If you wish to customize the characters searched for, and/or what they get replaced with, edit the ShadyCharacters.json file (<Sigil preferences directory>/plugins_prefs/ShadyCharacters/ShadyCharacters.json). You will need to run the plugin at least once before the preference file exists. A single entry will look like:

Code:

"zwj": [
      "0x200D",
      "&#8205;"
    ],

The first field ("zwj") has no functional meaning; it's just a label so you know what the character is (zero-width joiner in this case).

The second field ("0x200D") is a string representation of the hex value of the character being sought (please include the quotes and the "0x" prefix).

The third field ("&#8205;") is the numeric entity you want to replace the character with when it's found. (Note that the character will be replaced with whatever you put here.)

In short:
Code:

"zwj": ["0x200D","&#8205;"],
is one entry. The trailing comma is only necessary if there's another entry to follow, but it hurts nothing if it's there on the last entry.
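Reading entries in that format could look like the Python 3 sketch below. It assumes the entries sit inside a top-level JSON object; `load_entries()` and `apply_entries()` are hypothetical helpers, not part of the plugin or of Sigil's plugin API. One caveat: Python's standard `json` module rejects a trailing comma after the last entry, so whether that comma is truly harmless depends on how the file is actually parsed.

```python
import json

# Hypothetical prefs content in the documented format:
# label -> [hex code point as a string, replacement text]
PREFS_TEXT = """
{
    "zwj": ["0x200D", "&#8205;"],
    "hairsp": ["0x200A", "&#8202;"]
}
"""


def load_entries(raw):
    """Turn {"label": ["0x....", "replacement"]} into {character: replacement}."""
    table = {}
    for label, (hex_str, replacement) in json.loads(raw).items():
        # int(..., 16) accepts the "0x" prefix, so "0x200D" -> 8205
        table[chr(int(hex_str, 16))] = replacement
    return table


def apply_entries(text, table):
    """Replace every listed character with its configured replacement."""
    for ch, replacement in table.items():
        text = text.replace(ch, replacement)
    return text


if __name__ == "__main__":
    table = load_entries(PREFS_TEXT)
    print(apply_entries("<h1>hair\u200aspace</h1>", table))
```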

*Please note that if you edit the document in Book View after you convert the characters to entities with this plugin, they will be converted back to invisible characters (unless you have the numeric entities listed in Sigil's Preserve Entities preferences). Simply run the plugin again to turn them back to entities.

KevinH 05-11-2015 02:27 PM

Added this to the official plugin threads list.

KevinH

DiapDealer 05-11-2015 03:22 PM

Quote:

Originally Posted by KevinH (Post 3100144)
Added this to the official plugin threads list.

KevinH

Thanks! I forgot to do that this morning and was away from my PC 'til this afternoon. :)

najgori 05-12-2015 02:54 PM

This plugin is a great framework for transliteration (well, Serbian anyway). Thanks a lot.

DiapDealer 05-12-2015 02:57 PM

Quote:

Originally Posted by najgori (Post 3100724)
This plugin is a great framework for transliteration (well, Serbian anyway). Thanks a lot.

Glad you find it useful! Are there other Unicode characters you think I should include by default (trying to stick with the invisible and/or special joiner/space theme)?

najgori 05-14-2015 09:45 AM

Sure, if I remember anything. :)

AlanHK 02-26-2019 07:10 AM

Quote:

Originally Posted by DiapDealer (Post 3100726)
Are there other unicode characters you think I should include by default

I had a book that used the "hair space" character:
https://www.fileformat.info/info/uni...200a/index.htm

So I added this to the json:
Code:


 "hairsp": [
      "0x200A",
      "&#8202;"
    ],

I used a file with:
Code:

<h1>hair 8202 =&#8202;=</h1>
The space shows in Preview. When I saved the file, the entity was converted to a character.
But the plugin doesn't find them.

Quote:

Status: success

No shady characters found in Text/Section0001.xhtml.
No shady characters found at all - no files altered.
PS: The forum converted all the entities. Why the hell does "code" get converted?
I looked at your posts and see I need to use NOPARSE.
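One way to narrow down a "nothing found" result is to check whether the character itself, rather than its entity text, is actually in the saved file. A minimal Python 3 sketch; `find_codepoint()` is a hypothetical helper, and it assumes the XHTML has already been read as text (for example with `open(path, encoding="utf-8").read()`). If the file still contains the literal entity `&#8202;`, a search for the character comes up empty.

```python
import unicodedata


def find_codepoint(text, cp):
    """Return 1-based (line, column) pairs where code point cp occurs in text."""
    ch = chr(cp)
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        col = line.find(ch)
        while col != -1:
            hits.append((lineno, col + 1))
            col = line.find(ch, col + 1)
    return hits


if __name__ == "__main__":
    xhtml = "<h1>hair 8202 =\u200a=</h1>\n<p>no hair space here</p>"
    cp = 0x200A
    # Prints the official Unicode name and every position found
    print(unicodedata.name(chr(cp)), find_codepoint(xhtml, cp))
```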

DiapDealer 02-26-2019 07:48 AM

This plugin was created before the Preserve Entities preference setting was added to Sigil. You should be able to add the hairsp entity (or any other invisible character entity) to your Preserve Entities list, and Sigil will convert any such characters it finds when saving or when manually Mending/Prettifying. If not, then the hair space entity may be getting converted into some other character (for unknown reasons).

I'll take a look at why the plugin may be acting up on the hair space char, though. Have you verified the plugin still works with any of the invisible characters? Sigil has gone through many changes since the plugin was written.

AlanHK 02-26-2019 09:20 AM

Quote:

Originally Posted by DiapDealer (Post 3814889)
I'll take a look at why the plugin may be acting up on the hairspace char, though. Have you verified the plugin still even works with any of the invisible characters?

It does work with the default list.

But any changes I make to ShadyCharacters.json are reverted when I run the plugin.
Even if I delete everything, when I run the plugin it just restores the default.

Meanwhile, the Preserve Entities preference does work.

DiapDealer 02-26-2019 11:06 AM

I uploaded a fix for the plugin so that it actually saves and uses any user-added characters to the json prefs file. Mainly for posterity.
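The symptom AlanHK described (edits to the json file overwritten by the defaults on every run) is typical of code that writes the default table unconditionally. A minimal Python 3 sketch of the safer pattern, using a plain JSON file; the `DEFAULTS` table and `load_prefs()` function are hypothetical, and this is not the actual fix uploaded here or Sigil's own prefs API.

```python
import json
import os
import tempfile

# Hypothetical built-in defaults: label -> [hex code point, replacement]
DEFAULTS = {
    "zwj": ["0x200D", "&#8205;"],
    "shy": ["0x00AD", "&#173;"],
}


def load_prefs(path):
    """Load user prefs, adding only the default entries that are missing.

    Existing entries (including user-added ones) are never overwritten,
    so edits to the file survive the next run.
    """
    prefs = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            prefs = json.load(f)
    for label, entry in DEFAULTS.items():
        prefs.setdefault(label, entry)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(prefs, f, indent=4)
    return prefs


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "ShadyCharacters.json")
    load_prefs(path)  # first run creates the file from the defaults
    with open(path, encoding="utf-8") as f:
        user = json.load(f)
    user["hairsp"] = ["0x200A", "&#8202;"]  # simulate a user edit
    with open(path, "w", encoding="utf-8") as f:
        json.dump(user, f)
    print(sorted(load_prefs(path)))  # the user-added label is still present
```

One consequence of this design: deleting a default entry brings it back on the next run. Honoring deletions would need something extra, such as a stored list of disabled labels.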

AlanHK 02-26-2019 11:10 AM

Thanks.
Quote:

Originally Posted by DiapDealer (Post 3814964)
I uploaded a fix for the plugin so that it actually saves and uses any user-added characters to the json prefs file.

So, for four years, that didn't work and no one noticed?

At least it wasn't me screwing up.
I kept thinking I had edited the wrong file or messed it up somehow.

Quote:

Originally Posted by DiapDealer (Post 3814964)
Mainly for posterity.

There is a use for it: when you want to see which characters are there and then maybe do some search and replace. When you're done, you can let them revert to Unicode characters. If you use Preserve Entities, you always see the codes.

I found this useful: https://www.w3schools.com/charsets/r...unctuation.asp
It lists "punctuation" characters, including a lot of esoteric spaces.
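The round trip described above (expose the characters as entities, search and replace, then let them revert to Unicode) can be imitated outside Sigil. A minimal Python 3 sketch; `to_entities()` and `from_entities()` are hypothetical helpers, not what Sigil does internally. It deliberately decodes only the listed numeric entities, so other entities such as `&amp;` are left alone.

```python
import re

# Code points to expose; hair space and thin space are just examples
CODEPOINTS = {0x200A, 0x2009}


def to_entities(text):
    """Replace each listed character with its decimal numeric entity."""
    for cp in sorted(CODEPOINTS):
        text = text.replace(chr(cp), "&#%d;" % cp)
    return text


def from_entities(text):
    """Turn the listed decimal entities back into characters; leave all others alone."""
    def repl(match):
        cp = int(match.group(1))
        return chr(cp) if cp in CODEPOINTS else match.group(0)
    return re.sub(r"&#(\d+);", repl, text)


if __name__ == "__main__":
    original = "A\u200aB\u2009C &amp; &#160;"
    visible = to_entities(original)
    # Search and replace on the visible entities, e.g. thin space -> plain space
    edited = visible.replace("&#8201;", " ")
    print(repr(visible))
    print(repr(from_entities(edited)))
```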

DiapDealer 02-26-2019 11:33 AM

Quote:

Originally Posted by AlanHK (Post 3814967)
So, for four years, that didn't work and no one noticed?

Like I mentioned: it was just a stop-gap solution until the Preserve Entities feature was implemented. Not that many people used it--let alone customized it. ;)

Quote:

Originally Posted by AlanHK (Post 3814967)
At least it wasn't me screwing up.
I kept thinking I had edited the wrong file or messed it up somehow.

Sorry about that. Customizations could always be added directly to the matrix at the top of the plugin.py file. Perhaps that's what the few others who made use of the plugin did to customize it. *shrug*


All times are GMT -4.
