MobileRead Forums

Find whole words (and not only syllables) (https://www.mobileread.com/forums/showthread.php?t=335083)

Leonatus 11-24-2020 11:56 AM

Find whole words (and not only syllables)
 
Sigil performs hyphenation in the editor, which is a nice feature.
But it seems to affect the "Find & Replace" functionality: lately I can only search for syllables, not for whole words.
For example, if I search for "Ratte", the result is "expression not found", but if I enter "Rat", it finds the syllable, which is a little inconvenient.

Is there a setting for this?

Tex2002ans 11-24-2020 12:31 PM

Quote:

Originally Posted by Leonatus (Post 4061475)
Sigil performs hyphenation in the editor, which is a nice feature.

???

Are you sure you don't have Soft Hyphens hiding throughout your text?

Soft Hyphens are an invisible character that only turns into a hyphen when it reaches the end of a line.

A telltale sign of Soft Hyphens is when you get red squigglies on words that are spelled correctly... and/or when your search gets broken.
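To illustrate why the search breaks, here is a minimal Python sketch. It assumes the word was stored as "Rat" + soft hyphen + "te"; the break position is a guess for illustration only, not taken from the actual book.

```python
# U+00AD is the soft hyphen (decimal 173, hex AD).
SOFT_HYPHEN = "\u00ad"

# Assumed example: "Ratte" with a soft hyphen after "Rat".
stored = "Rat" + SOFT_HYPHEN + "te"

# On screen the word looks like "Ratte", but it holds 6 characters, not 5.
print(len(stored))        # 6
print("Ratte" in stored)  # False: the invisible character splits the match
print("Rat" in stored)    # True: only the first syllable matches

# Listing the code points makes the hidden character visible.
print([hex(ord(ch)) for ch in stored])
```

The same mismatch would explain the red squigglies: the spellchecker sees a six-character string that is not in its dictionary.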

I've written about this in earlier posts (explaining why Soft Hyphens are awful, plus the problems that may occur).

Quote:

Originally Posted by Leonatus (Post 4061475)
For example, if I search for "Ratte", the result is "expression not found", but if I enter "Rat", it finds the syllable, which is a little inconvenient.

Did you accidentally run Calibre's "Hyphenate This!" plugin?

What you want to do is Find/Replace for the Soft Hyphen character, and remove them all.

One easy way to do this is to go into Sigil:

1. Tools > Reports > Characters in HTML Files

If you scroll through the list, you might see:

Code:

Character: <----- (It looks like a hyphen, but it's actually an invisible character.)
Decimal: 173
Hexadecimal: AD
Entity Name: shy
Entity Description: soft hyphen

Double click on that row, and Sigil should insert a:

\ + Soft Hyphen

into the Search box.

2. Make sure the Replace: box is completely blank.

3. Change Mode: to "Regex".

4. Press Count All to see if there are any hits.

5. Press Replace All.

That should wipe all Soft Hyphens out of your book. Now you should have no problem with your normal searches.
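For anyone who prefers to script it, the Count All / Replace All steps above can be sketched in Python roughly like this. This is not how Sigil does it internally; the function names and the sample paragraph are made up for illustration.

```python
import re

# U+00AD, the soft hyphen reported as decimal 173 / hex AD / entity "shy".
SOFT_HYPHEN_RE = re.compile("\u00ad")

def count_soft_hyphens(text: str) -> int:
    """Rough equivalent of pressing Count All."""
    return len(SOFT_HYPHEN_RE.findall(text))

def strip_soft_hyphens(text: str) -> str:
    """Rough equivalent of Replace All with an empty Replace box."""
    return SOFT_HYPHEN_RE.sub("", text)

# Hypothetical sample paragraph; a real book would be read from its XHTML files.
sample = "<p>Die Rat\u00adte und die Dampf\u00adschiff\u00adfahrt.</p>"
print(count_soft_hyphens(sample))  # 3
cleaned = strip_soft_hyphens(sample)
print("Ratte" in cleaned)          # True
```

Note that this only removes the raw character. If the source spells the soft hyphen out as an entity reference, the sketch will not touch it.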

DiapDealer 11-24-2020 12:46 PM

Soft hyphens would be my guess. Hate those things.

KevinH 11-24-2020 12:59 PM

Spellchecking should now handle soft hyphens without barfing. Search and replace will not unless you use regex to deal with them. Another way to just see the soft hyphens is to add the soft hyphen entity (named or numeric as appropriate to your epub version) to Sigil's PreserveEntities setting.
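As a rough illustration of "making them visible" outside of Sigil, the Python sketch below swaps each raw soft hyphen for an entity reference. It does not use or emulate the PreserveEntities setting; the function name and the `numeric` flag are hypothetical, and which form suits which epub version is left to you.

```python
SOFT_HYPHEN = "\u00ad"

def show_soft_hyphens(text: str, numeric: bool = True) -> str:
    """Replace each raw soft hyphen with a visible entity reference."""
    entity = "&#173;" if numeric else "&shy;"
    return text.replace(SOFT_HYPHEN, entity)

print(show_soft_hyphens("Rat\u00adte"))                 # Rat&#173;te
print(show_soft_hyphens("Rat\u00adte", numeric=False))  # Rat&shy;te
```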

That said, I urge you to remove the soft hyphens for general work. You can add them back after the book is polished and in near final form using calibre if you really want them.

Doitsu 11-24-2020 02:12 PM

Quote:

Originally Posted by Leonatus (Post 4061475)
Sigil performs hyphenation in the editor, which is a nice feature.

As others have already pointed out, your source text most likely contained soft hyphens, which you can search for with the following regular expression:

Code:

\x{00AD}
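
If you want to list the affected words outside of Sigil, a Python sketch along these lines should work. Note that Python's re module does not accept the \x{00AD} spelling; the same character is written \u00ad (or \xad) there. The function name and sample sentence are made up for illustration.

```python
import re

# PCRE-style \x{00AD} is written \u00ad (or \xad) in Python's re module.
# \w does not match the soft hyphen itself, so the pattern finds runs of
# word characters that are joined by one or more soft hyphens.
WORD_WITH_SHY = re.compile(r"\w+(?:\u00ad\w+)+")

def words_with_soft_hyphens(text: str) -> list[str]:
    """Return each word that contains at least one soft hyphen."""
    return WORD_WITH_SHY.findall(text)

sample = "Die Rat\u00adte sah das Schiff."
hits = words_with_soft_hyphens(sample)
print(len(hits))                       # 1
print(hits[0].replace("\u00ad", "|"))  # Rat|te
```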

Quote:

Originally Posted by KevinH (Post 4061508)
Spellchecking should now handle soft hyphens without barfing.

Based on a quick test, Sigil spellcheck works fine with words that contain discretionary hyphens. (It works with Sigil 1.3.x and higher; it does not work with Sigil 0.9.x.)

Leonatus 11-24-2020 02:43 PM

Yes, you are all completely right! Thank you for your help!
I wouldn't have thought that the epub contained soft hyphens, for I had built the epub myself and, of course, without hyphens. But as the book has been lying in my file system for a considerable time, it might well be that at some point I re-saved it from Calibre's file location (with soft hyphens). I might just have forgotten.
Anyway, @DiapDealer: I can understand why anglophone users "hate these things". But the German language is different: imagine a word like "Dampfschifffahrtsgesellschaft" - my fingernails are warping at writing this - without hyphenation on an e-book reader! That's far too ugly. That's why I value the Hyphenate This! plugin very much, as its hyphenation results for German are about 85% correct.
But your hints for detecting soft hyphens in Sigil will be really valuable to me in the future, as this issue comes up fairly often.
Thank you again!

DiapDealer 11-24-2020 02:52 PM

Quote:

Originally Posted by Leonatus (Post 4061537)
Yes, you are all completely right! Thank you for your help!
I wouldn't have thought that the epub contained soft hyphens, for I had built the epub myself and, of course, without hyphens. But as the book has been lying in my file system for a considerable time, it might well be that at some point I re-saved it from Calibre's file location (with soft hyphens). I might just have forgotten.
Anyway, @DiapDealer: I can understand why anglophone users "hate these things". But the German language is different: imagine a word like "Dampfschifffahrtsgesellschaft" - my fingernails are warping at writing this - without hyphenation on an e-book reader! That's far too ugly. That's why I value the Hyphenate This! plugin very much, as its hyphenation results for German are about 85% correct.
But your hints for detecting soft hyphens in Sigil will be really valuable to me in the future, as this issue comes up fairly often.
Thank you again!

Oh, don't get me wrong. I know there's a valid use for them. But way too many English-speaking folk choose to litter their text with them in a hackish attempt to simulate hyphenation in rendering engines that don't natively support it. THAT'S what I hate. The pollution of markup with invisible hyphens in every single word over one syllable. ;)

People should buy readers that natively support hyphenation if they read content that would suffer without it (and it matters greatly to them). Never been a big fan of content providers deciding for readers what should be important to them.

Leonatus 11-24-2020 03:41 PM

Quote:

Originally Posted by DiapDealer (Post 4061542)
People should buy readers that natively support hyphenation if they read content that would suffer without it (and it matters greatly to them). Never been a big fan of content providers deciding for readers what should be important to them.

I own a Kobo, and Kobo does have built-in hyphenation, but for reasons unknown to me, its German hyphenation is rather awful: it is wrong in perhaps 30% of cases. This really matters for "good" reading.

Tex2002ans 11-24-2020 04:11 PM

Quote:

Originally Posted by Leonatus (Post 4061537)
Yes, you are all completely right! Thank you for your help!

:thumbsup:

Quote:

Originally Posted by Leonatus (Post 4061537)
But your hints for detecting soft hyphens in Sigil will be really valuable to me in the future, as this issue comes up fairly often.

If a book ever pops up with "Find and Replace isn't working", my mind instantly jumps to soft hyphens, and that's usually the problem 100% of the time! :D

Side Note: Another potential "weird character" issue is substituting Latin characters with Cyrillic ones:

C (Latin)
С (Cyrillic letter)

It's mostly used in Phishing attacks:

https://krebsonsecurity.com/2018/03/...ual-confusion/

and unscrupulous people who try to sell you dirt cheap "writing" (on sites like Fiverr) by copying already written works and swapping characters that visually look similar... trying to get around "plagiarism checks".

Again, red squigglies, "broken search", and/or Sigil's Character Reports would give it away.
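
A quick way to flag such lookalike swaps in a script is to check whether a single word mixes scripts. The Python sketch below uses the first word of each letter's Unicode name as a rough script label; that shortcut and the function names are assumptions for illustration, not a proper script detector.

```python
import unicodedata

def scripts_in(word: str) -> set[str]:
    """Collect a rough script label for each letter, taken from its Unicode name."""
    scripts = set()
    for ch in word:
        if ch.isalpha():
            # Names look like "LATIN CAPITAL LETTER C" or "CYRILLIC CAPITAL LETTER ES".
            scripts.add(unicodedata.name(ch, "UNKNOWN").split()[0])
    return scripts

def mixed_script_words(text: str) -> list[str]:
    """Return words whose letters come from more than one script."""
    return [w for w in text.split() if len(scripts_in(w)) > 1]

# Hypothetical sample: the second word starts with Cyrillic "С" (U+0421).
sample = "Cat \u0421at"
print(mixed_script_words(sample) == ["\u0421at"])  # True
```

Text that is legitimately written in one script, including accented German letters, stays unflagged because those letters still carry the LATIN label.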

Quote:

Originally Posted by DiapDealer (Post 4061497)
Soft hyphens would be my guess. Hate those things.

Me too. Awful, awful things!

Quote:

Originally Posted by DiapDealer (Post 4061542)
But way too many English-speaking folk choose to litter their text with them in a hackish attempt to simulate hyphenation in rendering engines that don't natively support it. THAT'S what I hate. The pollution of markup with invisible hyphens in every single word over one syllable. ;)

People should buy readers that natively support hyphenation if they read content that would suffer without it (and it matters greatly to them).

:thumbsup:

And with devices like Kobo, you can insert your own hyphenation dictionary if needed, and then poof, you get properly hyphenated words without all the downsides!

Quote:

Originally Posted by Leonatus (Post 4061552)
I own a Kobo, and Kobo does have built-in hyphenation, but for reasons unknown to me, its German hyphenation is rather awful: it is wrong in perhaps 30% of cases. This really matters for "good" reading.

You may want to check out JSWolf's "Better Hyphenation" thread.

He recently included Kobo hyphenation dictionaries for the German (DE) language.

I believe some of the default languages use extremely high left/right numbers (sometimes as high as 5), which means words might not even get hyphenated unless they are 10+ characters long!

Hyphenation Note: Different languages require different Left/Right minimums for proper typography (a trusted list can be found at Hyphenation.org):

2/3 (English)
2/2 (German)
2/2 (Spanish)
1/2 (Armenian)

Depending on the language, the minimums range from 1 to 3.

But 5??? Preposterous. Don't know what Kobo was thinking with those.
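
To show what the Left/Right minimums do, here is a small Python sketch. It assumes the candidate break positions have already been found by some hyphenation dictionary; the function name and the hard-coded positions are hypothetical, and this is not Kobo's actual algorithm.

```python
def allowed_breaks(word: str, candidates: list[int], left: int, right: int) -> list[int]:
    """Keep only break positions that leave at least `left` letters before
    the break and at least `right` letters after it."""
    return [p for p in candidates if left <= p <= len(word) - right]

# Hypothetical candidate break: Rat|te (position 3).
print(allowed_breaks("Ratte", [3], 2, 2))  # [3]
print(allowed_breaks("Ratte", [3], 5, 5))  # []

# With 5/5, no word shorter than 10 letters can be broken anywhere.
too_short = all(
    allowed_breaks("x" * n, list(range(1, n)), 5, 5) == []
    for n in range(1, 10)
)
print(too_short)  # True
```

With 2/2 the five-letter word can still break after "Rat"; with 5/5 nothing under ten letters qualifies, which is where the "10+ characters" figure comes from.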

Quote:

Originally Posted by DiapDealer (Post 4061542)
Never been a big fan of content providers deciding for readers what should be important to them.

Exactly. Plus, as I've stated in those topics before, soft hyphens cause so much collateral damage across the board.

Breaking highlighting and dictionary support being two of the biggest that have bothered me lately:

I believe on my Kobo Forma (?), when dragging the highlight, the cursor "gets stuck" on soft hyphens, so dragging stutters in the middle of a word, not following my finger as expected.

And on many Android readers, when you highlight a soft-hyphenated word and try to dictionary lookup, it'll tell you "word is not found".

Note: I forget exact details, and I haven't experienced this in a few years... because I make sure to purge all soft hyphens from all ebooks I load up.

But the horrifying memories are still burned into my brain... :D

Leonatus 11-25-2020 12:05 PM

Quote:

Originally Posted by Tex2002ans (Post 4061568)
You may want to check out JSWolf's "Better Hyphenation" thread.

He recently included Kobo hyphenation dictionaries for the German (DE) language.

JSWolf's addition of the German hyphenation dictionary was due to my request :D.

But, besides, is it possible to edit
a) Kobo's hyphenation dictionary,
b) JSWolf's hyphenation dictionary, for example,
and how can I do it? With Notepad++?

Tex2002ans 11-25-2020 01:37 PM

Quote:

Originally Posted by Leonatus (Post 4061846)
JSWolf's addition of the German hyphenation dictionary was due to my request :D.

:thumbsup:

Quote:

Originally Posted by Leonatus (Post 4061846)
But, besides, is it possible to edit
a) Kobo's hyphenation dictionary,
b) JSWolf's hyphenation dictionary, for example,
and how can I do it? With Notepad++?

Ask JSWolf. He knows all the details (especially since he generates them!), or instructions might already be mentioned in his topic.

I believe Kobo uses a slightly different hyphenation format (OpenOffice/LibreOffice?) than normal patterns (TeX), plus you have to do some minor tweaks to get it to work on Kobo.

I don't know details though.

Leonatus 11-25-2020 02:43 PM

Ok. Thank you! I'll see.

JSWolf 11-25-2020 06:00 PM

These hyphenation dictionaries are from OpenOffice/LibreOffice. And yes, they have been edited, but only slightly, to add the left/right hyphenation instructions.

Leonatus 11-26-2020 07:17 AM

That was quick! Thank you!


All times are GMT -4.