Old 09-16-2020, 06:09 AM   #661
leschek
Enthusiast
leschek began at the beginning.
 
Posts: 28
Karma: 10
Join Date: Sep 2020
Device: Onyx Poke2
I hope this is the correct topic to post in.

In my language we use one-letter prepositions and conjunctions (a, i, o, u, k, s, v, z), which shouldn't be left at the end of a line. Here is an example from a book I'm trying to "epubize":
"spatřil člun a v tom člunu". (translation: "he saw a boat and in that boat")
What I want is to find the letters "a" and "v" and replace the space after each with a no-break space, to connect it to the following word. I have this regex (which I found somewhere)
Code:
\s([aiouksvz])\s
for searching, but it finds only the first letter and then skips the second one. I tried changing the search direction in Sigil to "Up", but that doesn't help. I guess there must be some problem with the regex I'm using.

I also tried this example and again it finds only every second letter:

Code:
<p>some words a s i k v some words</p>
It seems the problem is the single space between the letters. When I double it to:

Code:
<p>some words a  s  i  k  v  some words</p>
the search works.
Old 09-16-2020, 07:00 AM   #662
davidfor
Grand Sorcerer
davidfor ought to be getting tired of karma fortunes by now.
 
Posts: 24,906
Karma: 47303748
Join Date: Jul 2011
Location: Sydney, Australia
Device: Kobo:Touch,Glo, AuraH2O, GloHD,AuraONE, ClaraHD, Libra H2O; tolinoepos
I think you want:

Code:
\b([aiouksvz])\s
That will pick a single letter followed by whitespace.
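Why this helps can be shown with a small Python sketch. Python's `re` module is only a stand-in for Sigil's PCRE engine here, which is an assumption, but both behave the same way on this point. The original pattern consumes the whitespace on both sides of a letter, so the space after one match is no longer available as the leading space of the next. `\b` is zero-width and consumes nothing:

```python
import re

text = "<p>some words a s i k v some words</p>"

# Original pattern: the trailing \s is consumed by each match, so the
# next letter has no leading whitespace left to match against.
consuming = re.findall(r"\s([aiouksvz])\s", text)

# \b is a zero-width word boundary, so adjacent letters are all found.
boundary = re.findall(r"\b([aiouksvz])\s", text)

print(consuming)  # ['a', 'i', 'v']
print(boundary)   # ['a', 's', 'i', 'k', 'v']
```

This also accounts for the search working once the spaces are doubled: each letter then has its own leading and trailing space.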
Old 09-16-2020, 07:15 AM   #663
leschek
Enthusiast
leschek began at the beginning.
 
Posts: 28
Karma: 10
Join Date: Sep 2020
Device: Onyx Poke2
Thank you. It works partially, but it also finds parts of the HTML code, such as
Code:
<a href...
and words that end with one of the searched characters preceded by a letter from outside the English alphabet, such as nás, při, etc.
Old 09-16-2020, 11:36 AM   #664
DiapDealer
Grand Sorcerer
DiapDealer ought to be getting tired of karma fortunes by now.

Posts: 27,602
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by leschek View Post
Thank you. It works partially, but it also finds parts of the HTML code, such as
Code:
<a href...
and words that end with one of the searched characters preceded by a letter from outside the English alphabet, such as nás, při, etc.
I'm tackling your exceptions in reverse order.

To make \b honor Unicode code points, turn on the Unicode Character Properties flag with (*UCP).

So the above:
Code:
\b([aiouksvz])\s
becomes:
Code:
(*UCP)\b([aiouksvz])\s
This should exclude the 's' in 'nás' and the 'i' in 'při'.
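The effect of the flag can be approximated in Python, with the caveat that Python's `re` module has no `(*UCP)` verb. The sketch below assumes that `re.ASCII` behaves like PCRE without `(*UCP)` (accented letters are not word characters) and that Python's default Unicode mode behaves like PCRE with it:

```python
import re

pattern = r"\b([aiouksvz])\s"
text = "nás při "

# ASCII-only word characters: 'á' and 'ř' count as non-word characters,
# so \b falsely sees a word boundary right before the final 's' and 'i'.
ascii_hits = re.findall(pattern, text, flags=re.ASCII)

# Unicode-aware word characters (Python's default for str patterns):
# 'á' and 'ř' are letters, so there is no boundary inside the words.
unicode_hits = re.findall(pattern, text)

print(ascii_hits)    # ['s', 'i']
print(unicode_hits)  # []
```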

To make the expression ignore the character class matches that immediately follow an angled (x)html bracket (<) you can use a negative lookbehind. Something like:
Code:
(*UCP)(?<!\<)\b([aiouksvz])\s
should ignore the 'a' and 'i' characters used in (x)html's anchor and italic tags.

The (*UCP) flag and the (?<!\<) lookbehind are not capturing groups, despite appearances. So the replacement you're looking for will still be something like:
Code:
\1&nbsp;
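As a rough check of the full find-and-replace, here is the same idea in Python. This is a sketch and not Sigil itself: `(*UCP)` is dropped because Python's `re` is Unicode-aware by default, and the sample markup is made up for illustration.

```python
import re

# Negative lookbehind (?<!<) skips a letter that directly follows '<',
# such as the 'a' in an anchor tag. Group 1 is still the only capture.
pattern = re.compile(r"(?<!<)\b([aiouksvz])\s")

# Hypothetical sample line, not taken from the book in question.
text = '<p><a href="#n">spatřil člun a v tom člunu</a></p>'

result = pattern.sub(r"\1&nbsp;", text)
print(result)
# <p><a href="#n">spatřil člun a&nbsp;v&nbsp;tom člunu</a></p>
```

The space after `<a` is left alone, while the spaces after the standalone "a" and "v" become `&nbsp;`.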

Last edited by DiapDealer; 09-17-2020 at 10:04 AM. Reason: Edited to correct the full expression
Old 09-16-2020, 04:53 PM   #665
leschek
Enthusiast
leschek began at the beginning.
 
Posts: 28
Karma: 10
Join Date: Sep 2020
Device: Onyx Poke2
Quote:
Originally Posted by DiapDealer View Post
Code:
(*UCP)(?<!\<)\b[^<]([aiouksvz])\s
So the replacement you're looking for will still be something like:
Code:
\1&nbsp;
Thank you for your time and explanation, but unfortunately it only works partially again. It ignores the HTML code (a href, i), which is great, but it doesn't find all the letters I need. For example, in the sentence "spatřil člun a v tom člunu", it should find the letters "a" and "v", but it finds only "a" and ignores "v". It also finds some two-letter words such as "na" and "do", or in English "as" and "is".
Old 09-16-2020, 07:13 PM   #666
DiapDealer
Grand Sorcerer
DiapDealer ought to be getting tired of karma fortunes by now.

Posts: 27,602
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Apologies... I pasted the wrong full expression. It had an extraneous (and incorrect) negative character class that I was testing out.

This is the one that works for me for all of your examples so far:
Code:
(*UCP)(?<!\<)\b([aiouksvz])\s
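The difference between the two expressions can be reproduced in Python (again only a stand-in for Sigil's PCRE engine, with `(*UCP)` omitted because Python's `re` is Unicode-aware by default). The extra `[^<]` consumes one character before the letter, which appears to be what let two-letter words through and made the search skip "v":

```python
import re

text = "člun a v tom na "

# Mistaken version: [^<] eats one extra character (a space, or the first
# letter of a two-letter word) before the captured letter.
wrong = re.findall(r"(?<!<)\b[^<]([aiouksvz])\s", text)

# Corrected version: the captured letter must start at the word boundary.
fixed = re.findall(r"(?<!<)\b([aiouksvz])\s", text)

print(wrong)  # ['a', 'a']  -> the second 'a' comes from "na"; 'v' is missed
print(fixed)  # ['a', 'v']
```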
Old 09-17-2020, 05:05 AM   #667
leschek
Enthusiast
leschek began at the beginning.
 
Posts: 28
Karma: 10
Join Date: Sep 2020
Device: Onyx Poke2
Quote:
Originally Posted by DiapDealer View Post
This is the one that works for me for all of your examples so far:
Code:
(*UCP)(?<!\<)\b([aiouksvz])\s
Thank you very much. I tried it on a few pages and it seems it's working as expected. Awesome.
Old 09-18-2020, 08:54 AM   #668
ShdwMnrch
Junior Member
ShdwMnrch began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Sep 2020
Device: none
Hello,
I need help with a regex. I have lines like these:
Code:
<p>– Wahahahaha!</p>

<p>Grasha got drunk, raged and got on the table.</p>

<p>– Wahahahaha! This is a celebration party! Drink and sing guys!</p>
The dialogue lines start with "<p>– ", and I want to wrap the dialogue in these characters: 「」. I had no problem replacing
"<p>–" with " <p> 「", but I'm having trouble replacing "</p>" only when "<p>– " is present at the beginning of the line.

I tried this regex search:
Code:
(?<=<p>– .*)<\/p>
It should find only the "</p>" of the 1st and 3rd lines, ignoring the 2nd line. But it seems that Sigil's regex doesn't support positive lookbehind, and it returns nothing. Please help with a workaround. Thanks!
Old 09-18-2020, 11:09 AM   #669
BeckyEbook
Guru
BeckyEbook ought to be getting tired of karma fortunes by now.

Posts: 704
Karma: 2180740
Join Date: Jan 2017
Location: Poland
Device: Misc
Try (as long as I understand your problem correctly):
Code:
(?<=<p>– )(.+)</p>
Replace:
Code:
\1」</p>
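The same find/replace pair can be tried outside Sigil. The sketch below uses Python's `re` module as a stand-in, which is an assumption; it also accepts fixed-length lookbehinds like this one, and `.` does not cross line breaks by default:

```python
import re

html = "\n\n".join([
    "<p>– Wahahahaha!</p>",
    "<p>Grasha got drunk, raged and got on the table.</p>",
    "<p>– Wahahahaha! This is a celebration party! Drink and sing guys!</p>",
])

# The fixed-length lookbehind requires "<p>– " right before the captured
# text; the rest of the line is captured and re-emitted in the replacement.
result = re.sub(r"(?<=<p>– )(.+)</p>", r"\1」</p>", html)

lines = result.split("\n\n")
print(lines[0])  # <p>– Wahahahaha!」</p>
print(lines[1])  # <p>Grasha got drunk, raged and got on the table.</p>
```

Only the closing side is handled here; the opening "<p>–" replacement is the separate step already described in the question.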
Old 09-18-2020, 11:18 AM   #670
DiapDealer
Grand Sorcerer
DiapDealer ought to be getting tired of karma fortunes by now.

Posts: 27,602
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Sigil's PCRE regex engine certainly supports positive lookbehinds. It just doesn't support variable-length lookbehinds--positive or negative. It's a known limitation of the PCRE engine.

Use \K to simulate a variable-length lookbehind:
Code:
<p>–( .*?)\K<\/p>
Make sure "Minimal Match" is unchecked when using the above expression.

More on the use of \K here: https://www.regular-expressions.info/keep.html
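For comparison, Python's standard `re` module has the same fixed-length restriction on lookbehinds but does not support `\K`. A minimal sketch of the equivalent there, swapping `\K` for a capturing group that is written back in the replacement (a different technique from the one above, shown only to illustrate the idea):

```python
import re

# A variable-length lookbehind is rejected at compile time.
try:
    re.compile(r"(?<=<p>– .*)</p>")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Emulating \K: capture everything that \K would have discarded from the
# match and put it back in the replacement.
line = "<p>– Wahahahaha!</p>"
wrapped = re.sub(r"(<p>– .*?)</p>", r"\1」</p>", line)

print(lookbehind_ok)  # False
print(wrapped)        # <p>– Wahahahaha!」</p>
```

With `\K` in Sigil, the reported match is only the `</p>` part, so the replacement there would not need the `\1`.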
Old 09-18-2020, 11:22 AM   #671
ShdwMnrch
Junior Member
ShdwMnrch began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Sep 2020
Device: none
Quote:
Originally Posted by BeckyEbook View Post
Try (as long as I understand your problem correctly):
Code:
(?<=<p>– )(.+)</p>
Replace:
Code:
\1」</p>
It worked. I hadn't thought of tinkering with the replace value. Thank you!
Old 09-18-2020, 11:30 AM   #672
DiapDealer
Grand Sorcerer
DiapDealer ought to be getting tired of karma fortunes by now.

Posts: 27,602
Karma: 193191846
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Not sure why it looks like there's an extra space in my above expression. It seems to copy and work fine, though. *shrug*
Old 09-18-2020, 11:31 AM   #673
ShdwMnrch
Junior Member
ShdwMnrch began at the beginning.
 
Posts: 3
Karma: 10
Join Date: Sep 2020
Device: none
Quote:
Originally Posted by DiapDealer View Post
Sigil's PCRE regex engine certainly supports positive lookbehinds. It just doesn't support variable-length lookbehinds--positive or negative. It's a known limitation of the PCRE engine.

Use \K to simulate a variable-length lookbehind:
Code:
<p>–( .*?)\K<\/p>
Make sure "Minimal Match" is unchecked when using the above expression.

More on the use of \K here: https://www.regular-expressions.info/keep.html
I see. I've just started learning how to use regex, so that helps a lot. Thank you.
Old 09-18-2020, 04:06 PM   #674
BillPearl
Junior Member
BillPearl ought to be getting tired of karma fortunes by now.
 
Posts: 7
Karma: 591908
Join Date: Jun 2011
Device: Kindle
Suggestion:

[\s][aiouksvzç][\s]

This handles the '<a ' case because it requires whitespace both before and after the letter. You may want to run it with just one letter at a time using Replace All.
Old 10-12-2020, 02:34 PM   #675
hobnail
Running with scissors
hobnail ought to be getting tired of karma fortunes by now.
 
Posts: 1,552
Karma: 14325282
Join Date: Nov 2019
Device: none
I don't understand why this isn't working; my search string is:

<a id="Page_([xvi]+)|([\d]+)" class="x-ebookmaker-pageno" title="\[([xvi]+)|([\d]+)\]"></a>

When the file contains

<a id="Page_iv" class="x-ebookmaker-pageno" title="[iv]"></a>

and I click on the Find button, it highlights only

<a id="Page_i

What's wrong with my regexp?