Old 03-01-2014, 02:25 AM   #301
roger64
Wizard
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
I am sorry, but I am not sure I understand. Look at the screenshot for a long quote. What's missing?

Oh, you mean having this:
Quote:
«
at the opening of each paragraph and
Quote:
»
at the end of the last one?

If this is what you meant, the problem is that the original book has none of them. Also, present or not, these quotation marks would not have changed anything in the regex.
Attached thumbnail: div2.png

Last edited by roger64; 03-01-2014 at 04:30 AM.
Old 03-02-2014, 02:09 PM   #302
gipsy
Connoisseur
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
Hi,
I use the following regex to remove hyphens inside hyphenated words.

Find: (\p{Greek})-(\p{Greek})
Replace: \1\2

Is there a way to ignore some of the regex results?
For example, to ignore πλάι-πλάι, ίσα-ίσα, μισό-μισό, and any other hyphen that joins two copies of the same word?

Thanks
Old 03-02-2014, 07:36 PM   #303
Doitsu
Grand Sorcerer
Posts: 5,582
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by gipsy View Post
Is there a way to ignore some of the regex results?
Yes, but this requires some serious "Regex Fu" (e.g. negative lookbehinds).

I'd simply search for repeated Greek words with a hyphen between them and replace the hyphens with a substitute character (@):

Find:(\p{Greek}+)-(\1)
Replace:\1@\2

Then you can use your regex and at the end you can globally replace all at signs (@) with hyphens.

@Jellby: Can you optimize this simple regex by creating a regex that will find a Greek word not followed by a hyphen and the same Greek word using backreferences and negative lookbehinds?
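
For anyone who wants to try the placeholder approach outside Sigil, here is a minimal Python sketch of the three steps. It is only an illustration under some assumptions: Python's re module has no \p{Greek}, so explicit code point ranges (U+0370 to U+03FF and U+1F00 to U+1FFF) stand in for it, and the names GREEK, PLACEHOLDER and dehyphenate are made up for this example.

```python
import re

# Assumption: explicit ranges replace \p{Greek}, which Python's re lacks.
GREEK = r"\u0370-\u03FF\u1F00-\u1FFF"
# The substitute character from the post; it must not occur in the book text.
PLACEHOLDER = "@"


def dehyphenate(text: str) -> str:
    # Step 1: protect repeated words (word-word) by swapping the hyphen
    # for the placeholder.
    text = re.sub(rf"([{GREEK}]+)-(\1)", rf"\1{PLACEHOLDER}\2", text)
    # Step 2: remove every remaining hyphen that sits between Greek letters.
    text = re.sub(rf"([{GREEK}])-([{GREEK}])", r"\1\2", text)
    # Step 3: turn the placeholders back into hyphens.
    return text.replace(PLACEHOLDER, "-")


if __name__ == "__main__":
    print(dehyphenate("ίσα-ίσα και γρα-τζουνιές"))
```

With precomposed (NFC) input, the repeated word should keep its hyphen while the hyphen in the broken word is removed.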
Old 03-03-2014, 03:25 AM   #304
Jellby
frumious Bandersnatch
Posts: 7,515
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
Quote:
Originally Posted by Doitsu View Post
@Jellby: Can you optimize this simple regex by creating a regex that will find a Greek word not followed by a hyphen and the same Greek word using backreferences and negative lookbehinds?
I could try, but I can only test it in vim, which uses a different regex dialect, so I don't think it would be very useful. Besides, I would rather do as you suggested: first replace repeated Greek words.
Old 03-04-2014, 04:14 AM   #305
mzmm
Groupie
Posts: 171
Karma: 86271
Join Date: Feb 2012
Device: iPad, Kindle Touch, Sony PRS-T1
you could try this:

Code:
find:
(?<=\P{Greek})(\p{Greek}+)-(?!\1)

replace:
\1
it's looking for one or more Greek characters in a capturing group:

(\p{Greek}+)

that are preceded by anything other than a Greek character:

(?<=\P{Greek})

then a hyphen:

-

that is not followed by the group it matched previously:

(?!\1)

replacing it with \1 just removes the hyphen
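
The same idea can be tried in Python as a rough sketch. Two things here are assumptions and not part of the Sigil recipe: Python's re module has no \p{Greek}, so explicit code point ranges stand in for it, and the look-behind is written in its negative form, which also allows a match at the very start of the text. The names GREEK, PATTERN and remove_hyphens are made up for the example.

```python
import re

# Assumption: explicit ranges replace \p{Greek}/\P{Greek}.
GREEK = r"\u0370-\u03FF\u1F00-\u1FFF"

# (?<![...])  the previous character is not a Greek letter
# ([...]+)    group 1: the Greek letters in front of the hyphen
# -           the hyphen itself
# (?!\1)      the hyphen is not followed by a copy of group 1
PATTERN = re.compile(rf"(?<![{GREEK}])([{GREEK}]+)-(?!\1)")


def remove_hyphens(text: str) -> str:
    # Replacing the match with group 1 drops only the hyphen.
    return PATTERN.sub(r"\1", text)


if __name__ == "__main__":
    print(remove_hyphens("γρα-τζουνιές και πλάι-πλάι"))
```

The look-behind matters because without it the engine could start the match in the middle of a word and so get around the repeated-word check.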

** edit **

i was trying to get this to work with unicode ranges so that it could be simplified further (no need for the look-behind), but couldn't seem to get it working in sigil, or my other text editor which has PCRE, for that matter.

i was trying to match [\u0370-\u03FF] and (?-u)[\u0370-\u03FF] with no success. anyone have tips on this?

** edit 2 **

i was hoping to get rid of the look-behind by starting the expression with a word boundary, but it turns out \b only recognizes ASCII word characters here, i.e. [a-zA-Z0-9_], so it looks like the look-behind may be necessary in these cases.

here's an updated version based on Doitsu's comment below that includes Greek_Extended in the search pattern:

Code:
(?<![\x{0370}-\x{03FF}\x{1F00}-\x{1FFF}])([\x{0370}-\x{03FF}\x{1F00}-\x{1FFF}]+)-(?!\1)
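
The word-boundary limitation from edit 2 can be reproduced in Python as an analogy. This is not Sigil's engine: the re.ASCII flag is used here only to mimic a \b that knows nothing beyond [a-zA-Z0-9_], and the variable names are made up for the example.

```python
import re

word = "πλάι"
text = f"x {word}"

# With re.ASCII, \w (and therefore \b) only knows [a-zA-Z0-9_], so no
# boundary is seen between the space and the first Greek letter.
ascii_match = re.search(rf"\b{word}", text, re.ASCII)

# Python's default mode for str patterns is Unicode-aware, so the same
# pattern finds the word.
unicode_match = re.search(rf"\b{word}", text)

if __name__ == "__main__":
    print(ascii_match, unicode_match)
```

When \b behaves like the ASCII case, an explicit look-behind on the Greek ranges does the job of the word boundary.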

Last edited by mzmm; 03-04-2014 at 07:34 AM.
Old 03-04-2014, 04:32 AM   #306
mzmm
Groupie
Posts: 171
Karma: 86271
Join Date: Feb 2012
Device: iPad, Kindle Touch, Sony PRS-T1
Quote:
Originally Posted by Doitsu View Post
I'd simply search for repeated Greek words with a hyphen between them and replace the hyphens with a substitute character (@):

Find:(\p{Greek}+)-(\1)
Replace:\1@\2

Then you can use your regex and at the end you can globally replace all at signs (@) with hyphens.
because the @ sign is so common and may already occur in the book, i'd suggest using a more obscure character like ¬ if you're going to do the search and replace in steps, though.
Old 03-04-2014, 06:06 AM   #307
Doitsu
Grand Sorcerer
Posts: 5,582
Karma: 22735033
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by mzmm View Post
i was trying to match [\u0370-\u03FF] and (?-u)[\u0370-\u03FF] with no success. anyone have tips on this?
AFAIK, Sigil uses a PCRE engine that doesn't support \u; you'll have to use \x{xxxx} instead: [\x{0370}-\x{03FF}]

However, to be on the safe side, you may want to include the precomposed characters from the "Greek Extended" block (U+1F00 to U+1FFF):

[\x{0370}-\x{03FF}\x{1F00}-\x{1FFF}]
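
To see why the second range matters, here is a small Python sketch. Note the difference in spelling: Python's re module writes these escapes as \uXXXX, while the \x{XXXX} form above is the one for Sigil. The names BASIC, WITH_EXTENDED and polytonic are made up for the example.

```python
import re

# Only the basic "Greek and Coptic" block.
BASIC = re.compile(r"[\u0370-\u03FF]+")
# The basic block plus "Greek Extended".
WITH_EXTENDED = re.compile(r"[\u0370-\u03FF\u1F00-\u1FFF]+")

# A word whose first letter is U+1F00 (alpha with psili), a precomposed
# character that lives in the Greek Extended block.
polytonic = "\u1F00\u03C1\u03C7\u03AE"

if __name__ == "__main__":
    print(BASIC.fullmatch(polytonic) is not None)
    print(WITH_EXTENDED.fullmatch(polytonic) is not None)
```

Only the pattern with both ranges accepts the whole word, which is what you want for polytonic texts.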
Old 03-04-2014, 07:10 AM   #308
mzmm
Groupie
Posts: 171
Karma: 86271
Join Date: Feb 2012
Device: iPad, Kindle Touch, Sony PRS-T1
Quote:
Originally Posted by Doitsu View Post
AFAIK, Sigil uses a PCRE engine that doesn't support \u; you'll have to use \x{xxxx} instead: [\x{0370}-\x{03FF}]

However, to be on the safe side, you may want to include the precomposed characters from the "Greek Extended" block (U+1F00 to U+1FFF):

[\x{0370}-\x{03FF}\x{1F00}-\x{1FFF}]
Nice! thanks @Doitsu
Old 03-04-2014, 10:15 AM   #309
gipsy
Connoisseur
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
Thanks Doitsu, thanks mzmm!
It works better now (with the version from edit 2)!
Old 03-04-2014, 01:31 PM   #310
gipsy
Connoisseur
Posts: 81
Karma: 10
Join Date: Nov 2013
Device: Kobo Aura HD
I tried to take this a step further, but I failed :P

To exclude some results (for example the γερο- in γερο-Κομπ), I tried:

Find: (?<![\x{0370}-\x{03FF}\x{1F00}-\x{1FFF}])([^(γερο)\-][\x{0370}-\x{03FF}\x{1F00}-\x{1FFF}]+)-(?!\1)
Replace: \1

but it also excludes words like γκρεμοτσακι-ζόταν and γρα-τζουνιές.

Any thoughts?

Thanks again
Old 03-04-2014, 02:48 PM   #311
mzmm
Groupie
Posts: 171
Karma: 86271
Join Date: Feb 2012
Device: iPad, Kindle Touch, Sony PRS-T1
mm, i think you're going to run into problems pretty quickly if you start trying to get regex to understand words (as it sounds like you've already found out). i'd really recommend against trying to write a single regex to rule them all.

that said, i'd probably come at it the other way, so you'd include Κομπ in the look-ahead, rather than γερο in the capturing group:

...<same as before>...(?!Κομπ|\1)

the pipe | separating them means 'or'.

i have no idea whether doing this makes grammatical sense in Greek, but it matches the examples you've provided.
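
A short Python sketch of the difference between the two patterns, for illustration only. Python's re module has no \p{Greek}, so explicit ranges are assumed in its place, and the names GREEK, attempt and suggested are made up. The key point is that [^(γερο)\-] is a character class: it rejects any single character from the set ( γ ε ρ ο ) - and not the word γερο, so words beginning with γ, such as γρα-τζουνιές, typically get skipped as well.

```python
import re

# Assumption: explicit ranges stand in for the Greek blocks.
GREEK = r"\u0370-\u03FF\u1F00-\u1FFF"

# The attempted exclusion: the negated class only tests ONE character,
# so the match cannot start on γ, ε, ρ or ο at all.
attempt = re.compile(rf"(?<![{GREEK}])([^(γερο)\-][{GREEK}]+)-(?!\1)")

# The suggested alternative: name the unwanted continuation in the
# look-ahead, next to the back-reference for repeated words.
suggested = re.compile(rf"(?<![{GREEK}])([{GREEK}]+)-(?!Κομπ|\1)")

if __name__ == "__main__":
    for sample in ("γερο-Κομπ", "γρα-τζουνιές"):
        print(sample, attempt.sub(r"\1", sample), suggested.sub(r"\1", sample))
```

More exceptions can be added to the look-ahead with further | alternatives, although the list gets hard to maintain quickly.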
Old 03-08-2014, 07:07 AM   #312
John Doe
Junior Member
Posts: 4
Karma: 10
Join Date: Mar 2014
Device: Android
Find and replace text but leaving some text behind

I hope this question hasn't been asked before, but here goes:

I want to make epub3 files. With notes (epub:type="noteref" and so on.)

I do know how to make the files, but it isn't automated in Sigil, sadly!

But when I make the files and create the links, Sigil makes this:

Code:
<a href="#id1">This text will have a link to a note</a>

<a id="id1">This will be the note </a>
I want to find all the
Code:
<a href="#id***">
(where the id has different numbers), and replace them with
Code:
<a epub:type="noteref" href="#id***" xmlns:epub="http://www.idpf.org/2007/ops">
so that the old code is replaced with the new code while the ID numbers are kept.

Is that possible?
Old 03-08-2014, 08:40 AM   #313
mzmm
Groupie
Posts: 171
Karma: 86271
Join Date: Feb 2012
Device: iPad, Kindle Touch, Sony PRS-T1
you should be able to use

Code:
find:
<a href="#id(\d+)"

replace:
<a epub:type="noteref" href="#id\1" xmlns:epub="http://www.idpf.org/2007/ops"
Old 03-08-2014, 10:28 AM   #314
John Doe
Junior Member
Posts: 4
Karma: 10
Join Date: Mar 2014
Device: Android
Quote:
Originally Posted by mzmm View Post
you should be able to use

Code:
find:
<a href="#id(\d+)"

replace:
<a epub:type="noteref" href="#id\1" xmlns:epub="http://www.idpf.org/2007/ops"
Yes, that works! Thanks a lot!

Could you please explain how it works?
Old 03-08-2014, 10:44 AM   #315
mzmm
Groupie
Posts: 171
Karma: 86271
Join Date: Feb 2012
Device: iPad, Kindle Touch, Sony PRS-T1
yep:

find:

<a href="#id

followed by a capturing group `()` that contains one or more digits `\d+`

(\d+)

followed by

"

replace:

<a epub:type="noteref" href="#id

followed by a back-reference to the captured group above

\1

followed by

" xmlns:epub="http://www.idpf.org/2007/ops"

this site is an invaluable reference for anything regex, basic to advanced:
http://www.regular-expressions.info
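
For comparison, the same capture and back-reference can be sketched in Python. This is only an illustration of the mechanism, not something Sigil needs; the names NOTEREF, REPLACEMENT and add_noteref are made up for the example.

```python
import re

# The opening of the link, with the digits of the id captured as group 1.
NOTEREF = re.compile(r'<a href="#id(\d+)"')

# \1 puts the captured digits back into the new opening tag.
REPLACEMENT = (
    r'<a epub:type="noteref" href="#id\1" '
    r'xmlns:epub="http://www.idpf.org/2007/ops"'
)


def add_noteref(html: str) -> str:
    return NOTEREF.sub(REPLACEMENT, html)


if __name__ == "__main__":
    print(add_noteref('<a href="#id1">This text will have a link to a note</a>'))
```

The anchor that only carries an id (`<a id="id1">`) is left alone, because the pattern requires the href attribute.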


All times are GMT -4.

