Old 05-28-2014, 04:41 AM   #361
brunello
Quote:
Originally Posted by mzmm View Post
not really a way to reasonably exclude words. if your breaks are always between two letters you may as well just search for a hyphen using 'Replace/Find' in Sigil. it'd probably only take a few minutes.
1) I am using this method, but I wanted to automate the process, because it finds over 500 results per book ... the books are texts converted from PDF with calibre.
However, you are right: I do not lose more than 15 minutes per book.

Quote:
this should work for #2:
Code:
(?<=\s)[^\s<]+(?=</p>)
2) The second regex is perfect! Much more general than what I wrote.
Could you extend it to also match the beginning of the next paragraph, so that I can replace and merge the two paragraphs?

After writing the last post, I made this regex:

Search:
(\w+\p{L}.\p{P}*\p{Pf}*[</span>]*[</i>]*)</p>\n*[ <p class="calibre1">]*

Replace:
\ 1

but your regex, being more generic, finds more matches
thanks!
Old 05-28-2014, 07:07 AM   #362
mzmm
Quote:
Originally Posted by brunello View Post
1) I am using this method, but I wanted to automate the process, because it finds over 500 results per book ... the books are texts converted from PDF with calibre.
However, you are right: I do not lose more than 15 minutes per book.
yeah, but i get what you mean. it's funny because when hyphenation runs rampant the errors are usually more discernible and therefore easier to catch with regex, like

hou- se
hou-<br/>se
hou-</p> <p>se

etc...

oh well
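
For anyone who wants to script this outside Sigil, here is a rough Python sketch that catches those three shapes in one pass. The helper name `rejoin_hyphenated` and the sample strings are made up for illustration. It assumes a hyphen followed by a break is always a broken word, which is not true for genuine compounds that happen to be split at the hyphen, so review the hits before replacing.

```python
import re

# A hyphen between two word characters, followed by one of the breaks
# listed above: plain whitespace, a <br/> tag, or a </p> ... <p> pair.
BROKEN_WORD = re.compile(r"(\w)-(?:\s+|<br\s*/?>|</p>\s*<p[^>]*>)(\w)")

def rejoin_hyphenated(html: str) -> str:
    """Hypothetical helper: glue the two halves together, dropping the hyphen."""
    return BROKEN_WORD.sub(r"\1\2", html)

for sample in ("hou- se", "hou-<br/>se", "hou-</p> <p>se"):
    print(repr(sample), "->", repr(rejoin_hyphenated(sample)))
```

A hyphen with letters on both sides and no break after it (as in "well-known") is left alone, because the pattern requires whitespace or a tag right after the hyphen.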


Quote:
Originally Posted by brunello View Post
After writing the last post, I made this regex:

Search:
(\w+\p{L}.\p{P}*\p{Pf}*[</span>]*[</i>]*)</p>\n*[ <p class="calibre1">]*

Replace:
\ 1
looks good. you could probably even condense it to

Code:
(?<=\s)([^\s]+)</p>\s*<p[^>]*>
if you're joining paragraphs, and then replace with
Code:
\1 <--- single space
if it captures punctuation before the closing tag it would join the paragraphs and insert a space (as the text should have), and if there was no punctuation it would separate the 2 joined words with two spaces, which wouldn't really matter in HTML unless you're using `white-space: pre` or something.

this will also catch things like <p class='calibre'>, <p class="calibre calibre12">, etc
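
A minimal sketch of the same find/replace in Python's `re` module, in case it helps to test the pattern outside Sigil. The helper name `join_paragraphs` and the sample markup are invented for the example; the pattern and the `\1` plus single-space replacement are the ones given above.

```python
import re

# The condensed pattern from above: the last run of non-space characters
# before </p>, then the paragraph break and any opening <p ...> tag.
JOIN_PARAGRAPHS = re.compile(r"(?<=\s)([^\s]+)</p>\s*<p[^>]*>")

def join_paragraphs(html: str) -> str:
    """Hypothetical helper: keep the captured word and put one space after it."""
    return JOIN_PARAGRAPHS.sub(r"\1 ", html)

sample = (
    '<p class="calibre1">The line was cut</p>\n'
    "<p class='calibre calibre12'>in the middle.</p>"
)
print(join_paragraphs(sample))
```

Note that the pattern does not look at punctuation at all, so run unattended it would merge every pair of adjacent paragraphs it can match. Stepping through the hits one at a time is probably safer than a blind replace-all.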
Old 05-28-2014, 08:44 AM   #363
brunello
Quote:
Originally Posted by mzmm View Post
looks good. you could probably even condense it to

Code:
(?<=\s)([^\s]+)</p>\s*<p[^>]*>
if you're joining paragraphs, and then replace with
Code:
\1 <--- single space
(...)
this will also catch things like <p class='calibre'>, <p class="calibre calibre12">, etc
Wow! It's amazing! It works great!
And it's not as horrible as what I wrote ^_^

Can you recommend any sites that explain regular expressions? There is so much material on the internet, but I want something that fully explains regex.
Thank you again!
Old 05-28-2014, 02:33 PM   #364
eschwartz
Quote:
Originally Posted by brunello View Post
Wow! It's amazing! It works great!
And it's not as horrible as what I wrote ^_^

Can you recommend any sites that explain regular expressions? There is so much material on the internet, but I want something that fully explains regex.
Thank you again!
http://regular-expressions.info
Old 05-28-2014, 05:13 PM   #365
mzmm
Quote:
Originally Posted by brunello View Post
Can you recommend any sites that explain regular expressions? There is so much material on the internet, but I want something that fully explains regex.
+1 for http://regular-expressions.info/
Old 06-04-2014, 04:53 AM   #366
JoHunt
I'm sure this has been asked many times, but my search is not giving results to my query (my search terms are probably wrong). How do I find a paragraph without a full stop such as the below example? Thanks for any help!

Code:
very fortunate, he</span></p>
Old 06-04-2014, 05:15 AM   #367
Doitsu
Quote:
Originally Posted by joehunt View Post
How do I find a paragraph without a full stop such as the below example?
You could simply search for:

[^.]+</p>

Last edited by Doitsu; 06-04-2014 at 05:23 AM.
Old 06-04-2014, 05:55 AM   #368
JoHunt
Quote:
Originally Posted by Doitsu View Post
You could simply search for:

[^.]+</p>
Thank you but it's finding all instances of </span></p> ? Is there any way of only finding those without a full stop?

Last edited by JoHunt; 06-04-2014 at 06:05 AM.
Old 06-04-2014, 06:07 AM   #369
Doitsu
Quote:
Originally Posted by joehunt View Post
Thank you but it looks like it's finding all instances of </span></p> ?
If most paragraphs end in </span>, you could search for:

[^.]+</span></p>

This regex should find end of paragraph tags preceded by a </span> tag and one or more characters that are not a period.
Old 06-04-2014, 06:30 AM   #370
JoHunt
Quote:
Originally Posted by Doitsu View Post
If most paragraphs end in </span>, you could search for:

[^.]+</span></p>

This regex should find end of paragraph tags preceded by a </span> tag and one or more characters that are not a period.
Thanks again, but that's finding every paragraph that ends in anything but a period - such as .”</span></p> Is there no way of isolating paragraphs which end with no punctuation?
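
A quick way to see why both suggestions over-match is to run them against three different paragraph endings. This is only a sketch using Python's `re` module rather than Sigil's search, and the sample strings are invented around the example line posted earlier.

```python
import re

samples = [
    "very fortunate, he</span></p>",    # no closing punctuation: the wanted hit
    "very fortunate, he.</span></p>",   # ends with a period
    "very fortunate, he.”</span></p>",  # period followed by a closing quote
]

for pattern in (r"[^.]+</p>", r"[^.]+</span></p>"):
    for text in samples:
        m = re.search(pattern, text)
        # Show which part of the text (if any) the pattern grabbed.
        print(pattern, "|", text, "|", m.group(0) if m else None)
```

`[^.]+` only demands that the character directly before the tag is not a period. A `>` from a closing tag or a closing quotation mark satisfies that just as well as a letter does, which is why punctuated paragraphs still show up.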

Last edited by JoHunt; 06-04-2014 at 06:37 AM.
Old 06-04-2014, 06:52 AM   #371
mzmm
what about

Code:
(?<=[a-zA-Z])(?:</[bispanemtrog]+>)*?</p>
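
Here is a small Python check of that pattern against a few endings, offered as a sketch only. The sample strings are invented, and Python's `re` stands in for Sigil's PCRE engine.

```python
import re

# A letter directly before zero or more closing inline tags, then </p>.
# The class [bispanemtrog] covers tag names such as b, i, span, em, strong.
NO_END_PUNCT = re.compile(r"(?<=[a-zA-Z])(?:</[bispanemtrog]+>)*?</p>")

samples = [
    "very fortunate, he</span></p>",      # letter, then tags: should hit
    "very fortunate, he</i></span></p>",  # two nested closing tags: should hit
    "very fortunate, he.</span></p>",     # period before the tags: no hit
    "very fortunate, he.”</span></p>",    # closing quote before the tags: no hit
    "im Café</p>",                        # non-ASCII final letter: no hit
]

for text in samples:
    m = NO_END_PUNCT.search(text)
    print(text, "|", m.group(0) if m else None)
```

The lookbehind `[a-zA-Z]` is ASCII-only, so a paragraph that ends in an accented letter or a digit is skipped. Widening the class would be needed for texts where that matters.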

Last edited by mzmm; 06-04-2014 at 07:00 AM.
Old 06-04-2014, 07:06 AM   #372
JoHunt
Quote:
Originally Posted by mzmm View Post
what about

Code:
(?<=[a-zA-Z])(?:</[bispanemtrog]+>)*?</p>
Perfect! Thank you!
Old 08-08-2014, 09:20 AM   #373
Leonatus
I want to lowercase certain words that, following the older German spelling rules, were written with an initial capital even in the middle of a sentence, such as "Du", "Dich", "Ihr", "Euch".

But they should be lowercased only if they do not follow a period, exclamation mark, question mark, or opening quotation mark.

I have to consider that after a period, exclamation mark, or question mark these words do not follow immediately, but only after a whitespace. After an opening quotation mark they do follow immediately, but they should keep their capital only if the quotation mark itself does not follow a comma or semicolon, as in ' ... he cried, "...'.

It seems that I can only solve these problems one after the other, but even then I am having great trouble finding a suitable regex.

Any help is highly appreciated!
Old 08-08-2014, 09:48 AM   #374
eschwartz
Simply search for

Code:
(?<![.!?][ ])(?<=[ ])([A-Z])(?=[a-z]+)
Code:
\L\1
The "\L" in the replacement means "lowercase what follows", which here is the one captured letter.

I am not exactly clear on how the quotations should work, so this will not handle those. Can you give me an example of two cases, one where it should be fixed, and one where it shouldn't?
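
For comparison, a rough Python sketch of the same idea. Two things in it are assumptions rather than anything settled in this thread: it only touches the four words named in the question (so ordinary German nouns keep their capitals), and it uses a replacement function because Python's `re` has no `\L` in replacement strings. The helper name `lowercase_mid_sentence` is made up, and quotation marks are not handled at all.

```python
import re

# Assumption: only the words named in the question are targeted.
WORDS = ("Du", "Dich", "Ihr", "Euch")

# Preceded by a space, but not by ". ", "! " or "? " (a sentence start).
MID_SENTENCE = re.compile(
    r"(?<![.!?] )(?<= )\b(" + "|".join(WORDS) + r")\b"
)

def lowercase_mid_sentence(text: str) -> str:
    """Hypothetical helper: lowercase the listed words when mid-sentence."""
    return MID_SENTENCE.sub(lambda m: m.group(1).lower(), text)

print(lowercase_mid_sentence("Ich habe Dich gesehen. Ihr seid da! Wo bist Du?"))
```

The word boundaries keep longer words that merely start with one of the listed words (for example "Durch") from being touched.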

Last edited by eschwartz; 08-08-2014 at 12:37 PM. Reason: fixed some stuff. ALso we want to *lowercase*
Old 08-08-2014, 12:12 PM   #375
mzmm
Quote:
Originally Posted by eschwartz View Post
...
Code:
(?![.!?])(?=[ ])([A-Z])[a-z]+
is that working for you in sigil? my PCRE editor doesn't match

i'd probably use something like

find
Code:
(?<![.!?])( [A-Z])(?=[a-z])
replace
Code:
\L\1\E
Quote:
Originally Posted by Leonatus View Post
I want to lowercase certain words that, following the older German spelling rules, were written with an initial capital even in the middle of a sentence, such as "Du", "Dich", "Ihr", "Euch".

...
but yes, some examples of exactly what you want to match would help, it's a little unclear

Last edited by mzmm; 08-08-2014 at 12:18 PM.