Thread: Regex examples
Old 05-28-2014, 07:07 AM   #362
mzmm
Quote:
Originally Posted by brunello
1) I am using this method, but I wanted to automate the process, because this approach finds over 500 results per book ... they are versions of texts converted from PDF in calibre.
However, you are right: I do not lose more than 15 minutes per book.
yeah, but i get what you mean. it's funny because when hyphenation runs rampant the errors are usually more discernible and therefore easier to catch with regex, like

hou- se
hou-<br/>se
hou-</p> <p>se

etc...

oh well
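
for what it's worth, the three cases above can be sketched in a few lines of Python. this is only a rough sketch, not something from the thread: the pattern and the name `rejoin_hyphenated` are made up for illustration, and it assumes every hyphen followed by whitespace or a line/paragraph break is a bad split (so a real compound broken as "well- known" would get glued together too).

```python
import re

# Hypothetical pattern: a word character, a hyphen, then a <br/>, a
# paragraph break, or plain whitespace, then another word character.
HYPHEN_BREAK = re.compile(
    r"(\w)-(?:\s*<br\s*/?>\s*|\s*</p>\s*<p[^>]*>\s*|\s+)(\w)"
)

def rejoin_hyphenated(html):
    """Drop the hyphen and the break so the two halves become one word."""
    return HYPHEN_BREAK.sub(r"\1\2", html)

for sample in ["hou- se", "hou-<br/>se", "hou-</p> <p>se"]:
    print(rejoin_hyphenated(sample))  # each prints: house
```

a hyphen with nothing but letters after it (like "well-known") is left alone, since the pattern needs whitespace or a tag right after the hyphen.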


Quote:
Originally Posted by brunello
After writing the last post, I made this regex:

Search:
(\w+\p{L}.\p{P}*\p{Pf}*[</span>]*[</i>]*)</p>\n*[ <p class="calibre1">]*

Replace:
\ 1
looks good. you could probably even condense it to

Code:
(?<=\s)([^\s]+)</p>\s*<p[^>]*>
if you're joining paragraphs, and then replace with
Code:
\1 <--- single space
if it captures punctuation before the closing tag it would join the paragraphs and insert a space (as a text should be), and if there was no punctuation it would still separate the 2 joined words with a space. if the next paragraph happens to start with whitespace you'd end up with two spaces, which wouldn't really matter in HTML unless you're using `white-space: pre` or something.

this will also catch things like <p class='calibre'>, <p class="calibre calibre12">, etc
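
if you want to try the condensed version outside the calibre editor, here's a minimal sketch using Python's `re` module. the function name `join_paragraphs` and the sample strings are invented for the example. note that it joins every paragraph boundary it finds, not just the ones broken mid-sentence, and `<p[^>]*>` would also match a `<pre>` tag, so check the results on a copy first.

```python
import re

# Same search as above: a non-whitespace run preceded by whitespace,
# then the closing </p>, optional whitespace, and any opening <p ...> tag.
JOIN_PARAS = re.compile(r"(?<=\s)([^\s]+)</p>\s*<p[^>]*>")

def join_paragraphs(html):
    """Replace the paragraph break with a single space after the captured word."""
    return JOIN_PARAS.sub(r"\1 ", html)

sample = (
    '<p class="calibre1">the end of a broken</p>\n'
    '<p class="calibre calibre12">paragraph goes here.</p>'
)
print(join_paragraphs(sample))
# prints: <p class="calibre1">the end of a broken paragraph goes here.</p>
```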