Quote:
Originally Posted by brunello
1) I am using this method, but I wanted to automate the process, because this system finds over 500 results per book ... they are texts converted from PDF in calibre.
However you are right, I do not lose more than 15 minutes per book.
|
yeah, but i get what you mean. it's funny because when hyphenation runs rampant the errors are usually more discernible and therefore easier to catch with regex, like
hou- se
hou-<br/>se
hou-</p> <p>se
etc...
oh well
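for what it's worth, the three cases above can be sketched as a single pattern in python. this is only a rough sketch: the pattern and the names `HYPHEN_BREAK` and `dehyphenate` are made up for illustration, and it will also glue together words that are legitimately hyphenated across a break, so you'd want to step through the matches instead of hitting replace all.

```python
import re

# hypothetical pattern: a word character, a hyphen, then one or more
# "break" pieces (whitespace, a <br/> tag, or a </p><p ...> boundary),
# then the word character that continues the word
HYPHEN_BREAK = re.compile(r'(\w)-(?:\s|<br\s*/?>|</p>\s*<p[^>]*>)+(\w)')

def dehyphenate(html):
    """Rejoin words split by a hyphen plus a break; drops the hyphen."""
    return HYPHEN_BREAK.sub(r'\1\2', html)

if __name__ == "__main__":
    for sample in ("hou- se", "hou-<br/>se", "hou-</p> <p>se"):
        print(repr(sample), "->", repr(dehyphenate(sample)))
```

an ordinary hyphenated word with nothing after the hyphen (like well-known) is left alone, because the pattern needs at least one break piece after the hyphen.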
Quote:
Originally Posted by brunello
After writing the last post, I made this regex:
Search:
(\w+\p{L}.\p{P}*\p{Pf}*[</span>]*[</i>]*)</p>\n*[ <p class="calibre1">]*
Replace:
\1
|
looks good. you could probably even condense it to
Code:
(?<=\s)([^\s]+)</p>\s*<p[^>]*>
if you're joining paragraphs, and then replace with
Code:
\1 <--- single space
if it captures punctuation before the closing tag it would join the paragraphs and insert a space (as text should have), and if there was no punctuation it would still separate the two joined words with a space (or two, if the second paragraph starts with one), which wouldn't really matter in HTML unless you're using `white-space: pre` or something.
this will also catch things like <p class='calibre'>, <p class="calibre calibre12">, etc
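if you want to try the condensed search/replace outside the editor first, here's a minimal python sketch. the regex and the replacement are the ones from this post; the names `JOIN` and `join_paragraphs` are just made up for the example. note that it joins every paragraph boundary it finds, so it's only a way to see what the pattern does, not something to run blindly over a whole book.

```python
import re

# the condensed search expression from the post: the last whitespace-delimited
# token before </p>, then the paragraph break and any opening <p ...> tag
JOIN = re.compile(r'(?<=\s)([^\s]+)</p>\s*<p[^>]*>')

def join_paragraphs(html):
    """Replace each matched paragraph break with the captured token plus one space."""
    return JOIN.sub(r'\1 ', html)

if __name__ == "__main__":
    broken = ('<p class="calibre1">the quick brown</p>\n'
              '<p class="calibre1">fox jumps.</p>')
    print(join_paragraphs(broken))
```

the final `</p>` is left in place because nothing that looks like an opening `<p ...>` follows it.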