MobileRead Forums - View Single Post

mzmm · 05-28-2014, 07:07 AM

Quote:

Originally Posted by brunello

1) I am using this method, but I wanted to automate the process, because this approach finds over 500 results per book ... they are versions of texts converted from pdf to calibre.
However, you are right: I do not lose more than 15 minutes per book.

yeah, but i get what you mean. it's funny because when hyphenation runs rampant the errors are usually more discernible and therefore easier to catch with regex, like

hou- se
hou- se
hou- se

etc...
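for illustration, here is a minimal sketch of the kind of regex that catches those broken hyphenations, written in Python. the function name `dehyphenate` and the pattern are assumptions for the example, not something from the thread, and it only handles a hyphen followed by whitespace:

```python
import re

# Hypothetical pattern: a word fragment, a hyphen, one or more
# whitespace characters (spaces or line breaks), then another fragment.
BROKEN_HYPHEN = re.compile(r'(\w+)-\s+(\w+)')

def dehyphenate(text):
    """Rejoin fragments such as 'hou- se' into 'house'."""
    return BROKEN_HYPHEN.sub(r'\1\2', text)

print(dehyphenate("the old hou- se on the hill"))
```

a hyphen with no whitespace after it (as in "well-known") is left alone, but a legitimate hyphen that happens to be followed by a space would be joined too, so the matches are still worth reviewing by hand.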

oh well

Quote:

Originally Posted by brunello

After writing the last post, I made this regex:

Search:
(\w+\p{L}.\p{P}*\p{Pf}*[]*[]*)\n*[ ]*

Replace:
\ 1

looks good. you could probably even condense it to

Code:

(?<=\s)([^\s]+)</p>\s*<p[^>]*>

if you're joining paragraphs, and then replace with

Code:

\1 <--- single space

if it captures punctuation before the closing tag, it would join the paragraphs and insert a space (as the text should have). if there was no punctuation, it would separate the 2 joined words with two spaces, which wouldn't really matter in HTML unless you're using `white-space: pre` or something.
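as a rough sketch of how the search and replace above behave, here is the same pattern run through Python's `re` module. the helper name `join_paragraphs` and the sample markup are made up for the example:

```python
import re

# The search pattern from the post: the last whitespace-delimited token
# of a paragraph, the closing </p>, optional whitespace, and the next <p ...>.
JOIN_PATTERN = re.compile(r'(?<=\s)([^\s]+)</p>\s*<p[^>]*>')

def join_paragraphs(html):
    """Replace each paragraph break with the captured token plus one space."""
    return JOIN_PATTERN.sub(r'\1 ', html)

# Hypothetical sample: one sentence split across two <p> elements.
sample = '<p>He walked to the</p>\n<p class="calibre1">station in the rain.</p>'
print(join_paragraphs(sample))
```

note that this joins every paragraph break it matches, including real ones, so in practice it only makes sense on a selection or with a manual check of each hit. a paragraph consisting of a single word is not matched, because the lookbehind needs whitespace before the last token.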

this will also catch things like , , etc