MobileRead Forums - View Single Post

Turtle91 · 04-08-2022, 08:30 AM

Quote:

Originally Posted by DNSB

Personally, I just use a search for "([a-z])</p> <p>([a-z])" to locate them and optionally change to "\1 \2", which catches the majority of those broken paragraphs. I prefer replace/find or find over replace all, so I can skip those paragraphs where that structure is intentional. If there are too many, I will use replace all.

The blank space between the </p> and <p> is any spaces and CR/LF pairs picked up by copy/paste, while there is a single space between \1 and \2.

Ditto.

find: ([a-z])\s*</p>\s*<p[^>]*>\s*([a-z])
replace: \1 \2

- \s* is the code for "any amount of whitespace (spaces, tabs, line breaks), or none"
- [^>]* will catch anything inside the leading <p> tag, like class=, name=, id=, style= **cough cough ugh, puke...don't ever use inline styling... cough cough**
- there is a space between \1 \2
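If you want to try the expression outside Sigil first, here is a minimal Python sketch. It assumes the find pattern is meant to match a closing </p> followed by an opening <p> (with or without attributes); the function name and the sample text are made up for illustration, and Python's re module is not exactly the same regex engine Sigil uses.

```python
import re

# Assumed reading of the find pattern: lowercase letter, closing </p>,
# optional whitespace, opening <p ...>, lowercase letter.
BROKEN_PARA = re.compile(r"([a-z])\s*</p>\s*<p[^>]*>\s*([a-z])")


def join_broken_paragraphs(html):
    """Join paragraphs that were split mid-sentence, keeping one space."""
    return BROKEN_PARA.sub(r"\1 \2", html)


# Hypothetical sample: the first break is mid-sentence, the second is real.
sample = (
    '<p>This sentence was split</p>\n\n'
    '<p class="calibre1">in the middle.</p>\n'
    '<p>Next paragraph.</p>'
)

print(join_broken_paragraphs(sample))
```

Only the first break is joined here, because the second paragraph ends in a period rather than a lowercase letter. This is the "replace all" behaviour, so it skips the manual check described above.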

I find that dropping the space between \1 and \2 creates more problems than it solves. Keeping it is safe: Sigil's Prettify Code function will automatically condense any situations where an extra space is incorrectly added, and spell check will catch almost all words where a single space incorrectly breaks a word.

You could certainly add symbols to the first collection group if you are finding them in your document a lot, e.g. ([a-z;,-]). I would definitely step through them with a find, or replace/find, so you can check them.
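To get a feel for what the wider group would catch before committing to anything, a rough Python sketch like the one below lists each candidate with its before/after text and leaves the input untouched. It assumes the same </p> ... <p> pattern as the find expression; the helper name and sample text are hypothetical, and this is only a stand-in for stepping through matches by hand in Sigil.

```python
import re

# Same assumed pattern, with ; , and - added to the first group.
EXTENDED = re.compile(r"([a-z;,-])\s*</p>\s*<p[^>]*>\s*([a-z])")


def preview_joins(html):
    """Return (before, after) pairs for each candidate join; html is not changed."""
    return [(m.group(0), m.expand(r"\1 \2")) for m in EXTENDED.finditer(html)]


# Hypothetical sample: a break after a comma, then a genuine paragraph end.
sample = "<p>He paused,</p>\n<p>then went on.</p>\n<p>The End.</p>"

for before, after in preview_joins(sample):
    print(repr(before), "->", repr(after))
```

Each pair can then be checked by eye, which is the same idea as doing find, then replace/find, one hit at a time.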

FYI - the Sigil developer geniuses are actually improving the Find/Replace functionality a HUGE amount by allowing the user to create a "what if" table that shows the before/after regex that you can quickly browse before accepting the changes. That should be coming in an update in the near future. edit: Check this thread - starting at post #84