Thread: Regex examples
View Single Post
Old 06-26-2012, 06:10 AM   #95
roger64
Wizard
roger64 ought to be getting tired of karma fortunes by now.roger64 ought to be getting tired of karma fortunes by now.roger64 ought to be getting tired of karma fortunes by now.roger64 ought to be getting tired of karma fortunes by now.roger64 ought to be getting tired of karma fortunes by now.roger64 ought to be getting tired of karma fortunes by now.roger64 ought to be getting tired of karma fortunes by now.roger64 ought to be getting tired of karma fortunes by now.roger64 ought to be getting tired of karma fortunes by now.roger64 ought to be getting tired of karma fortunes by now.roger64 ought to be getting tired of karma fortunes by now.
 
Posts: 2,608
Karma: 3000161
Join Date: Jan 2009
Device: Kindle PW3 (wifi)
Suppressing <br /> tags only in "body text" style.

Could there be a way to destroy the soft hyphens only when they are included in a "body text" paragraph?

Rationale:

After using a new (and not perfect) OCR , I found that my recognized text was interspersed with a lot of <br /> tags (soft hyphens?). I usually insert the html file in OpenOffice and clean all formatting to begin with. Even this way, I realized that these resilient tags survived.

It is not that bad. Some poems or songs are thus nicely transcribed. On the other hand, I have to clean these tags for many standard paragraphs of text.

Sigil provides a simple way out. The user has a choice either cleaning every one of them, good and bad, or selectively and patiently suppress the useless tags...

There could a better one.

Give your songs or poems their own style, keep standard text in its "body text" class and then launch the following Regex...
roger64 is offline   Reply With Quote