MobileRead Forums - View Single Post

roger64 · 06-26-2012, 06:10 AM

Suppressing <br /> tags only in "body text" style.

Could there be a way to remove the <br /> tags only when they occur inside a "body text" paragraph?

Rationale:

After using a new (and imperfect) OCR program, I found that my recognized text was interspersed with a lot of <br /> tags (these are hard line breaks, not soft hyphens). I usually insert the HTML file into OpenOffice and clear all formatting to begin with. Even then, these resilient tags survived.

It is not that bad. Some poems or songs are thus nicely transcribed. On the other hand, I have to clean these tags for many standard paragraphs of text.

Sigil provides a simple way out. The user has a choice: either remove every one of them, good and bad, or selectively and patiently delete the useless tags...

There could be a better one.

Give your songs or poems their own style, keep standard text in its "body text" class and then launch the following Regex...
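
The regex itself is not reproduced in this post. Purely as an illustration of the idea, here is a minimal sketch in Python rather than in Sigil's own search and replace. It assumes the body paragraphs are plain <p> elements carrying a class named "body-text"; that class name, the constant names and the function name are hypothetical, so adjust them to whatever your export actually produces.

```python
import re

# Hypothetical class name: check what your own HTML export really uses.
BODY_CLASS = "body-text"

# One whole <p>...</p> element whose class attribute lists BODY_CLASS.
BODY_PARAGRAPH = re.compile(
    r'<p\b[^>]*\bclass="(?:[^"]*\s)?' + re.escape(BODY_CLASS) + r'(?:\s[^"]*)?"[^>]*>.*?</p>',
    re.DOTALL | re.IGNORECASE,
)

# <br>, <br/> or <br />, together with any whitespace around it.
LINE_BREAK = re.compile(r'\s*<br\s*/?>\s*', re.IGNORECASE)


def strip_breaks_in_body_text(html: str) -> str:
    """Replace each <br /> with a single space, but only inside body-text paragraphs."""
    def clean(match: re.Match) -> str:
        # The substitution is applied to the matched paragraph only,
        # so paragraphs with any other class are left untouched.
        return LINE_BREAK.sub(" ", match.group(0))

    return BODY_PARAGRAPH.sub(clean, html)
```

Paragraphs given another class (poems, songs) do not match the first pattern, so their line breaks are kept. Regular expressions are fragile on arbitrary HTML, so try this on a copy of the file first.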
