It should be trivial to simplify the markup with a few regexes, since both the paragraph separation and the whitespace are present. Locate the important formatting, such as italics, and remove everything else.
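Something along these lines might work as a starting point. It is only a rough sketch: I'm assuming the markup is HTML-like, with paragraphs closed by `</p>` and italics marked as `<i>` or `<em>`, and the function name `simplify_markup` and the sample input are made up for illustration. Adjust the patterns to whatever your file actually contains.

```python
import re

def simplify_markup(markup: str) -> str:
    """Keep paragraph breaks and italics, drop all other tags (assumed HTML-like input)."""
    # Normalise italic tags, with or without attributes, to bare <i> / </i>.
    text = re.sub(r"<\s*(/?)\s*(?:i|em)\b[^>]*>", r"<\1i>", markup, flags=re.IGNORECASE)
    # Turn each paragraph end into a blank line.
    text = re.sub(r"<\s*/p\s*>", "\n\n", text, flags=re.IGNORECASE)
    # Remove every remaining tag except the normalised <i> and </i>.
    text = re.sub(r"<(?!/?i>)[^>]*>", "", text)
    # Collapse whitespace inside each paragraph and discard empty paragraphs.
    paragraphs = [re.sub(r"\s+", " ", p).strip() for p in re.split(r"\n\s*\n", text)]
    return "\n\n".join(p for p in paragraphs if p)

# Hypothetical sample input, not taken from the original file.
sample = (
    '<p class="c1"><span class="t1">Chapter 6</span></p>\n'
    '<p class="c2">He said <em class="x">no</em>   and left.</p>'
)
print(simplify_markup(sample))
```

This does not decode HTML entities or handle nested or malformed tags, so for anything messier than the sample a real parser is probably the safer choice.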
Is the "KEPITEL 6" in your example a typo? Are there any OCR errors?