Now that you say Trove was your source, I think I can see how this has come about. I have used Trove in family history research for newspaper cuttings. IIRC it supplements its OCR with on-line user edits.
Perhaps it then takes the language declared by the web browser doing the editing and wraps the edited text in matching "lang" spans. That would explain it.
Certainly in reading the epub there is no problem - it's only if you open it up with an editor (in my case to try to sort out the atrophied index) that you see the mark-up.