I've run into similar problems when I do a "save as text" from Adobe Acrobat Reader: I get page numbers and other artifacts, like extra blank lines between pages, and sometimes the very first letter of a chapter ends up at the bottom of the page.
If it's just text, the anomalies follow a consistent pattern, and you know a little *nix, maybe the cleanup can be done with the standard text-pattern tools (sed, awk, grep). But, of course, that's out of reach if you don't know them well enough (like me).
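
For what it's worth, the same pattern-based cleanup could probably be sketched in a few lines of Python instead of the *nix tools. This is only a rough sketch under assumptions I'm making up for illustration: that a page number shows up as a line containing nothing but digits, and that page breaks show up as runs of blank lines. The function name `clean_saved_text` is hypothetical, and it does not try to fix the stray first letter of a chapter.

```python
import re


def clean_saved_text(text: str) -> str:
    """Drop page-number lines and collapse runs of blank lines."""
    kept = []
    for line in text.splitlines():
        # Assumption: a page number is a line made up only of digits.
        if re.fullmatch(r"\s*\d+\s*", line):
            continue
        kept.append(line.rstrip())
    cleaned = "\n".join(kept)
    # Collapse two or more consecutive blank lines into a single blank line.
    cleaned = re.sub(r"\n{3,}", "\n\n", cleaned)
    return cleaned.strip() + "\n"


if __name__ == "__main__":
    sample = "End of page one.\n\n\n12\n\n\nStart of page two.\n"
    print(clean_saved_text(sample), end="")
```

If the page numbers in your file look different (say, "Page 12" or "- 12 -"), the pattern in `re.fullmatch` would need adjusting, and it's worth checking the output by eye since a real line that happens to be just a number would also be dropped.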