Quote:
Originally Posted by Hitch
Robert:
Doesn't that give you, as you said, a plain text file? So that all formatting is lost? For some folks, that would be a crapload more work.
Abbyy, if I am not mistaken, has a website where you can convert a single document for free. There are limits on it, and all that, but you might try searching for that.
Hitch
Yes, you are right that all formatting is lost. For novels, in most cases that means headings and italics. Restoring headings by hand doesn't take much time: in a plain text file (with word-wrap disabled) you can usually spot them quite easily, even if they don't start with numbers or "chapter" etc. The loss of italics can be quite unfortunate if the writer relies on them, though not all writers do.
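For anyone who would rather have a script point out the likely headings, here is a minimal Python sketch. The rules in it are my own assumptions, not anything built into pdftotext: a heading is taken to be a short line with a blank line before and after it that either starts with a digit or a word like "chapter", or doesn't end in sentence punctuation. The function name and the 60-character limit are made up for illustration. It only lists candidates; you still have to check them by eye.

```python
import re

# Words that commonly open a heading (an assumed, non-exhaustive list).
HEADING_WORDS = re.compile(r"^(chapter|part|book|prologue|epilogue)\b", re.IGNORECASE)

def heading_candidates(lines, max_len=60):
    """Return (line_number, text) pairs for lines that look like headings."""
    found = []
    for i, raw in enumerate(lines):
        text = raw.strip()
        if not text or len(text) > max_len:
            continue
        # Assumption: headings are set off by blank lines on both sides.
        prev_blank = i == 0 or not lines[i - 1].strip()
        next_blank = i == len(lines) - 1 or not lines[i + 1].strip()
        if not (prev_blank and next_blank):
            continue
        looks_numbered = text[0].isdigit() or bool(HEADING_WORDS.match(text))
        no_end_punct = text[-1] not in ".,;:!?\"'"
        if looks_numbered or no_end_punct:
            found.append((i + 1, text))  # 1-based line numbers
    return found
```

Feed it the lines of the text file (for example `open("book.txt", encoding="utf-8").read().splitlines()`, where the file name is just a placeholder) and go through the list it returns. Short lines of dialogue standing alone will show up as false positives.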
Unless I've missed something, though, even with good OCR software it's not trivial to retain italics while getting rid of all the excess formatting that the OCR'd output file is burdened with (even Abbyy is wont to hallucinate dozens of different page and paragraph formats, font sizes, etc.). I do have a fairly recent version of Abbyy, but when, for instance, I had to convert a series of 26 PDF books for a friend (being blind, she couldn't read the PDFs), I got it done much more quickly with the pdftotext tool. (There were no italics in those PDFs, which we had bought but which seem to have been produced from carelessly OCR'd print editions. I have no idea whether the paper originals had them, but I've often come across public-domain books on the Internet without italics when the originals have them. That is not meant to say it's OK to lose them.)
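In case it helps, a batch run over a folder of PDFs with pdftotext could look roughly like the Python sketch below. It assumes pdftotext (from Xpdf or Poppler) is installed and on the PATH. The options are only one reasonable choice, not necessarily what I used: -enc UTF-8 sets the output encoding and -nopgbrk leaves out the page-break characters. The function names and the folder name are hypothetical.

```python
import subprocess
from pathlib import Path

def pdftotext_command(pdf_path, txt_path):
    """Build the pdftotext command line for one file."""
    return ["pdftotext", "-enc", "UTF-8", "-nopgbrk", str(pdf_path), str(txt_path)]

def convert_folder(folder):
    """Convert every *.pdf in folder to a .txt file next to it."""
    written = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        txt = pdf.with_suffix(".txt")
        # check=True stops the run if pdftotext reports an error.
        subprocess.run(pdftotext_command(pdf, txt), check=True)
        written.append(txt)
    return written
```

Calling `convert_folder("books")` would then write one text file per PDF in that folder. If the PDFs are pure page scans with no text layer, pdftotext has nothing to extract and you are back to OCR.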