The script should also identify apostrophes in words like 'em, 'tis, and other transcriptions of spoken language (much too often one finds an opening quote in those cases instead). I also try to mark single quotes and apostrophes properly, so that I can convert single quotes into double quotes without fear of ruining apostrophes.
I do this with a mix of regexp and manual search and replace (replacing each occurrence with the right character, which I map to hotkeys so that it's relatively easy to run along the text). This also helps locate possible missing quotes, and at the end there's always the reading phase, to confirm everything's right.
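As a rough illustration of the regexp half of that workflow, here is a minimal Python sketch. It is not my actual script: the word list `ELISIONS` and the function name `mark_apostrophes` are made up for the example, and the list is nowhere near complete. It only handles the two easy cases (a known elided word, and an apostrophe between two letters), which leaves every remaining straight quote for the manual pass.

```python
import re

# Hypothetical, incomplete list of elided words that begin with an apostrophe.
ELISIONS = ("em", "tis", "twas", "til", "cause")

APOSTROPHE = "\u2019"  # right single quotation mark, used as the apostrophe

# A straight apostrophe, or a wrongly curled opening quote (U+2018),
# directly before one of the known elided words.
_LEADING = re.compile(
    r"['\u2018](?=(?:%s)\b)" % "|".join(ELISIONS), re.IGNORECASE
)

# A straight apostrophe between two letters, as in "don't".
_INNER = re.compile(r"(?<=[A-Za-z])'(?=[A-Za-z])")


def mark_apostrophes(text):
    """Replace the unambiguous apostrophes; leave real quotes untouched."""
    text = _LEADING.sub(APOSTROPHE, text)
    return _INNER.sub(APOSTROPHE, text)
```

Anything still written as a straight single quote after this pass is either a real quotation mark or an elision missing from the list, so those are the ones to step through by hand.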
If you are going the LaTeX way, you should also check the spacing after full stops and question/exclamation marks. By convention LaTeX puts a wider space after those, which you'd have to suppress (with a \@ after the sign) in abbreviations or other not-end-of-sentence cases, and force (with a \@ before) when they come after a capital letter. I did that for my Lewis Carroll PDFs, and it's time consuming, but the result looks great (to me).
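Part of that spacing check can be scripted too. The sketch below is only an assumption about how one might do it, not what I ran on the Carroll texts: `ABBREVIATIONS` is a made-up, incomplete list, and both function names are hypothetical. The first function adds `\@` after the full stop of a known abbreviation; the second only reports where a capital letter is followed by sentence punctuation, because deciding whether that is an initial or a real sentence end still needs a human.

```python
import re

# Hypothetical, incomplete list of abbreviations whose full stop
# does not end a sentence.
ABBREVIATIONS = ("Mr", "Mrs", "Dr", "St", "etc", "viz")

# A known abbreviation, its full stop, then a space.
_ABBREV = re.compile(r"\b(%s)\.(?= )" % "|".join(ABBREVIATIONS))

# Sentence punctuation directly after a capital letter, then a space.
_CAPITAL_STOP = re.compile(r"(?<=[A-Z])[.?!](?= )")


def suppress_after_abbreviations(text):
    """Insert \\@ after the full stop of each known abbreviation."""
    return _ABBREV.sub(lambda m: m.group(1) + ".\\@", text)


def find_capital_stops(text):
    """Return the offsets of punctuation marks that follow a capital letter.

    These are candidates for a \\@ before the mark; review them by hand.
    """
    return [m.start() for m in _CAPITAL_STOP.finditer(text)]
```

Note that an abbreviation such as "etc." sitting at the real end of a sentence would also get the `\@`, so the output still wants a read-through.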