Quote:
Originally Posted by ardeegee
My biggest completist fantasy? I would SO much love to have digital copies of the entire print runs of all pulp/"digest" type Science Fiction magazines published in the 20th century (extant examples being Asimov's and Analog) even though I would never, ever get around to reading more than a small fraction of them.
|
As it happens, I'm doing exactly that for Analog - I'm about fifteen years into it, so far. It's not a trivial task (and also, since it's all under copyright, I won't be publishing it).
Quote:
Originally Posted by K-Thom
Let me clarify my "solidly proof-read". As tompe said, these novels have actually been proofread before, so this is about looking and eliminating OCR errors. After (legally) scanning about 100 novels in the past six years, I tend to know my "usual suspects" errorwise. This speeds up progress quite a lot.
|
As a follow-on from the above: I'm working on a Master's thesis on post-processing OCR'd texts to improve their fidelity using local context. At the moment I'm cataloguing errors and their corrections and noodling over possible ways to correct them. There are certainly patterns, but they differ between OCR systems and often between scanned fonts. I can see a number of improvements over normal editing and spell correction, which I shall attempt to implement.
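
To give a rough idea of what an error catalogue might look like in practice, here is a minimal sketch in Python. It is not the thesis method, just an illustration under assumptions: the CONFUSIONS table, the tiny LEXICON and all function names are made up for the example, and a real catalogue would need to be built per OCR system and per font, as noted above. An unknown word is expanded into every variant the catalogue allows, and a fix is applied only when exactly one variant is a known word.

```python
import re

# Hypothetical catalogue: fragment as the OCR emitted it -> fragments it may stand for.
# A real catalogue would be specific to one OCR system and one scanned font.
CONFUSIONS = {
    "rn": ["m"],
    "vv": ["w"],
    "1": ["l"],
    "0": ["o"],
}

# Tiny stand-in lexicon; a real run would load a full word list.
LEXICON = {"the", "modern", "will", "look"}


def variants(word, confusions):
    """Yield every string obtained by replacing zero or more catalogued fragments."""
    if not word:
        yield ""
        return
    # Keep the first character as it is.
    for rest in variants(word[1:], confusions):
        yield word[0] + rest
    # Or replace a catalogued fragment that starts at this position.
    for wrong, rights in confusions.items():
        if word.startswith(wrong):
            for rest in variants(word[len(wrong):], confusions):
                for right in rights:
                    yield right + rest


def correct_word(word, lexicon=LEXICON, confusions=CONFUSIONS):
    """Return the single lexicon word reachable via the catalogue, else the word unchanged."""
    if word.lower() in lexicon:
        return word
    candidates = {v for v in variants(word, confusions) if v.lower() in lexicon}
    if len(candidates) == 1:
        return candidates.pop()
    # No candidate, or several: leave it for a human (or a context model) to decide.
    return word


def correct_text(text, lexicon=LEXICON, confusions=CONFUSIONS):
    """Apply correct_word to every run of letters and digits in the text."""
    return re.sub(
        r"[A-Za-z0-9]+",
        lambda m: correct_word(m.group(0), lexicon, confusions),
        text,
    )


print(correct_text("The rnodern wi11 1ook at 1950"))
```

The sketch deliberately leaves ambiguous cases alone: with a catalogue entry such as "1" standing for either "l" or "i", a token like "f1at" could be "flat" or "fiat", and choosing between them is where the surrounding words would have to come in.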