Argel,
In the vast majority of cases, files of any kind conform to their type. Text, Python, HTML, C++, whatever: each uses certain patterns. Once you start to understand the pattern, you can make use of it in any regex process. You know this; it is exactly what you suggested in your search & replace example.
However, there will always be cases where a particular user, file or piece of software does not follow the standard pattern for some reason. In those cases you can still run regex, but on a more limited basis and with greater oversight. So, in my example, I would run the "humongous regex" on it first to see what happened. If the result was a really garbled mess, I would revert to the original file, apply each individual expression one at a time, and look the file over after each one.
Chances are the file will become more uniform overall, although some errors will creep in because of the regex. (No automated process is infallible.) Since this is something Gideon would read anyway, he can note any remaining errors as he goes. He can either correct these one at a time or create a new regex to handle them (which also corrects similar errors in later portions of the text).
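To make the "one expression at a time" approach concrete, here is a rough sketch in Python. The rules and the sample text are made up for illustration (I have not seen the actual file), so treat the patterns as placeholders. The idea is only that each expression is applied separately and reports how many changes it made, so a rule that fires far more or far less often than expected stands out before you move on to the next one.

```python
import re

# Hypothetical cleanup rules: (description, pattern, replacement).
# The real rules depend entirely on the file being cleaned.
RULES = [
    ("collapse runs of spaces", r"[ \t]{2,}", " "),
    ("strip trailing whitespace", r"[ \t]+$", ""),
    ("shorten long dot runs", r"\.{4,}", "..."),
]


def apply_rules_stepwise(text, rules=RULES):
    """Apply each rule on its own and record how many substitutions it made."""
    report = []
    for name, pattern, replacement in rules:
        # re.subn returns the new string and the number of replacements.
        text, count = re.subn(pattern, replacement, text, flags=re.MULTILINE)
        report.append((name, count))
    return text, report


# Made-up sample standing in for the real file contents.
sample = "Hello   world....  \nSecond  line.\n"
cleaned, report = apply_rules_stepwise(sample)

for name, count in report:
    print(f"{name}: {count} change(s)")
```

Keeping the original text around (here, `sample`) is what makes it cheap to revert if one rule turns out to do more harm than good.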
This is what Ahi, Dale and Harry are talking about right now. The file in question sounds as though it does not conform to a known and recognized pattern. So you either have to customize any regex or perform a substantial amount of the corrections manually.
Without seeing the file, none of us can provide real assistance on this matter. Guessing only gets us so far.