The longer lines are acceptable now.
I managed to remove all the inbuilt page numbers by repeating the same regex with one less \d each time I ran the replace all. The replacements are shockingly fast.
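For what it's worth, the repeated passes can probably be collapsed into one by using a quantifier. The sketch below is in Python and rests on assumptions not stated in this thread: that each page number sits alone on its own line and has at most four digits. The function name `strip_page_numbers` is made up for illustration.

```python
import re

# Assumption: a page number is a line containing only 1 to 4 digits,
# optionally surrounded by spaces or tabs.
PAGE_NUMBER_LINE = re.compile(r'^[ \t]*\d{1,4}[ \t]*\n', re.MULTILINE)

def strip_page_numbers(text: str) -> str:
    """Delete every line that consists solely of a short number."""
    return PAGE_NUMBER_LINE.sub('', text)

sample = "text one\n123\ntext two\n45\nend 2020 here\n7\n"
cleaned = strip_page_numbers(sample)
print(cleaned)
```

Numbers that appear inside a line of ordinary text, such as the year in the sample, are left alone because the pattern is anchored to the start and end of the line.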
I don't really understand how to do the rest of the clean-up on the page I showed you in comment 10 above. What kind of regex can distinguish between a single word that should stand alone on its line, such as "Hello", and a single word that should be joined to the next line?
One way would be to join all the words into one continuous line until a full stop is found, but is that level of control possible?