Old 09-28-2008, 11:25 AM   #9
garygibsonsf
Addict
Posts: 321
Karma: 432192
Join Date: Dec 2007
Location: Glasgow, Scotland
Device: Amazon Kindle Paperwhite
I have a method for removing faults from PDFs converted to Word files.

When it comes to getting rid of page numbers, usually the best way to fix it is this:

Find a piece of text in Word that reads, say, like so -


"... and after that, the purple footed llamas

p23
never returned to Cairo."

If in Word you select and copy the section of text between 'llamas' and 'never', open the find and replace box, and paste this selection into the 'find' box, Word will cleverly insert the requisite paragraph tags previously mentioned for you automatically. Then you delete the digits after the 'p' (the '2' or '3' or whatever), go to the drop down box on the bottom right of the find/replace box and select 'digit' in their place, so that the search reads 'p(digit)'. (Keep in mind my Mac doesn't have the right key for the actual text Word inserts next to the 'p' when you select 'digit' from the dropdown menu, hence I've used (digit) to represent it.)

Then go to the 'replace' box, insert a single space there, search and replace, and hopefully the annoying page numbers and so forth will be gone. Keep in mind that larger page numbers (10 through 99, 100 through whatever, etc.) need 'p(digit)(digit)' and 'p(digit)(digit)(digit)' in the 'find' box respectively, so you'll have to run the search once for each length.
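
If you'd rather do this outside Word, here's a rough Python sketch of the same idea. It's not part of the Word method itself, and it assumes you've saved the document as plain text so that each paragraph return is a newline and the page markers look like 'p23' on a line of their own. The names (PAGE_MARKER, strip_page_numbers) are just ones I've made up for the example.

```python
import re

# Assumption: a page marker is 'p' plus one or more digits, sitting on its own
# line with at least one paragraph return (newline) on either side.
PAGE_MARKER = re.compile(r"\n+p\d+\n+")


def strip_page_numbers(text: str) -> str:
    """Replace each stranded page marker, and the returns around it, with one space."""
    return PAGE_MARKER.sub(" ", text)


sample = "... and after that, the purple footed llamas\n\np23\nnever returned to Cairo."
print(strip_page_numbers(sample))
```

Because \d+ matches any run of digits, one pass covers one, two and three digit page numbers, where Word needs a separate search for each length.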

When it comes to getting rid of incorrect paragraph returns, I use a similar method to the one above, but one that nails things a bit more accurately, I believe. It goes like this:

1: Search on every quote mark (") followed by a paragraph return (^p) and replace it with + followed by a single space.
2: Then search on every full stop followed by a paragraph return and replace it with ~ followed by a single space.
3: Then search and replace every paragraph return - all of them - with a single space.
4: Then search again on every + followed by a single space, and replace it with a quote mark (") followed by a paragraph return.
5: Lastly, search on every ~ followed by a single space and replace it with a full stop followed by a paragraph return.

You can substitute other symbols for ~ and +, as long as they're similarly unlikely to turn up in your document. A fair bit of the time, this method works for me. Not all the time, but enough times to make this the first means I turn to. Keep in mind the author of a particular document might use single rather than double quotes for speech, in which case search on the single quote in steps 1 and 4 instead.
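
For anyone who wants to script it, the five steps above can be sketched in Python like so. Again this is only a sketch under assumptions: the text is plain text with a newline for each paragraph return, and rejoin_paragraphs and its parameters are names invented for the example. I've added a check that the placeholder symbols don't already appear in the text, since that's the one thing that will quietly wreck the result.

```python
def rejoin_paragraphs(text, quote='"', quote_mark="+", stop_mark="~"):
    """Remove stray paragraph returns, keeping those after a closing quote or a full stop."""
    # The placeholders must not already occur in the document.
    for mark in (quote_mark, stop_mark):
        if mark in text:
            raise ValueError(f"placeholder {mark!r} already appears in the text")

    # 1: quote mark + return -> placeholder + space
    text = text.replace(quote + "\n", quote_mark + " ")
    # 2: full stop + return -> placeholder + space
    text = text.replace(".\n", stop_mark + " ")
    # 3: every remaining return -> single space
    text = text.replace("\n", " ")
    # 4: restore quote mark + return
    text = text.replace(quote_mark + " ", quote + "\n")
    # 5: restore full stop + return
    text = text.replace(stop_mark + " ", ".\n")
    return text


broken = 'The llamas left\nCairo at dawn.\n"We are not\ncoming back."\nNobody argued.\n'
print(rejoin_paragraphs(broken))
```

For a document that uses single quotes for speech, pass quote="'" and the same steps apply. Like the Word version, this will still split a paragraph wherever a line happens to end on a full stop.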

Edit - I missed the method someone posted up for getting rid of page numbers. I see now it's pretty much the same as I suggested.