Old 05-20-2014, 08:23 AM   #6
Toxaris
Quote:
Originally Posted by kacir View Post
You are not looking for MSWord; you are looking for "Regular Expressions". Word has only limited abilities compared to other tools, like Calibre. With the

...<snip>...

You see, most of the line breaks that do not follow one of [.?!"] are not at the end of a paragraph.
This is very quick and dirty, but it can clean an OCRed book of unwanted line breaks with 99% accuracy.

<snip>...
You are apparently not aware of the 'Use wildcards' option in Word's S&R. It enables RegEx-style searches. The example you give is possible in Word without a problem. Heck, even the syntax is almost identical. If you know RegEx, you can work with wildcard searches in Word...
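
As a rough illustration of the cleanup rule quoted above (a line break that does not follow one of [.?!"] is probably not a paragraph end), here is a minimal Python sketch. The function name and the exact pattern are assumptions made for illustration, not the expression from the snipped post, and Word's wildcard syntax will differ in its details.

```python
import re

def join_broken_lines(text: str) -> str:
    """Hypothetical helper: replace mid-paragraph line breaks with a space.

    Assumption: a newline is a real paragraph end only when it follows
    one of . ? ! " (the rule quoted above). Blank lines are left alone.
    """
    # (?<![.?!"\n])  the character before the newline is not . ? ! " or a newline
    # \n             the line break itself
    # (?!\n)         the break is not directly followed by a blank line
    return re.sub(r'(?<![.?!"\n])\n(?!\n)', ' ', text)

sample = (
    'The quick brown\n'
    'fox jumped over\n'
    'the lazy dog.\n'
    '"Really?"\n'
    'She asked, and then\n'
    'left.'
)
cleaned = join_broken_lines(sample)
print(cleaned)
```

Like the original suggestion, this is quick and dirty: a heading or a line of verse that ends without punctuation would be joined to the next line, so the result still needs a manual check.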

@momtodogs: If you want to clean up OCR mess in Word documents, I can advise you to look at my add-in (see signature). It catches a lot of OCR mistakes and repairs them either automatically or manually. It works in various steps, one of which is a large list of S&R requests (an example list is available). It also handles things like smartening punctuation and missing dialogue marks (and many other things). As a bonus, you can even export the result to ePUB.