MobileRead Forums - View Single Post

theducks · 09-18-2019, 11:08 AM

That looks like a PDF conversion with more than the usual from-PDF problems (see the sticky in the Conversion forum).
IMHO, if someone sold it to you like that, return it and ask for your money back.

If you converted it yourself from a PDF you bought (reading the sticky still applies), you will really need to learn basic REGEX (<< that is a clue to search MR for tutorials).
From your example, this is a very dirty case. Even if it were a fairly clean one, you would be looking at a dozen passes at least (each with a different search term) for the removal, and a bunch more to do the 'joins' afterwards.
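To give a feel for what "a dozen passes" means, here is a minimal sketch in Python. It assumes the conversion left page numbers and a running header as their own `<p>` elements; the class name `calibre1` and the header text `THE BOOK TITLE` are made up for illustration. In an editor you would type each pattern into the regex mode of Find & Replace one at a time; the script just runs them in order.

```python
import re

# Hypothetical removal passes: one (pattern, replacement) pair per search term.
REMOVAL_PASSES = [
    # A paragraph holding nothing but a page number, e.g. <p class="calibre1">23</p>
    (r'<p[^>]*>\s*\d+\s*</p>\s*', ''),
    # A running header repeated on every page (hypothetical title text)
    (r'<p[^>]*>\s*THE BOOK TITLE\s*</p>\s*', ''),
    # Empty paragraphs left behind by the other passes
    (r'<p[^>]*>\s*</p>\s*', ''),
]

def run_passes(html, passes):
    """Apply each search term to the whole text, in order."""
    for pattern, repl in passes:
        html = re.sub(pattern, repl, html)
    return html

sample = ('<p class="calibre1">It was a dark</p>\n'
          '<p class="calibre1">23</p>\n'
          '<p class="calibre1">THE BOOK TITLE</p>\n'
          '<p class="calibre1">and stormy night.</p>\n')
print(run_passes(sample, REMOVAL_PASSES))
```

Note the first pattern would also wipe out a legit paragraph that is only a number (a chapter number, say), which is exactly why blind Replace All is risky.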

IMHO, 90% of those passes go: Search -> eyeball the find -> decide: Replace Next, or Search again (skip the replace). Repeat for that term till the end. New term. Do it all again.
This could take less than 1 hr if you are experienced at crafting REGEX search terms.
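The Search / eyeball / Replace-or-skip loop can be sketched like this. It is only an illustration: `review_replace` and its `decide` callback are hypothetical names, with `decide` standing in for you looking at each hit.

```python
import re

def review_replace(text, pattern, repl, decide):
    """Visit every match of `pattern` in order. `decide(match)` plays the
    human eyeballing the find: True = Replace, False = skip to the next Search."""
    out = []
    last = 0
    for m in re.finditer(pattern, text):
        out.append(text[last:m.start()])      # text between the previous hit and this one
        if decide(m):
            out.append(m.expand(repl))        # replace (backreferences like \1 work)
        else:
            out.append(m.group(0))            # skip: keep the hit unchanged
        last = m.end()
    out.append(text[last:])                   # tail after the last hit
    return ''.join(out)

# Interactive use would be something like:
#   review_replace(html, r'<p>(\d+)</p>', '',
#                  lambda m: input(f"Replace {m.group(0)!r}? [y/n] ") == 'y')
```

Here a hit like `<p>1</p>` that is really a chapter number can be skipped, while `<p>17</p>` page numbers get removed.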

Then you do your 'Join' set of S&Rs to clean up broken paragraphs.
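A typical 'Join' S&R, sketched in Python under the assumption that a paragraph ending without closing punctuation, followed by one starting lowercase, was split at a PDF line or page break. The hyphen pass assumes a word was broken across the split. Both patterns are illustrations, not a complete set, and each hit is still worth eyeballing.

```python
import re

# Hypothetical join passes; run the hyphen one first.
JOIN_PASSES = [
    # "magnifi-</p><p>cent" -> "magnificent" (word hyphenated across the break)
    (r'([a-z])-</p>\s*<p[^>]*>\s*([a-z])', r'\1\2'),
    # "dark</p><p>and" -> "dark and" (no terminal punctuation, next starts lowercase)
    (r'([a-z,;])</p>\s*<p[^>]*>\s*([a-z])', r'\1 \2'),
]

def join_broken_paragraphs(html):
    for pattern, repl in JOIN_PASSES:
        html = re.sub(pattern, repl, html)
    return html
```

A real compound like "well-</p><p>known" would be wrongly glued into "wellknown" by the hyphen pass, so that one in particular is a Search-then-decide term.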

OTOH, if this was an OCR'd scan PDF, there are going to be tons of random errors that turn that hour into much longer, because you have to search for every variation of each term.
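As an example of "all the variations", here is a sketch of the page-number term from a clean conversion next to a widened version for an OCR'd scan. It assumes the usual look-alike misreads (`l` or `I` for 1, `O` for 0) plus stray spaces; the exact set of misreads in your book will differ.

```python
import re

# Clean conversion: a page-number paragraph is digits only.
CLEAN_PAGE_NUMBER = re.compile(r'<p[^>]*>\s*\d+\s*</p>')

# OCR'd scan (assumed misreads): digits may come back as l, I or O,
# with single stray spaces between characters.
OCR_PAGE_NUMBER = re.compile(r'<p[^>]*>\s*[\dlIO](?:\s?[\dlIO])*\s*</p>')

samples = ['<p>17</p>', '<p>l7</p>', '<p>2 O</p>', '<p>I</p>']
for s in samples:
    print(s, bool(CLEAN_PAGE_NUMBER.fullmatch(s)), bool(OCR_PAGE_NUMBER.fullmatch(s)))
```

The wider term also hits `<p>I</p>`, which could be a genuine one-word paragraph, so every OCR variation means more finds to eyeball.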
