09-09-2008, 02:49 PM | #1 |
Guru
Posts: 699
Karma: 1001556
Join Date: Jul 2008
Location: Texas
Device: Oasis 3, K4B(NT), K3/KK
|
Question about doing some searching and replacing in Word
I've been doing some editing over the weekend. Several of the books are in PDF format, and I am either using MobiPocket Creator to get an HTML file out to edit or using Abbyy Transformer to get an RTF or DOC file out.
Once I have either the html or RTF/DOC file I am doing some editing in Word. In most cases I'm doing fine at removing line breaks or manual page breaks or whatever else. But in some cases, there are some paragraph breaks in the middle of sentences that I'm having a tougher time picking up without doing a grammar check and finding those paragraphs that have an incomplete sentence. In these cases, the first sentence is beginning with a lower case letter instead of being capitalized. Is there an easy way to do a search for paragraphs that start with a word with no capitalization? Am I overlooking something? |
09-09-2008, 03:58 PM | #2 |
Resident Curmudgeon
Posts: 73,975
Karma: 128903378
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
What you are experiencing is a common issue when converting PDF to some other format. The best way to make sure it's all fixed is to compare the output with the PDF until you have gone over every paragraph.
|
09-10-2008, 08:48 AM | #3 |
Guru
Posts: 860
Karma: 4380
Join Date: Feb 2008
Location: Almada, Portugal
Device: Cybook Gen3, Sony PRS 505, Kindle DXG and Samsung Galaxy Note
|
A manual way is:
1 - do a “find and replace” to replace every paragraph mark with 2 paragraph marks;
2 - look through your text; increasing the font size helps the oddities stand out, and it becomes quite easy to find the broken paragraphs. Correct them and carry on looking through the text;
3 - at the end, do a “find and replace” to reverse the original one, replacing every 2 paragraph marks with just 1.
Proofreading is a long, costly and tedious process… see Project Gutenberg's effort with collective proofreading! In digitization projects it is by far the most costly part of the project, and it is one of the main reasons the PDF format “image with OCRed text under it” is so popular in these projects and also in the enterprise world.
Best regards,
Last edited by DDHarriman; 09-10-2008 at 02:07 PM. |
09-10-2008, 11:53 AM | #4 | |
Wizard
Posts: 3,671
Karma: 12205348
Join Date: Mar 2008
Device: Galaxy S, Nook w/CM7
|
Quote:
1) Press <CTRL>+H to open the Find and Replace dialog.
2) Click the <More> button (if you only see <Less>, don't do anything).
3) Check the "Use wildcards" check box.
4) In the Find text box enter, without the quotes, "^13([a-z]?)"
// This finds every lower case letter at the start of a new line.
// ** Here is the inconsistency. A new paragraph is ^p, which is not the same as ^13, which is a new line. With the "Use wildcards" option checked, ^p is not supported, so there will be cases where the expression in step 4 does not find the text.
5) In the Replace text box enter, without the quotes, " \1"
// ** Note the space before \1. It puts a space between the two words; otherwise you will get a lot of spelling errors from words being run together.
// The \1 inserts the text matched inside the () in step 4. If you had two groups, ()(), you would use \1\2.
=X=
Last edited by =X=; 09-10-2008 at 11:55 AM. Reason: edited grammer |
|
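For anyone who wants to see the logic of the find/replace pair above outside Word, here is a minimal Python sketch. It is not =X='s method. It assumes the text has been saved as plain text with a single newline for each paragraph break, and the function name `join_broken_paragraphs` is made up for illustration. One difference worth knowing: in Word's wildcard syntax `?` stands for any single character, so `([a-z]?)` captures a lower case letter plus the character after it, while the sketch captures only the letter. The result is the same in the common case.

```python
import re


def join_broken_paragraphs(text: str) -> str:
    """Replace a paragraph break followed by a lower case letter with a space.

    Hypothetical helper: assumes plain text where "\n" marks a paragraph break.
    """
    # \n      : the paragraph break (roughly Word's ^13 in wildcard mode)
    # ([a-z]) : the lower case letter that starts the next paragraph
    # " \1"   : a space, then the captured letter, so the words do not run together
    return re.sub(r"\n([a-z])", r" \1", text)


sample = "The quick brown fox\njumped over the dog.\nA new paragraph starts here."
print(join_broken_paragraphs(sample))
```

Only the first break is joined here, because the paragraph after the second break starts with a capital letter.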
09-10-2008, 03:05 PM | #5 | |
Guru
Posts: 699
Karma: 1001556
Join Date: Jul 2008
Location: Texas
Device: Oasis 3, K4B(NT), K3/KK
|
Quote:
|
|
09-11-2008, 12:18 AM | #6 | |
You kids get off my lawn!
Posts: 4,220
Karma: 73492664
Join Date: Aug 2007
Location: Columbus, Ohio
Device: Oasis 2 and Libra H2O and half a dozen older models I can't let go of
|
I played a little with Word macros and came up with the following to do what the other poster suggested doing in the "Find" and "Replace" boxes. I only tested this on one document, so you might want to test this on some of the documents you typically convert.
Code:
Sub ParagraphBreaksInMiddleOfSentences()

    Selection.WholeStory

    'to delete all section breaks first (replace the ^b if you
    'want to delete all section *and* page breaks or ^m if
    'you want to delete only page breaks)
    With Selection.Find
        .Text = "^b"
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindContinue
    End With
    Selection.Find.Execute Replace:=wdReplaceAll

    'resets FindAndReplace parameters to defaults before running next one
    Call ClearFindAndReplaceParameters

    'to replace any combination of a paragraph return,
    'then a lower case letter, with a space and the
    'same lower case letter
    With Selection.Find
        .Text = "^13([a-z]?)"
        .Replacement.Text = " \1"
        .MatchWildcards = True
        .Forward = True
        .Wrap = wdFindContinue
    End With
    Selection.Find.Execute Replace:=wdReplaceAll

    Call ClearFindAndReplaceParameters

    'to replace any combination of a line feed, then a
    'lower case letter, with a space and the same lower
    'case letter
    With Selection.Find
        .Text = "^l([a-z]?)" 'this is a caret (^) and a lower case L (l)
        .Replacement.Text = " \1"
        .MatchWildcards = True
        .Forward = True
        .Wrap = wdFindContinue
    End With
    Selection.Find.Execute Replace:=wdReplaceAll

    Call ClearFindAndReplaceParameters

End Sub

Sub ClearFindAndReplaceParameters()
    ' copied from word.mvps.org
    With Selection.Find
        .ClearFormatting
        .Replacement.ClearFormatting
        .Text = ""
        .Replacement.Text = ""
        .Forward = True
        .Wrap = wdFindStop
        .Format = False
        .MatchCase = False
        .MatchWholeWord = False
        .MatchWildcards = False
        .MatchSoundsLike = False
        .MatchAllWordForms = False
    End With
End Sub
Quote:
I use Windows, so it may be that this works differently in other operating systems. But as I said above, I really only tested this on one "real" ebook and just one single-sentence test file...so I could be wrong. Last edited by FizzyWater; 09-11-2008 at 12:29 AM. Reason: to get code to display indents properly |
|
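Because the macro above replaces every match in one pass, it can help to look at the candidates first. The following is a rough sketch of a read-only check in Python. It assumes the document has been saved as plain text with one paragraph per line, and `find_lowercase_starts` is a hypothetical name, not part of Word or of the macro.

```python
from typing import List, Tuple


def find_lowercase_starts(text: str) -> List[Tuple[int, str]]:
    """Return (paragraph number, paragraph text) for paragraphs that start lower case.

    Hypothetical helper: assumes one paragraph per line of plain text.
    """
    hits = []
    for number, paragraph in enumerate(text.split("\n"), start=1):
        stripped = paragraph.lstrip()
        # islower() on a single character is True only for lower case letters,
        # so digits, quotes and capitals are skipped
        if stripped and stripped[0].islower():
            hits.append((number, stripped))
    return hits


sample = "Hello there\nworld again.\n\n  and more\nEnd."
for number, paragraph in find_lowercase_starts(sample):
    print(number, paragraph)
```

Nothing is changed in the text; the list just shows where to look before running a replace-all.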
09-11-2008, 09:20 AM | #7 |
Guru
Posts: 699
Karma: 1001556
Join Date: Jul 2008
Location: Texas
Device: Oasis 3, K4B(NT), K3/KK
|
Thanks FizzyWater! I will try it tonight or sometime over the weekend. I have a couple of ebooks that were only in pdf format so far (which I am hating), so I will try this soon. Thanks!
|
|