07-22-2011, 02:28 PM   #68
Faster
Connoisseur
Posts: 61
Karma: 12096
Join Date: Sep 2010
Location: Tasmania
Device: Sony PRS 650
Fixing those b..... nasty characters:

Open a blank document in Web View in MS Word.

SIGIL
Open the problem epub in Sigil.
Go into Code View.
Do a Find/Replace as follows:

(Look in: All HTML Files)
(Search Mode: Regular expression)
(Check: All)
Code:
Find:	(</body>)
Replace:	<hr class="sigilChapterBreak" />\1
The number of replacements should equal the number of HTML files in the Text folder. If it doesn't, discard the changes and start again.
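If you'd like to see what that Find/Replace actually does, here is a minimal Python sketch that applies the same regex to a list of chapter strings and performs the same count check. It assumes the chapters are already loaded as strings; the function name `add_chapter_breaks` is made up for illustration and has nothing to do with Sigil's internals.

```python
import re

# Same pattern as the Sigil Find/Replace: capture </body> as group 1.
BODY_CLOSE = re.compile(r"(</body>)")
MARKER = '<hr class="sigilChapterBreak" />'


def add_chapter_breaks(chapters):
    """Insert MARKER in front of </body> in every chapter string.

    Mirrors the sanity check: if the total number of replacements differs
    from the number of files, raise ValueError instead of returning.
    """
    updated = []
    total = 0
    for html in chapters:
        # r"\1" puts the captured </body> back after the marker.
        new_html, count = BODY_CLOSE.subn(MARKER + r"\1", html)
        updated.append(new_html)
        total += count
    if total != len(chapters):
        raise ValueError(f"{total} replacements for {len(chapters)} files; start again")
    return updated


chapters = [
    "<html><body><p>One</p></body></html>",
    "<html><body><p>Two</p></body></html>",
]
print(add_chapter_breaks(chapters)[0])
```

Using `subn` rather than `sub` is what makes the count check possible, since it returns the number of replacements along with the new text.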

Next, go to the left panel and RIGHT CLICK the second HTML file. From the context menu, select 'Merge With Previous'. Repeat this until only one large concatenated file remains.
Go back to Code View. Select All (CTRL A) and Copy (CTRL C).
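Repeated 'Merge With Previous' essentially joins the body contents of all the files into one. Here is a rough Python sketch of that idea, assuming each chapter is a string containing a single <body> element. `merge_chapters` is a hypothetical name, and Sigil's real merge also deals with stylesheets and links, which this sketch ignores.

```python
import re

# Capture the contents of <body ...>...</body>; DOTALL lets "." cross newlines.
BODY = re.compile(r"<body[^>]*>(.*?)</body>", re.DOTALL | re.IGNORECASE)


def merge_chapters(chapters):
    """Join the body contents of every chapter into one HTML document.

    Keeps whatever precedes the first chapter's <body> (its <html> and
    <head>) and drops the other chapters' heads.
    """
    matches = [BODY.search(html) for html in chapters]
    if not chapters or any(m is None for m in matches):
        raise ValueError("every chapter needs a <body> element")
    prefix = chapters[0][: matches[0].start()]
    joined = "".join(m.group(1) for m in matches)
    return prefix + "<body>" + joined + "</body></html>"


print(merge_chapters([
    "<html><head></head><body><p>A</p></body></html>",
    "<html><head></head><body><p>B</p></body></html>",
]))
```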

WORD
Go to the blank MS Word document and paste (CTRL V).
Select All and change the font to Arial size 14. (This has no effect on your epub, but it makes the characters easier to select.)
Go to Tools > AutoCorrect Options > AutoFormat As You Type and make sure "Straight quotes" with "smart quotes" is checked, so that the quote marks you type into the Replace box come out curly.

Now do a series of Find/Replaces (CTRL H). Click 'More' and make sure Search is set to 'All'.
Find the first cluster of unwanted characters.
Put the cursor in front of it, then hold down <SHIFT> and use the RIGHT ARROW key to extend the selection over the cluster. Pay careful attention to whether or not a space is part of the unwanted cluster. Copy the selection and paste it into the 'Find what' box.
In the 'Replace with' box, enter the missing character. I'm afraid you'll have to resign yourself to single quotation marks rather than double, otherwise you"ll be sorry!
It takes about half a dozen F/Rs to correct the whole book. The replacements are typically opening and closing quote marks, apostrophes and commas.
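The clusters often look like "â€™" or "â€œ". This sketch assumes (the steps above do not depend on it) that they are UTF-8 curly quotes and dashes that some tool decoded as Windows-1252, so each character turns into a three-character cluster starting with "â€". Under that assumption, the half-dozen Find/Replaces can be expressed in Python as a lookup table applied in a single pass. The table and the names `fix_clusters` and `list_clusters` are illustrative, and your book may contain clusters not listed here. The closing double quote is left out because its last UTF-8 byte (0x9D) has no Windows-1252 character, so how it appears varies from tool to tool.

```python
import re
from collections import Counter

# Assumption: each cluster is a UTF-8 character whose three bytes were
# read as Windows-1252, e.g. U+2019 = E2 80 99 -> "â", "€", "™".
CLUSTERS = {
    "\u00e2\u20ac\u2122": "\u2019",  # "â€™" -> right single quote / apostrophe
    "\u00e2\u20ac\u02dc": "\u2018",  # "â€˜" -> left single quote
    "\u00e2\u20ac\u0153": "\u201c",  # "â€œ" -> left double quote
    "\u00e2\u20ac\u201c": "\u2013",  # "â€“" -> en dash
    "\u00e2\u20ac\u201d": "\u2014",  # "â€”" -> em dash
    "\u00e2\u20ac\u00a6": "\u2026",  # "â€¦" -> ellipsis
}

# One alternation, so the text is scanned once and a replacement can never
# combine with neighbouring text to form a new cluster.
PATTERN = re.compile("|".join(re.escape(k) for k in CLUSTERS))


def fix_clusters(text):
    """Replace every known cluster with the character it stands for."""
    return PATTERN.sub(lambda m: CLUSTERS[m.group(0)], text)


def list_clusters(text):
    """Count every 'â€' run plus the character after it, known or not."""
    return Counter(re.findall("\u00e2\u20ac.?", text))


sample = "It\u00e2\u20ac\u2122s \u00e2\u20ac\u02dcfine\u00e2\u20ac\u2122"
print(list_clusters(sample))
print(fix_clusters(sample))
```

Running `list_clusters` first gives you an inventory of which clusters actually occur, which is roughly what the manual select-and-copy step does one cluster at a time.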

Note: if you introduce a spacing mistake around commas, as I did (e.g. <space>comma<no space>), you can search for <space>comma and replace it with comma<space>.
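The comma fix in the note can also be written as a regular expression. This Python sketch (the function name is hypothetical) is slightly narrower than the plain Word replace: it only touches a space-comma pair when the comma is immediately followed by a non-space character, so a comma that already has a space after it is left alone.

```python
import re

# A space, then a comma that is immediately followed by a non-space
# character (the lookahead checks the next character without consuming it).
SPACE_COMMA = re.compile(r" ,(?=\S)")


def fix_comma_spacing(text):
    """Turn '<space>comma<no space>' into 'comma<space>'."""
    return SPACE_COMMA.sub(", ", text)


print(fix_comma_spacing("apples ,pears and plums"))
```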

Finally in Word:
Select All CTRL A and Copy CTRL C.

SIGIL
Back in Code View, Select All (CTRL A) and paste (CTRL V) to replace the contents with the text from Word.
Now press function key F6. This splits the file at each of the chapter-break markers inserted earlier, restoring the original sequence of text files, so any TOC will work as before.
Check the result in Book View, then Save.
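The final split can be pictured as cutting the merged body content wherever the chapter-break marker appears. This is a minimal Python sketch of that idea only: Sigil also rebuilds each file's <head> and keeps links working, which this ignores, and `split_at_markers` is a hypothetical name.

```python
MARKER = '<hr class="sigilChapterBreak" />'


def split_at_markers(body_html):
    """Split merged body content into one piece per original file.

    Every chapter ended with a marker, so the text after the last marker
    is normally empty and is dropped.
    """
    pieces = body_html.split(MARKER)
    if pieces and not pieces[-1].strip():
        pieces.pop()
    return pieces


merged = '<p>A</p><hr class="sigilChapterBreak" /><p>B</p><hr class="sigilChapterBreak" />'
print(split_at_markers(merged))
```

If the number of pieces doesn't match the number of files you started with, a marker was probably damaged during the Word round trip.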