Proof reading help please - General, How to - Page 2

BetterRed · 07-08-2016, 07:13 PM

@Doitsu - thanks for the tips re use of Grammar Checker, I was planning on trying it over the weekend.

Having cut my teeth on Algol and various assemblers, I find it 'interesting' that your sample combines the ultra-verbosity of XML with the maxi-terseness of regex. Not your fault of course, it's the way of the world as it is - more given to extremes.

FWIW - I use the calibre editor Reports to: a) eyeball the Words list filtered on '-' (this morning I found 'them-selves' and 'Wag-nails'), b) scan the Character list for 'odd-ball' characters. The ability to sort the various lists on frequency is helpful, as is the facility to save a list to a csv.
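For anyone who wants the same two checks outside the calibre editor, here is a minimal Python sketch. It is not how calibre's Reports work internally; the function names `hyphenated_words` and `odd_characters` are made up for illustration, and it assumes you already have the book's text as a plain string.

```python
import re
from collections import Counter


def hyphenated_words(text):
    """Count words with an internal hyphen, most frequent first."""
    # [^\W\d_] matches any Unicode letter (word characters minus digits and "_").
    words = re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)+", text)
    return Counter(words).most_common()


def odd_characters(text):
    """Count characters outside printable ASCII, most frequent first."""
    return Counter(ch for ch in text if ord(ch) > 126).most_common()


if __name__ == "__main__":
    sample = "They hurt them-selves; the Wag-nails saw them-selves."
    for word, count in hyphenated_words(sample):
        print(count, word)
```

Sorting by frequency is the useful part: a hyphenated form that appears once or twice is far more likely to be a leftover line-break hyphen than one that appears fifty times.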

BR

AlexBell · 07-08-2016, 11:35 PM

Thanks for all your responses - I've been so busy proofreading that I haven't been back on the forum. I obviously have a lot to read through and digest.

I may need to consider my method of doing ebooks. I don't use Word; I use LibreOffice. I don't use Sigil; I use the CoffeeCup HTML editor I used when I dabbled in website design years ago - though I often use Notepad++ first. CoffeeCup does have a spell checker.

My usual practice is to find a book I want to do on the Internet Archive or elsewhere, and download the pdf and ePub 'ebook' files. Then I open up the ePub ebook file and edit the HTML files within.

Another thing I've learned recently is to take more trouble to find the best possible original file. Some of the pdfs on the Internet Archive are just awful, making it much more likely that I'll read a comma as a period, and vice versa and so on. So where I can I use the HathiTrust version to check against - their version is usually excellent, and they seem to have nearly everything I want. But of course one cannot download the files.

Thanks for the suggestions about fonts; I normally use Amasis when proofreading, but will experiment with the other font families and see if that helps.

Tex2002ans · 07-09-2016, 12:16 AM

Quote:

Originally Posted by Doitsu

Unfortunately, neither the default LibreOffice grammar checker nor LanguageTool caught the error in the paragraph that you posted. Do you use custom settings or did you get the warning from the MS Word grammar checker?

That typo was caught by Microsoft Word's grammar check.

I must admit, I haven't touched LibreOffice in a while (I just use Notepad++ for all my writing). But the more types of tools you can throw at it, the better (certain tools might catch errors that others might miss).

Quote:

Originally Posted by Doitsu

Shameless plug: In my quest to corner the Sigil wrapper plugin market, I released a LanguageTool (grammar check) validation plugin. (Validation plugin means that it'll display warnings like FlightCrew and not like LibreOffice or the standalone LanguageTool version.)

Hmmmm, very interesting. Well you convinced me to download the latest version of Sigil and test it out (I was holding out on 0.8.6 for a while while the dust settled).

The plugin felt quite rough:

- If you doubleclick on the error, it jumps you to the paragraph (not necessarily the EXACT position where the error is located).
  - Long paragraphs make it very hard to spot where the error actually occurred.
  - Is there any possible way for it to highlight the exact position in the text?
- Is there a way to split the messages into more columns? It is quite hard to read these errors and figure out WHAT exactly it is complaining about.
  - Currently: File + Line + MessageJammedIntoOneGiantLine
  - Potential: File + Line + Reason + Sentence + Suggestion
- Is there any possible way for it to run on the entire EPUB at once? Or am I just crazy? (Or didn't read your instructions properly.) Currently I am just running it a chapter at a time.
- Any stats/thoughts on adding the n-gram data?
  - Is there anywhere I could test this beforehand before trying to download the 8 GB beast? :P
  - About how many more errors will this point out? How many more false positives might I have to sift through, or does it do a pretty good job?

Quote:

Originally Posted by GrannyGrump

@Doitsu --- I am looking forward to trying the plugin this weekend. Glad I will no longer have to schlepp my files to work and paste into MS Word for their grammar check.

PASTE into MS Word? You do know Toxaris's Tools now have "Import EPUB"? Or you could do what I did before Toxaris introduced that... Calibre convert EPUB -> RTF/HTMLZ/DOCX + Open it up in your Word Processor of choice.

Quote:

Originally Posted by BetterRed

a) eyeball the Words list filtered on '-' (this morning I found 'them-selves' and 'Wag-nails'),

You are welcome.

Quote:

Originally Posted by BetterRed

b) scan the Character list for 'odd-ball' characters.

Yes, this is one of the first steps I do after I OCR the book. Who knows what crazy characters might have snuck in (or accents on characters). I then go through the book and check every odd/accented character to doublecheck they are correct. Doing this pass also helps you potentially catch inconsistencies like "vis-à-vis" + "vis-a-vis" existing in the same book.

Side Note: Before Toxaris comes swooping in here, yes, his EPUB Tools also has "Check Accents".
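A rough way to surface the "vis-à-vis" + "vis-a-vis" kind of inconsistency automatically is to group words by their accent-stripped form. This is only a sketch of the idea, not what Toxaris's "Check Accents" does; the names `strip_accents` and `accent_variants` are hypothetical, and it assumes the text is available as a plain string.

```python
import re
import unicodedata
from collections import defaultdict


def strip_accents(word):
    """Remove combining marks, e.g. 'à' becomes 'a'."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def accent_variants(text):
    """Return words that occur both with and without accents."""
    # Normalise first so precomposed and decomposed accents are treated alike.
    text = unicodedata.normalize("NFC", text)
    groups = defaultdict(set)
    for word in re.findall(r"[^\W\d_]+(?:-[^\W\d_]+)*", text):
        lowered = word.lower()
        groups[strip_accents(lowered)].add(lowered)
    return {key: sorted(forms) for key, forms in groups.items() if len(forms) > 1}


if __name__ == "__main__":
    print(accent_variants("A vis-à-vis meeting, then vis-a-vis again."))
```

Anything it reports still needs a human decision: both spellings may be legitimate in different contexts.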

Quote:

Originally Posted by BetterRed

The ability to sort the various lists on frequency is helpful, as is the facility to save a list to a csv.

Exporting to CSV has also been recently added to my repertoire (within the last few months). If anything of substance comes out of that research, I will also post that info on MobileRead. (Already caught a few typos that slipped by in my previous passes). :P

Again, just a different way to visualize the data might make discrepancies stand out like a sore thumb.

Jellby · 07-09-2016, 06:51 AM

Quote:

Originally Posted by GrannyGrump

@Jellby --- I've taken a quick test drive with DP Custom Mono 2.
Is it supposed to look so rough?

Is it intended to look rough? I don't think so, but it could be, maybe the reasoning is that an ugly font will let you focus on the letters more and not on the meaning.

Is there something "wrong" in your system? I don't think so either. I haven't actually tried the font (maybe once long ago), but it takes a fair amount of work and knowledge to make a nice and smooth font, and/or sophisticated software. I guess the creators of the font didn't have either of them.

Doitsu · 07-09-2016, 07:27 AM

Quote:

Originally Posted by Tex2002ans

It felt quite rough:

It is indeed a bit rough, but it was the best I could do with my very limited Python skills.
BTW, I found a Windows bug related to the ngram spellcheck feature that required a minor update. If you want to experiment with ngrams, you'll need to install the latest version.

As for your questions:

Quote:

Originally Posted by Tex2002ans

Is there any possible way for it to highlight the exact position in the text?

Only if I hard-coded some kind of highlight style, that you'd have to remove from the many false positives.

This feature might be easier to implement in Calibre, because it's based on Python.
Maybe Kovid Goyal will implement it, if you ask him nicely.

I'll also ask KevinH, whether he could add some kind of Python-accessible highlight function, but since that would probably require a lot of work and not that many people are interested in this plugin, it's not very likely to happen.

Quote:

Originally Posted by Tex2002ans

Is there a way to split the messages into more columns?

Unfortunately, the software module used for validation messages doesn't support multi-line text.

Quote:

Originally Posted by Tex2002ans

Is there any possible way for it to run on the entire EPUB at once? Or am I just crazy? (Or didn't read your instructions properly).

Actually, my instructions were a bit unclear on that. By default the plugin will only check the currently selected file. If you want to check all files, either select all files or none (e.g., select the Text folder). You can also force the plugin to always check all files by changing the following value in LanguageTool.json.

Code:

"allFiles": true

(If it's not the last entry, you'll also need to add a comma at the end.)

Quote:

Originally Posted by Tex2002ans

Any stats/thoughts on adding the n-gram data?

It really slows LanguageTool down, but it did find some problems. It all depends on the texts that you want to check.

Quote:

Originally Posted by Tex2002ans

How many more false positives might I have to sift through, or does it do a pretty good job?

It reports fewer false positives than the regular grammar check. I usually use it after the regular grammar check with a special LanguageTool.json file:

Code:

{
  "enabledOnly": true,
  "enabledRules": "CONFUSION_RULE",  
  "ngramIndexDir": "C:/ngrams",
  "ltPath": "C:/Program Files/LanguageTool-3.3/languagetool-commandline.jar", 
  "allFiles": true
}

With these settings LanguageTool will only run the ngram spellcheck. It's still rather slow.

If you want to experiment with the ngram spellcheck feature, you'll need to create a folder with an en subfolder in it and extract the ngram data files to that en folder. For example, on my machine the ngram files are in C:\ngrams\en (e.g. C:\ngrams\en\1grams).
As far as LanguageTool is concerned, ngrams is the ngram folder that you'll need to specify via ngramIndexDir.
Note also that you'll need to replace backslashes in folder names with slashes or write the backslash twice.
For example:

Code:

  "ngramIndexDir": "C:/ngrams",

or

Code:

    "ngramIndexDir": "C:\\ngrams",
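To illustrate why the backslash needs doubling, here is a small Python sketch that round-trips the setting through a JSON parser. It only uses the key names shown above and does not touch the plugin itself. Note that in this particular path a single backslash would not even raise an error, because `\n` is a valid JSON escape and silently turns into a newline.

```python
import json

# Letting a JSON library write the file takes care of the escaping.
config = {
    "enabledOnly": True,
    "enabledRules": "CONFUSION_RULE",
    "ngramIndexDir": r"C:\ngrams",  # raw string: one real backslash
    "allFiles": True,
}
text = json.dumps(config, indent=2)
print(text)  # the path is written with a doubled backslash

# Both hand-written forms parse to a usable path.
doubled = json.loads(r'{"ngramIndexDir": "C:\\ngrams"}')["ngramIndexDir"]
slashed = json.loads('{"ngramIndexDir": "C:/ngrams"}')["ngramIndexDir"]

# A single backslash followed by "n" is read as a newline escape.
broken = json.loads(r'{"ngramIndexDir": "C:\ngrams"}')["ngramIndexDir"]
print(repr(doubled), repr(slashed), repr(broken))
```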

BTW, the ngram spellcheck didn't flag "it original usefulness", but this could be easily added as a custom rule.

GrannyGrump · 07-10-2016, 11:27 PM

Well, after all the advanced technical discussions, this post is a bit like a mouse screaming at a lion, but here is a short list of frequent OCR errors I have come across. There are many more I have never noted down, but just fixed on the fly.

Maybe more folks can share their "little lists" for the edification of us all.

Some of these will be caught with spell-check,
but not all, by any means ...

OCR VILLAINS:

Spoiler:

Tex2002ans · 07-11-2016, 06:42 AM

Quote:

Originally Posted by AlexBell

Thanks for all your responses - I've been so busy proofreading that I haven't been back on the forum. I obviously have a lot to read through and digest.

You still have to tell us all those errors in your books!

Quote:

Originally Posted by Doitsu

If you want to experiment with the ngram spellcheck feature, you'll need to create a folder with an en subfolder in it and extract the ngram data files to that en folder.

I'll have to do that some time in the future. Will definitely keep your plugin on my radar and run it on old books + see if I can point out any errors that it misses.

Quote:

Maybe more folks can share their "little lists" for the edification of us all.

I have been meaning to put together one of my "lists" for so long. Maybe in the coming weeks I will have to gather the info and actually do something about it this time.

Most of the information I have directly on hand is all of the actual book typos I have come across over the years.

I stopped writing down OCR errors so many years ago, and now could probably only gather them with code comparisons between EPUB versions as I worked on them.

Quote:

1 l I i ! <--> each other
{digit One, lowercase L, uppercase i, lowercase i, exclamation mark}

Speaking of my "I963" -> "1963" example, yesterday I caught "J969". There was a speck of dust in the PDF scan at the bottom left of the "1", which caused it to OCR as "J". It reminded me that I have seen this just due to normal OCR, although it is quite rare.

Quote:

U = double ell, li, il
WeU = Well
Ufe = life
untU = until

Typically when you OCR a book this entire "class" pops up, so you can easily spot it. If this occurs, I typically just put in a capital "U" into Sigil/Calibre Spellcheck.

There probably aren't many actual words in the book with a capital U in them, so they stick out like a sore thumb... especially if you sort the Spellcheck List by Case Sensitive Sort. Anything that starts with a lowercase letter and has an uppercase "U" in it is a mistake 99% of the time.

Side Note: That type of search is better in Calibre's Spellcheck because you can do a Case Sensitive Search.
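Outside the Spellcheck dialog, the same "capital U where it doesn't belong" idea can be approximated in a few lines of Python. This is only a sketch of the heuristic; the function name `suspicious_capital_u` is made up, and it deliberately skips all-caps words.

```python
import re


def suspicious_capital_u(text):
    """Return mixed-case words that contain a capital U after the first letter."""
    hits = []
    for word in re.findall(r"[A-Za-z]+", text):
        # word.isupper() skips acronyms and all-caps headings.
        if "U" in word[1:] and not word.isupper():
            hits.append(word)
    return hits


if __name__ == "__main__":
    print(suspicious_capital_u("WeU, Ufe went on untU the USA joined the Union."))
```

Words like "Ufe" that merely start with the bad "U" are not flagged by this check (they look like ordinary capitalised words), so those still have to be caught in the spellcheck list.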

Quote:

Space following opening quote mark
Space preceding closing quote or punctuation mark.
He did this ; then he did that ; then he said : “ You aren’t ready ! ”

Also want to pay attention to spaces before/after slashes. Quite often an error might creep in such as "and /or" + "and/ or".

Side Note: I even caught this in quite a few InDesign files as well. This is an easy error to slip by even in purely digital files.

Quote:

Apostrophe goes missing, stranding the last letter
I m = I’m, don t = don’t, Bob s = Bob’s

I typically run this Regex to catch all lowercase letters that are by themselves that are not "a":

Search: \s[b-z]\s

Similarly, I run this one too to catch all capital letters that are by themselves that are not "A" or "I":

Search: \s[B-HJ-Z]\s

Those basic Regexes do miss the odd case of that occurring anywhere near an HTML tag though. So it would miss:

B ob said to go outside!

or:

Then S uzy told Bob to jump over the fence.

But if the book is riddled with them, then I make sure to look much more closely (and those typically get caught at other passes, or just write up a custom Regex to catch that error).

Side Note: I don't use the capitals one too often because many of the books I work on have text along these lines: "Product C and Product D" + "Person X and Y".

Quote:

Reversed single and double quotes in nested quotations:
“And I said to him, ‘Quit that!”’
‘“O what a tangled web we weave,’” she said.

This is also a Search/Replace that I use:

Search: ‘“
Replace: “‘

Search: ”’
Replace: ’”

Although use those on a case-by-case basis (don't just do a huge Replace All).
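Since a blind Replace All is risky, one option is to list each candidate with some context and decide case by case. A minimal Python sketch of that, assuming the text is a plain string; the names `PAIRS` and `nested_quote_candidates` are hypothetical.

```python
import re

# Suspect pair -> the order it would usually be swapped to.
PAIRS = {"‘“": "“‘", "”’": "’”"}


def nested_quote_candidates(text, context=15):
    """List each suspect pair with its suggested swap and surrounding text."""
    hits = []
    for match in re.finditer("‘“|”’", text):
        start = max(0, match.start() - context)
        snippet = text[start:match.end() + context]
        hits.append((match.group(), PAIRS[match.group()], snippet))
    return hits


if __name__ == "__main__":
    sample = "“And I said to him, ‘Quit that!”’"
    for found, suggestion, snippet in nested_quote_candidates(sample):
        print(f"{found} -> {suggestion}   ...{snippet}...")
```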

Side Note: Quotation marks typically require some scrutiny, because there are a huge number of actual book typos that have crept in due to wrong nesting. As a related note, I found that parentheses + brackets follow the same rules, and also have a relatively large number of nesting errors. This was an entire class of errors that I missed until I used Toxaris's "Dialogue Check" (Pure Regex is not as good).

Quote:

’ Right single quote should replace "straight" apostrophe, not ‘ Left single quote. Happens often at start of a word:
‘em should be ’em, ‘tis should be ’tis

This is the Regex I use:

Search: ‘(Em|em|Til|til|Tis|tis|Twas|twas)
Replace: ’\1

Related is the RIGHT single quote before shortened years:

Search: ‘([0-9])
Replace: ’\1

or the RIGHT single quote before + after the "n":

Rock ’n’ Roll
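The two Search/Replace pairs above translate directly to Python's `re.sub`, which uses the same `\1` backreference. This is just a sketch with a made-up function name, `fix_leading_apostrophes`.

```python
import re

# Same patterns as the Search fields above.
ELISION = re.compile(r"‘(Em|em|Til|til|Tis|tis|Twas|twas)")
YEAR = re.compile(r"‘([0-9])")


def fix_leading_apostrophes(text):
    """Turn a left single quote into a right one before elisions and short years."""
    text = ELISION.sub(r"’\1", text)
    return YEAR.sub(r"’\1", text)


if __name__ == "__main__":
    print(fix_leading_apostrophes("‘Tis the summer of ‘69, tell ‘em"))
```

One caveat: the elision pattern has no word boundary, so a legitimately quoted word such as ‘empty’ would also be changed. Stepping through the matches one at a time, or adding `\b` after the group, is safer than replacing everything at once.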

Doitsu · 07-11-2016, 07:43 AM

Quote:

Originally Posted by GrannyGrump

OCR VILLAINS:

I had a look at the documentation for the Hunspell library, which appears to have been written by a programmer who does his taxes in binary, and found out that it's possible to add custom letter replacements to get better spelling suggestions.

Replacements need to be defined in the affix file (e.g. en_US.aff for US English), which is a plain text file that can be edited with a programmer's editor, e.g. Notepad++.

The format is as follows

Code:

REP {number of following entries}
REP {OLD} {NEW}

For example the original replacement section in en_US.aff looks like this:

Code:

REP 94
REP nt n't
...
...
REP shun tion
REP shun sion
REP shun cion

Based on your OCR villains list, I've created a custom list, added it after the last entry and updated the replacement count to REP 127 (94 existing entries + 33 new ones):

Spoiler:

With this change in place, the first suggestion for "ahnost" is no longer stenost, but almost and the suggestion for "hke" is like instead of hike.

If you want to test my modified file:

1. Go to C:\Program Files\Sigil\hunspell_dictionaries
2. Create a backup copy of en_US.aff.
3. Overwrite en_US.aff with the attached version. (You'll need to confirm a system warning.)
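If you would rather add your own replacements than use the attached file, the edit (append entries after the last REP line and bump the count) can be scripted. The following Python sketch works on the file's text only; the function name `add_rep_entries` and the sample pairs are illustrative and are not the contents of the attached file. Reading and writing the real en_US.aff with the right encoding, and keeping a backup, is left to you.

```python
import re


def add_rep_entries(aff_text, new_pairs):
    """Append REP entries after the last one and update the REP count line."""
    lines = aff_text.splitlines()
    # The count line is "REP" followed by a number and nothing else.
    header = next(i for i, line in enumerate(lines)
                  if re.fullmatch(r"REP \d+", line.strip()))
    last = max(i for i, line in enumerate(lines) if line.startswith("REP "))
    old_count = int(lines[header].split()[1])
    lines[header] = f"REP {old_count + len(new_pairs)}"
    lines[last + 1:last + 1] = [f"REP {old} {new}" for old, new in new_pairs]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = "SET UTF-8\nREP 2\nREP nt n't\nREP shun tion\nSFX V N 2\n"
    print(add_rep_entries(sample, [("hn", "lm"), ("h", "li")]))
```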

Jellby · 07-12-2016, 06:26 AM

DP has a list of some words that will not be detected by a spell checker, but are most probably OCR errors (scannos), among them the infamous "arid" (for and) and "modem" (for modern):

http://www.pgdp.net/c/faq/wordcheck-...ite_word_lists

AlexBell · 07-15-2016, 12:58 AM

Thanks, Tex2002ans, #22. I'm afraid I haven't kept a record. As I remember many of them were , instead of . and vice versa, and I instead of ! and vice versa. But many of them just shouldn't have been there at all.

The pdf originals from which the ePub files I used were made were of quite poor quality - though that's no excuse.

Tex2002ans · 07-15-2016, 05:36 AM

Quote:

Originally Posted by Jellby

DP has a list of some words that will not be detected by a spell checker, but are most probably OCR errors (scannos), among them the infamous "arid" (for and) and "modem" (for modern):

http://www.pgdp.net/c/faq/wordcheck-...ite_word_lists

Thanks for this link, you always seem to post it, and I always seem to forget about it. I should try to embed this into my brain.

Quote:

Originally Posted by AlexBell

Thanks, Tex2002ans, #22. I'm afraid I haven't kept a record. As I remember many of them were , instead of . and vice versa, and I instead of ! and vice versa. But many of them just shouldn't have been there at all.

Ahh, that is too bad. Does nobody else save all the versions of the file as they work on them?

I tend to mark all of my files with [YYYY.MM.DD] and just save them as I go along. Therefore in the future, I could easily use code comparison tools on the EPUBs to see exactly what has changed between versions.
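As one possible way to do that comparison without a dedicated diff tool: an EPUB is a zip archive, so Python's standard `zipfile` and `difflib` modules can diff the HTML files of two saved versions. This is only a sketch; the function names are made up, and the file names in the usage note are placeholders.

```python
import difflib
import zipfile

HTML_SUFFIXES = (".xhtml", ".html", ".htm")


def epub_text_files(epub):
    """Map each HTML file inside an EPUB (path or file object) to its text."""
    with zipfile.ZipFile(epub) as archive:
        return {name: archive.read(name).decode("utf-8", errors="replace")
                for name in archive.namelist()
                if name.lower().endswith(HTML_SUFFIXES)}


def diff_epubs(old_epub, new_epub):
    """Return unified-diff lines for every HTML file in either version."""
    old, new = epub_text_files(old_epub), epub_text_files(new_epub)
    lines = []
    for name in sorted(old.keys() | new.keys()):
        lines.extend(difflib.unified_diff(
            old.get(name, "").splitlines(),
            new.get(name, "").splitlines(),
            fromfile=f"old/{name}", tofile=f"new/{name}", lineterm=""))
    return lines
```

Usage would be along the lines of `print("\n".join(diff_epubs("Book [2016.07.01].epub", "Book [2016.07.15].epub")))`. Every "-" and "+" pair in the output is a correction made between the two dates, which is a ready-made list of the OCR errors that were fixed.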

Quote:

Originally Posted by AlexBell

The pdf originals from which the ePub files I used were made were of quite poor quality - though that's no excuse.

Can you link to the Archive.org versions you used + your completed EPUB?

Side Note: Here are a few common OCR errors I ran into tonight:

o£ -> of
tbe -> the
lias -> has

Roman Numeral Problems with the "V" OCRing as "Y":

Chapter XY -> Chapter XV
Chapter Y -> Chapter V
Chapter XYI -> Chapter XVI
CHAPTER XXIY -> CHAPTER XXIV
CHAPTER XXYI -> CHAPTER XXVI

Punctuation Errors (em dash + hyphen):

—- -> —
-— -> —

You may also want to look out for hyphens followed by a space. This needs to be decided on a case-by-case basis, because many of these are valid. Example, "This is a one- or two-hyphen error." In many cases it is either a badly recognized soft hyphen (end of line or end of page), a speck of dust, or an actual OCR error.

You may also want to make a pass looking for or tags. Sometimes OCR just goes crazy and inserts this into the text.
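The Roman numeral and dash fixes listed above are mechanical enough to script. Here is a hedged Python sketch; the function names are made up, and the chapter pattern assumes headings are written as "Chapter" or "CHAPTER" followed by the numeral.

```python
import re

# "Chapter"/"CHAPTER" followed by a run of numeral letters (Y is the OCR'd V).
CHAPTER = re.compile(r"\b((?:Chapter|CHAPTER)\s+)([IVXLCY]+)\b")
# An em dash with a stray hyphen on either side.
DASH = re.compile("—-|-—")


def fix_chapter_numerals(text):
    """Replace Y with V inside chapter numerals only."""
    return CHAPTER.sub(lambda m: m.group(1) + m.group(2).replace("Y", "V"), text)


def fix_dashes(text):
    """Collapse an em dash plus stray hyphen into a single em dash."""
    return DASH.sub("—", text)


if __name__ == "__main__":
    print(fix_chapter_numerals("Chapter XY. CHAPTER XXIY."))
    print(fix_dashes("wait—- no -— yes"))
```

The hyphen-followed-by-space case is deliberately not automated here, since many of those are valid, as noted above.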

Golden_Images · 09-16-2016, 04:01 AM

When proofing against a scanned and converted Word doc, try bringing up an image-only PDF file on half the screen and the Word doc on the other. Then slowly go through and check it against the PDF and apply corrections. When you're finished, have another pair of eyes do the same thing. That's how we do it. We call it corrective editing.
Stan
www.pdfdocument.com has more information for those who are interested.

AlexBell · 09-18-2016, 01:56 AM

Quote:

Originally Posted by Golden_Images

When proofing against a scanned and converted Word doc, try bringing up an image-only PDF file on half the screen and the Word doc on the other. Then slowly go through and check it against the PDF and apply corrections. When you're finished, have another pair of eyes do the same thing. That's how we do it. We call it corrective editing.
Stan
www.pdfdocument.com has more information for those who are interested.

Welcome to the forum, and thanks

Gregg Bell · 09-25-2016, 04:51 PM

I'll second the vote for Balabolka. And I don't know if it was mentioned or not, but when the word is spoken aloud the text for that word is also highlighted.

Now I use Linux and there is a similar program to Balabolka named eSpeak.

I use the LibreOffice spell checker but I also find it helpful to borrow a Windows computer and use the Word spell checker. I find that the Word spell (and grammar) checker catches things LibreOffice doesn't, such as:

John went to the store fro a gallon of milk.

Doitsu · 09-25-2016, 06:37 PM

Quote:

Originally Posted by Gregg Bell

I use the LibreOffice spell checker but I also find it helpful to borrow a Windows computer and use the Word spell checker. I find that the Word spell (and grammar) checker catches things LibreOffice doesn't, such as:

By default, LibreOffice comes only with a basic spell checker. You might want to install the LanguageTool extension.

If you check your sample sentence with it, you'll get the following error message:

Quote:

Originally Posted by Gregg Bell

John went to the store fro a gallon of milk.

Did you mean "for" or "from"?

OCR VILLAINS (spoiler contents from GrannyGrump's post, 07-10-2016, 11:27 PM):

0 <--> O {zero <--> Uppercase o}
1 l I i ! <--> each other
{digit One, lowercase L, uppercase i, lowercase i, exclamation mark}
2 <--> Z
5 <--> S
6 <--> uppercase G
7 <--> ? {question mark}
7 and / = I {uppercase I in italic}

e <--> c
are <--> arc

f ligatures confusion
ff, fi, fl, ffi

h <--> b
back <--> hack
harrow <--> barrow

H = ll
weH = well

H or h = li
Hbrary = library
hke = like

hn = lm
ahnost = almost

j <--> J {lowercase <--> uppercase J }
jane = Jane
Jury = jury

] = J
square bracket = uppercase J
]ane = Jane

rn <--> m
Mom <--> Morn
stem <--> stern
earnest = camest {this also had the e=c combo}

ri <--> n
arid <--> and

r = f
ringers = fingers

m <--> in
stein <--> stem
rmg = ring
inoth = moth

im <--> un
unport = import
imdone = undone

n <--> u
bnt = but
teut = tent
uest = nest

ii = u
iinder = under

B <--> R {uppercase}
DEABEST = DEAREST
Robby <--> Bobby

F <--> P {uppercase}
Full <--> Pull

ih = th
feaiher = feather

di = th {weird, but it happens a lot}
die = the

tii = th
tiie = the

tli = th
tlie = the

Tm == "I'm (also with no leading quote)
T = I {uppercase i}

U = double ell, li, il
WeU = Well
Ufe = life
untU = until

vv = w
vvhen = when

\V = W

y <--> v
yery = very
verv = very

/' = ," or .” {or single quote}
* = quote mark
** ' ' '' = " {two single quotes, should be a double quote}

Space following opening quote mark
Space preceding closing quote or punctuation mark.
He did this ; then he did that ; then he said : “ You aren’t ready ! ”

Apostrophe goes missing, stranding the last letter
I m = I’m, don t = don’t, Bob s = Bob’s

These following often occur with a "Smarten Punctuation" action:

Backward quote marks:
” close quote at start of paragraph
“ open quote at end of paragraph

Reversed single and double quotes in nested quotations:
“And I said to him, ‘Quit that!”’
‘“O what a tangled web we weave,’” she said.

’ Right single quote should replace "straight" apostrophe, not ‘ Left single quote. Happens often at start of a word:
‘em should be ’em, ‘tis should be ’tis
