Old 02-20-2019, 03:05 AM   #29
Tex2002ans
Quote:
Originally Posted by phossler View Post
Added - actually, the document language is English, so spell check flags the foreign words. From the spell check error report, I can copy the word to the saved search and do a replace all. Removes it from spell check error since I don't like to Ignore or Add To Dictionary
This is exactly how I would handle it.

Ever since Calibre added Multi-Language Spellcheck, you can easily mark foreign words with the lang and xml:lang attributes, and each word then gets checked against its own language's dictionary.

Example sentence:

Code:
I ate some español sofritos today.
1. Use Calibre's Tools > Check Spelling with Show only misspelled words checked.

Most foreign words should pop up as misspelled. "sofritos" would stick out like a sore thumb.

Use Change selected word to and replace it with something like "@sofritos@".

Note: The very last word in the list is the word itself, so just click on that and make your adjustments:

[Image: CalibreMultiLanguageSpellcheck-Step1.png]

2. After the end of the first pass, do a mass Search/Replace:

Search: @(.+?)@
Replace: <i lang="es" xml:lang="es">\1</i>

Code:
I ate some español <i lang="es" xml:lang="es">sofritos</i> today.
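
If you'd rather test the Step 2 regex outside of Calibre first, here is a minimal Python sketch of the same Search/Replace. The names `MARKER` and `tag_marked_words` are made up for this illustration (they are not part of Calibre), and it assumes the words were already wrapped in `@...@` during Step 1.

```python
import re

# Same pattern as the Search field: lazily capture whatever sits between
# a pair of @ markers.
MARKER = re.compile(r"@(.+?)@")


def tag_marked_words(html: str, lang: str = "es") -> str:
    """Wrap each @word@ placeholder in <i> with lang + xml:lang set.

    Hypothetical helper for illustration only; `lang` defaults to "es"
    to match the Spanish example in this post.
    """
    replacement = rf'<i lang="{lang}" xml:lang="{lang}">\1</i>'
    return MARKER.sub(replacement, html)


print(tag_marked_words("I ate some español @sofritos@ today."))
# prints: I ate some español <i lang="es" xml:lang="es">sofritos</i> today.
```

The lazy `.+?` matters here: with a greedy `.+`, two marked words on the same line would be swallowed into one big match.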
3. Run Spellcheck List again, and repeat Step 1.

You'll easily be able to see which Spanish words you've caught so far, and narrow the list down further:

[Image: CalibreMultiLanguageSpellcheck-Step3.png]

4. Now uncheck Show only misspelled words, and do a few more passes. That should get you most of the way there.

5. To join most "phrases" (which at this point are just runs of individually tagged foreign words), search for two Spanish italics right next to each other, separated by a single space:

Search: (<i lang="es" xml:lang="es">.+?)</i> <i lang="es" xml:lang="es">
Replace: \1 
(Note the single space after \1. Without it, the two words get glued together.)

and it'll merge them:

Code:
I ate some <i lang="es" xml:lang="es">espaņol sofritos</i> today.
That should carry you most of the way there.
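
The Step 5 merge can be sketched in Python the same way. `merge_adjacent` is a hypothetical name for this illustration, and the loop is my own addition: a single pass only joins pairs, so a phrase of three or more tagged words needs the replace run again until nothing changes (in Calibre you would just hit Replace All a few times).

```python
import re


def merge_adjacent(html: str, lang: str = "es") -> str:
    """Join language-tagged <i> elements that are separated by one space.

    Hypothetical helper for illustration only. It assumes the exact tag
    form used in this post: <i lang="xx" xml:lang="xx">.
    """
    open_tag = re.escape(f'<i lang="{lang}" xml:lang="{lang}">')
    # The seam is: closing </i>, one space, then the next opening tag.
    seam = re.compile(rf"({open_tag}.+?)</i> {open_tag}")
    previous = None
    while previous != html:
        previous = html
        # Keep the captured text and put the separating space back.
        html = seam.sub(r"\1 ", html)
    return html


tagged = (
    'I ate some <i lang="es" xml:lang="es">español</i> '
    '<i lang="es" xml:lang="es">sofritos</i> today.'
)
print(merge_adjacent(tagged))
# prints: I ate some <i lang="es" xml:lang="es">español sofritos</i> today.
```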

Quote:
Originally Posted by phossler View Post
Not sure what the actual rules are for marking/tagging foreign text is, but that'll be a fun research project for another day
Just wondering, why exactly are you marking all foreign words in italics? Are you trying to enforce a Style Guide (CMOS?) or something along those lines?

Last edited by Tex2002ans; 02-20-2019 at 04:26 AM.