MobileRead Forums

Spell checking multiple languages (https://www.mobileread.com/forums/showthread.php?t=289353)

phossler 08-06-2017 10:13 AM

Spell checking multiple languages
 
https://manual.calibre-ebook.com/edi...ds-in-the-book

The example in the above reference shows how to use lang=" " to mark a word as 'not book default' so that the spell checker knows to use another dictionary (or so I assume it works that way):

Code:

<div lang="en_US">color <span lang="en_GB">colour</span></div>
I have an epub right now that has a large amount of foreign dialog, and it will be very time-consuming to mark all the passages.

I could go through and add them to my own dictionary, but that's not really spell checking.

Is there any way to select 2 or more languages for the entire book?

Is there another way to do it other than one at a time?

Divingduck 08-06-2017 12:30 PM

1 Attachment(s)
There are different scenarios:
Sometimes you have books with explicitly defined languages for a word, a paragraph, a file, or the whole book.

You can add an additional language to the spell checker. It will be used for all explicitly marked words, paragraphs and so on (e.g. as in your example).

Usually there is only one language defined in a book. For these cases I use user-defined dictionaries (mostly two: one for foreign-language words and one for special word constructs used in the actual book). User-defined dictionaries can be set to active or inactive, so you have complete freedom to use them as you want.

The only thing that is missing is the ability to deactivate one of the main dictionaries when you use more than one. I thought this was implemented years ago when the spell checker was introduced, but I can't find it again; maybe I remember wrong and had only asked for it :(

For this I use a little trick: I first spell check with the foreign language, copy all correctly identified words into a new user dictionary for that language, and then switch back to the main language with the new user dictionary enabled as an additional dictionary.
Take a look at the section Import word lists. It is very helpful for managing huge word lists. It also lets you add a language identifier to a list or to a single word. You can copy the words in a user dictionary to the clipboard to create your own sets of useful combinations :)

kovidgoyal 08-06-2017 01:54 PM

The correct way to do it is to mark text in different languages. That allows people reading the book in the future to, for example, look up words in the dictionary of the marked language while reading.

However, if you want to automate just spell checking, you can do so using a search and replace in function mode (but you need to be able to program a bit for that).

The idea would be similar to https://manual.calibre-ebook.com/fun...phenated-words

Here, when the word is not recognized by the main dictionary, you wrap it in a <span lang="secondary language">word</span>

Then re-run spell check. Now words reported misspelled will have failed to match in both languages. Fix all the words that need fixing and then when you are done, run a search and replace to remove the inserted span tags.
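
A rough Python sketch of the wrap-then-unwrap idea described above, written as a standalone script rather than as an actual calibre function-mode function. The word set MAIN_WORDS, the helper names, and the class="spellcheck-tmp" marker are all assumptions made for illustration; inside calibre the manual's hyphenation example checks words with dictionaries.recognized(word) instead of a toy set. The marker class is there only so that the temporary spans can be removed later without touching spans that were added by hand.

```python
import re

# Toy stand-in for the book's main dictionary (an assumption for this sketch).
MAIN_WORDS = {"i", "lived", "in", "the", "seventeenth"}

# Hypothetical class name that tags the temporary spans for later removal.
MARKER = "spellcheck-tmp"

# An HTML entity, or a run of letters (optionally with inner apostrophes).
TOKEN = re.compile(r"&#?\w+;|[^\W\d_]+(?:['’][^\W\d_]+)*")
# Text that sits between two tags; the tags themselves are never modified.
TEXT_NODE = re.compile(r"(?<=>)[^<>]+(?=<)")
# A temporary span exactly as written by wrap_unknown_words().
TMP_SPAN = re.compile(r'<span lang="[^"]*" class="%s">(.*?)</span>' % MARKER, re.S)


def recognized(word):
    """Return True if the toy main dictionary knows the word."""
    return word.lower() in MAIN_WORDS


def wrap_unknown_words(html, secondary_lang="fr", is_known=recognized):
    """Wrap every word the main dictionary rejects in a temporary span."""
    def wrap_token(m):
        token = m.group()
        # Leave entities such as &amp; and recognized words untouched.
        if token.startswith("&") or is_known(token):
            return token
        return '<span lang="%s" class="%s">%s</span>' % (secondary_lang, MARKER, token)

    return TEXT_NODE.sub(lambda node: TOKEN.sub(wrap_token, node.group()), html)


def remove_temporary_spans(html):
    """Undo wrap_unknown_words() once spell checking is finished."""
    return TMP_SPAN.sub(r"\1", html)


sample = "<p>I lived in the seventeenth arrondissement.</p>"
marked = wrap_unknown_words(sample)
print(marked)
print(remove_temporary_spans(marked))
```

This wraps single words, so it is only meant as a temporary aid for spell checking. It also does not skip text that is already inside a hand-made lang span, which a real function would probably want to do.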

phossler 08-06-2017 07:46 PM

1 Attachment(s)
@DD -- thanks for possible workaround. I'll think on it

@kovid -- Agree that this is the right way to do it

Quote:

<p>I lived in the seventeenth <span lang="fr">arrondissement</span>. The modernization project that had swept up the <span lang="fr">Avenue Neuilly</span> and was extending the smart side of Paris to the west had by-passed the dingy <span lang="fr">Quartier des Ternes</span>. I walked as far as the <span lang="fr">Avenue de la Grande Armee</span>. The Arc was astraddle the <span lang="fr">Etoile</span> and the traffic was desperate to get there. Thousands of red lights twinkled like bloodshot stars in the warm mist of the exhaust fumes. It was a fine Paris evening, Gauloises and garlic sat lightly on the air, ...</p>

but it's a lot of manual effort, and I'm not up to writing my own search/replace RE function

For a few foreign words, I could use [Insert Tag]



Side note about possible User Manual problem with [Insert Tag]

https://manual.calibre-ebook.com/edit.html

I created a <span lang="fr"> tag, but it took a while to remember how I had created the tags I currently have. There doesn't seem to be any information in the link about creating / inserting a tag.

HarryT 08-07-2017 04:39 AM

Quote:

Originally Posted by phossler (Post 3564376)
I created a <span lang="fr"> tag, but it took a while to remember how I had created the tags I currently have. There doesn't seem to be any information in the link about creating / inserting a tag.

Just type it - tags are just text. The editor will automatically insert the close tag for you as soon as you type the open tag.

Divingduck 08-07-2017 06:12 AM

Quote:

Originally Posted by phossler (Post 3564376)
but it's a lot of manual effort

This is correct, and the problem is that you can't automate this for languages in a perfect way. The situation is more or less the same for my native language. Depending on the complexity of the source text, I go the other way around in these cases: I export the book text as DOCX, open it in MS Word, and let Word do the job of marking the language of each passage. You first need to add the required spell checkers in Word. It is not perfect, but good enough to move forward. The flip side of the coin is that you lose the original document structure when you convert, but compared to the time it takes to mark the languages by hand, this is only a minor item on the bill.

phossler 08-07-2017 12:20 PM

Quote:

Originally Posted by HarryT (Post 3564499)
Just type it - tags are just text. The editor will automatically insert the close tag for you as soon as you type the open tag.

I find it easier to select the foreign text and then just click [Insert Tag] to bracket the text with <span lang="fr"> ..... </span>

Also, I've found that the editor will only complete the closing tag when I type </

phossler 08-07-2017 12:25 PM

Quote:

Originally Posted by Divingduck (Post 3564516)
This is correct, and the problem is that you can't automate this for languages in a perfect way.

Interesting approach, and I think that'd work when I start with very raw text.

Since I spend most of my time 'fixing' an epub to be more readable on my kindle, it'd have to be a judgement call each time

kovidgoyal 08-07-2017 01:46 PM

@phossler: Not sure what you mean. You just click the insert tag button and it asks you to input the tag you want. I don't know how it could be more straightforward than that.

phossler 08-07-2017 08:19 PM

1 Attachment(s)
Yes, the whole process is very easy: pick a tag and then insert it around the selected text (attachment)

When I said

Quote:

Also, I've found that the editor will only complete the closing tag when I type </
I was referring to manually inserting a tag that Calibre auto-closes for me when I tell it where the scope ends

e.g. After I insert the <b> ....

Quote:

text text text <b> text text text text text text
.... when I add the 'closing tag' start characters </ ....

Quote:

text text text <b> text text text text text text</
.... Calibre auto completes it for me

Quote:

text text text <b> text text text text text text</b>

All very nice and user friendly

theducks 08-07-2017 09:10 PM

It would be wonderful if in the spell check, you could simply change the language column value and have it replace (all) with the appropriate span tag, even if you had to click the replace button after changing (forcing) the language.

kovidgoyal 08-08-2017 12:07 AM

@theducks: I don't think wrapping span tags around single words is a good idea; that will create lots of markup bloat.

What is really needed is a language markup tool/plugin. You give it a list of dictionaries, then it goes through the book and matches words against all the dictionaries in the list. Every contiguous series of words matched to the same dictionary then gets wrapped in a span tag with the correct language.

That is basically how the language detect tool in Word works, I imagine.
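
To make the idea concrete, here is a minimal Python sketch of such a markup pass. It is not an existing calibre tool or plugin; the function names and the tiny word sets are invented for illustration, and real dictionaries would replace the toy sets. It works on plain text, checks each word against the dictionaries in list order (default language first), and wraps each contiguous run of words from the same non-default language in one span.

```python
import re

# Group 1 matches a run of letters (a word); group 2 matches everything else.
TOKEN = re.compile(r"([^\W\d_]+)|([\W\d_]+)")


def detect(word, dictionaries):
    """Return the first language whose word set contains the word, else None."""
    w = word.lower()
    for lang, words in dictionaries.items():
        if w in words:
            return lang
    return None


def mark_languages(text, dictionaries, default_lang):
    """Wrap contiguous runs of non-default-language words in <span lang=...>."""
    out = []          # finished output pieces
    run = []          # pieces of the foreign-language run being built
    run_lang = None   # language of that run
    gap = ""          # separator seen after the run's last word

    def close_run():
        nonlocal run, run_lang, gap
        if run:
            out.append('<span lang="%s">%s</span>' % (run_lang, "".join(run)))
        out.append(gap)   # trailing spaces or punctuation stay outside the span
        run, run_lang, gap = [], None, ""

    for m in TOKEN.finditer(text):
        token = m.group()
        if m.group(1) is None:            # separator: hold it until the next word
            if run:
                gap += token
            else:
                out.append(token)
            continue
        lang = detect(token, dictionaries)
        if lang is None or lang == default_lang:
            close_run()                   # default or unknown word ends the run
            out.append(token)
        elif lang == run_lang:
            run.append(gap + token)       # same language: extend the run
            gap = ""
        else:
            close_run()                   # different foreign language: new run
            run, run_lang = [token], lang
    close_run()
    return "".join(out)


# Toy dictionaries (assumptions); order matters, the default language comes first.
dictionaries = {
    "en": {"i", "walked", "as", "far", "the"},
    "fr": {"avenue", "de", "la", "grande", "armee"},
}
print(mark_languages("I walked as far as the Avenue de la Grande Armee.", dictionaries, "en"))
```

With real dictionaries many words belong to more than one language ("avenue" is valid English too), so a first-match rule like this one would split runs in places where a human would not. Handling that ambiguity is the hard part of such a tool.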

theducks 08-08-2017 12:49 AM

Quote:

Originally Posted by kovidgoyal (Post 3564856)
@theducks: I don't think wrapping span tags around single words is a good idea; that will create lots of markup bloat.

What is really needed is a language markup tool/plugin. You give it a list of dictionaries, then it goes through the book and matches words against all the dictionaries in the list. Every contiguous series of words matched to the same dictionary then gets wrapped in a span tag with the correct language.

That is basically how the language detect tool in Word works, I imagine.

I guess the only reason for the markup, is spell checking. A number of 'other' language words are in common usage in American literature (many have to do with eating :D ) and the spell check flags them as wrong in en-US...but they may also be spelled wrong in their original language, but without language sensitivity...

phossler 08-08-2017 08:17 AM

@DD -- I am certainly not an expert, but it seems there are multiple reasons why someone would want to make css language-aware

http://www.w3.org/International/questions/qa-lang-why

@KG - agree about the possible markup bloat, so the contiguous-word <span> would be the way to go. My example in #10 has 4 French words selected to be <span>-ed

BetterRed 08-08-2017 09:03 AM

Quote:

Originally Posted by theducks (Post 3564861)
I guess the only reason for the markup, is spell checking.

Nuh - main reason is so the reader can look up the word in an appropriate dictionary. You know, the ones that give you the meaning of the word, examples of use, a bit of etymology if you're lucky, and a translation if it's that sort of dictionary.

A spell checker might know floccinaucinihilipilification is correctly spelt, but will the reader know what it means :rofl:

BR


All times are GMT -4.
