MobileRead Forums > E-Book Formats > ePub
01-21-2023, 07:43 PM   #1
enuddleyarbl
Guru
Posts: 734
Karma: 1077122
Join Date: Sep 2013
Device: Kobo Forma
Search for All French Words in English Book?

Does anyone know of a reasonable way to search an English epub for French words? Right now, I'm working on some of Christie's "Poirot" novels and it would be nice to wrap those little phrases in spans with lang="fr". I'm sure I can catch most of them by searching for some of Poirot's more common French blurbs, but I was wondering if there was a more sure-fire way to do it.

EDIT: I should have thought of this earlier. If I'm lucky, either the publisher or Calibre will have enclosed the French stuff in italics. I'll search for <i> and, where appropriate, replace it with <i lang="fr">.
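
For anyone who would rather script that search than click through it, here is a minimal Python sketch. It assumes the French is wrapped in bare <i> tags with no attributes and no nested markup, and that you review the list by hand before tagging anything. The names italic_candidates and tag_french are made up for this example.

```python
import re
from collections import Counter

# Matches a bare <i>...</i> span with no attributes and no nested tags.
BARE_ITALIC = re.compile(r"<i>([^<]+)</i>")

def italic_candidates(xhtml: str) -> Counter:
    """Count the text of every bare <i> span so it can be reviewed by hand."""
    return Counter(m.group(1).strip() for m in BARE_ITALIC.finditer(xhtml))

def tag_french(xhtml: str, approved: set[str]) -> str:
    """Add lang="fr" only to the italic spans whose text was approved."""
    def repl(m: re.Match) -> str:
        if m.group(1).strip() in approved:
            return f'<i lang="fr">{m.group(1)}</i>'
        return m.group(0)  # leave titles, emphasis, etc. untouched
    return BARE_ITALIC.sub(repl, xhtml)

sample = "<p><i>Mon ami</i>, I read it in <i>The Times</i>. <i>Mon ami</i>!</p>"
print(italic_candidates(sample))           # every italic phrase with its count
tagged = tag_french(sample, {"Mon ami"})   # tag only what you approved
print(tagged)
```

The "where appropriate" step stays manual: the counter only tells you which italic phrases exist, and the approved set is whatever you decide is actually French.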

Last edited by enuddleyarbl; 01-21-2023 at 07:51 PM.
01-21-2023, 07:56 PM   #2
Karellen
Wizard
Posts: 1,107
Karma: 4911876
Join Date: Sep 2021
Location: Australia
Device: Kobo Libra 2
Probably better to use...

Code:
<i lang="fr" xml:lang="fr">
As for searching, a lot of foreign words get flagged as misspelled, so you can press Alt+F7 to open the Check Spelling pop-up, which lists all misspelled words. Then double-click a word to jump to it.
01-21-2023, 08:30 PM   #3
isarl
Addict
Posts: 287
Karma: 2534928
Join Date: Nov 2022
Location: Canada
Device: Kobo Aura 2
Are you expecting to find many French words together, or are you looking for loanwords? It seems like loanword detection is an open research problem, but if you're up for writing a bit of code, I found a few options for doing general language detection:

If you are comfortable with Python, there is langdetect (a port of a Java library, if you prefer Java); spacy-langdetect, a similar option built on the spaCy NLP framework; and textblob (which appears to farm out the language detection to the Google Translate API).

Langdetect seems nice and simple, but you still need to figure out how to walk over the words in your book, so spaCy might be a better choice for that, as it comes with sentence segmentation.
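
To give a rough idea of the "walk over the book, then detect" shape, here is a standard-library-only sketch. It does not use langdetect or spaCy: the sentence splitter is a naive regex, and the detector is a crude function-word count standing in for a real statistical model. The word list, the 0.3 threshold and the function names are all assumptions made for illustration.

```python
import re

# Tiny hand-picked set of French function words (an assumption for this sketch;
# a real detector such as langdetect uses statistical models instead).
FRENCH_HINTS = {"le", "la", "les", "un", "une", "des", "et", "est", "je", "ne",
                "pas", "mon", "ma", "mes", "que", "qui", "vous", "ce", "c'est"}

def split_sentences(text: str) -> list[str]:
    """Naive sentence segmentation: split after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def french_score(sentence: str) -> float:
    """Fraction of the words in the sentence that are French hint words."""
    words = re.findall(r"[a-zà-ÿ']+", sentence.lower())
    if not words:
        return 0.0
    return sum(w in FRENCH_HINTS for w in words) / len(words)

def likely_french(text: str, threshold: float = 0.3) -> list[str]:
    """Return the sentences whose score reaches the (arbitrary) threshold."""
    return [s for s in split_sentences(text) if french_score(s) >= threshold]

sample = "Poirot shook his head. Ce n'est pas possible, mon ami! He sat down."
print(likely_french(sample))
```

Short exclamations with no function words in them would slip past a heuristic like this, which is one reason a proper detector (or the italics search) is likely to do better on Poirot's one- and two-word phrases.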

Good luck!

Last edited by isarl; 01-21-2023 at 08:33 PM. Reason: added mention of sentence segmentation
01-21-2023, 10:41 PM   #4
enuddleyarbl
Quote:
Originally Posted by Karellen View Post
Probably better to use...

Code:
<i lang="fr" xml:lang="fr">
As for searching, a lot of foreign words are usually marked as misspelled, so you can alt-f7 for the Check Spelling pop-up that will list all misspelled words. Then double click on the word to jump to them.
Good point. I'm using both lang="en-gb" and xml:lang="en-gb" in the <html> element of the files (since it's a Christie book), but I forgot to do the same with French in the <i> tags. Thanks.

Unfortunately, I'd previously accepted a lot of those French words as OK in the spellchecker, so, unless I clear that dictionary, I'd miss them. Plus, already having the <i> selected makes the replacement that much easier (though wading through all the false positives does take some time).

@isarl: those tools are good ideas, but much more work than I'm willing to do. Thanks for the suggestions, though.
01-21-2023, 11:47 PM   #5
Karellen
Quote:
Originally Posted by enuddleyarbl View Post
Unfortunately, I'd previously accepted a lot of those french words as OK in the spellchecker, so, unless I clear that dictionary, I'd be missing them.
I added a second dictionary for exactly this.
I leave the main reference dictionary untouched. Instead I add to the "temp" dictionary. Every now and then I delete the temp dictionary and start again.
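
The idea can be modelled in a few lines of Python. This is only an illustration of the workflow, not how the editor's spellchecker is implemented: both "dictionaries" are plain sets, and the word lists are made up.

```python
import re

def unknown_words(text: str, main_dict: set[str], temp_dict: set[str]) -> list[str]:
    """Return words found in neither dictionary, in order of first appearance."""
    seen: set[str] = set()
    result = []
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        if word not in main_dict and word not in temp_dict and word not in seen:
            seen.add(word)
            result.append(word)
    return result

main_dict = {"said", "poirot", "my", "friend", "yes"}  # stand-in for the reference dictionary
temp_dict = {"ami"}                                    # words accepted while editing one book
text = "Mon ami, said Poirot. Yes, mon ami."

print(unknown_words(text, main_dict, temp_dict))  # "ami" is hidden by the temp dictionary
temp_dict.clear()                                 # delete the temp dictionary and start again
print(unknown_words(text, main_dict, temp_dict))  # "ami" is flagged again
```

Because accepted words only ever go into the temp set, clearing it brings every foreign word back into the misspelled list without touching the main dictionary.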
Attached image: dictionaries.jpg
01-22-2023, 01:52 PM   #6
enuddleyarbl
On a quasi-related note, is there some construct to translate little blurbs of text for the reader? If these were full-fledged untranslated paragraphs of French, I'd run them through Google Translate and stick the result in a footnote. But, for these little Poirot exclamations, most of them are trivial and only some use words I don't recognize.

I was thinking of commandeering <abbr title="...translation...">french phrase</abbr>, but though it works in Calibre, it doesn't as a kepub on my Forma. A pure <aside>...translation...</aside> just breaks the paragraph and dumps it right there. <ruby>french phrase<rt>the translation</rt></ruby> looked interesting, but it puts the individual translated words over their corresponding untranslated words (instead of just putting the whole translation over the untranslated phrase).

Right now, the best I can come up with is a full-fledged footnote to properly set the translation off from the paragraph.
01-22-2023, 03:18 PM   #7
Tex2002ans
Wizard
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by enuddleyarbl View Post
Does anyone know of a reasonable way to search an english epub for French words? [...] it would be nice to span those little phrases with lang="fr".
Yes, I wrote a tutorial on this back in:

I described how to use Sigil's/Calibre's Spellcheck Lists in order to tag each Spanish/"foreign word" with an HTML language.

You could then use some regex to merge everything together.
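
As a rough idea of what that merge step might look like outside the editor, here is a Python sketch. It assumes each word was tagged individually with exactly <span lang="fr" xml:lang="fr"> and that neighbouring words are separated only by whitespace. The tag shape and the function name merge_adjacent are assumptions for this example, not taken from the tutorial.

```python
import re

OPEN = '<span lang="fr" xml:lang="fr">'

# One tagged run, its closing tag, whitespace, then the next opening tag.
PAIR = re.compile(rf'({re.escape(OPEN)})([^<]*)</span>(\s+){re.escape(OPEN)}')

def merge_adjacent(xhtml: str) -> str:
    """Fold neighbouring French spans into one, repeating until nothing changes."""
    previous = None
    while previous != xhtml:
        previous = xhtml
        # Keep the opening tag, the text and the whitespace; drop the
        # intermediate </span><span ...> so the two runs become one.
        xhtml = PAIR.sub(r"\1\2\3", xhtml)
    return xhtml

tagged = ('<p><span lang="fr" xml:lang="fr">Mon</span> '
          '<span lang="fr" xml:lang="fr">cher</span> '
          '<span lang="fr" xml:lang="fr">ami</span>, sit down.</p>')
print(merge_adjacent(tagged))
```

The loop is there because a single pass only merges non-overlapping pairs, so a run of three or more words needs a second pass. Words separated by punctuation are deliberately left as separate spans.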

- - -

I even wrote another tutorial showing you how you can use 2 dictionaries to quickly spot "foreign words" too:

In that case, I used the trick to quickly find all British<->American spellings.

Quote:
Originally Posted by enuddleyarbl View Post
EDIT: I should have thought of this earlier. If I'm lucky, either the publisher or Calibre will have enclosed the French stuff in italics. I'll search for <i> and, where appropriate, replace it with <i lang="fr">.
Use:

Code:
<i lang="fr" xml:lang="fr">
Personally, I also add a class there:

Code:
<i class="french" lang="fr" xml:lang="fr">
to make it easier to manipulate via CSS.
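
If a book already has tags carrying only lang="...", a small script could bring them up to this form. The following is a sketch under assumptions: it only touches <i> and <span> tags whose sole attribute is lang, and the code-to-class mapping and the name expand_lang_tags are invented for the example.

```python
import re

# Language code -> human-readable class name (illustrative mapping, extend as needed).
CLASS_NAMES = {"fr": "french", "de": "german"}

# An opening <i> or <span> tag whose only attribute is lang="...".
LANG_ONLY = re.compile(r'<(i|span) lang="([A-Za-z-]+)">')

def expand_lang_tags(xhtml: str) -> str:
    """Add xml:lang (and a class, when one is known) to lang-only tags."""
    def repl(m: re.Match) -> str:
        tag, code = m.group(1), m.group(2)
        cls = CLASS_NAMES.get(code.lower())
        class_attr = f'class="{cls}" ' if cls else ""
        return f'<{tag} {class_attr}lang="{code}" xml:lang="{code}">'
    return LANG_ONLY.sub(repl, xhtml)

sample = '<p><i lang="fr">Mon ami</i> read <i>The Times</i>.</p>'
print(expand_lang_tags(sample))
```

Tags that already carry other attributes are skipped on purpose, since rewriting those safely needs a real XML parser rather than a regex.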

Then, if you want all your French words to be red, the CSS is very simple:

Code:
.french {
   color: red;
}
- - -

Side Note: If you want even more on proper HTML language markup, type this into your favorite search engine:

Code:
xml:lang Tex2002ans site:mobileread.com
I've written more than 100 times about all this in ebooks!

- - -

Quote:
Originally Posted by enuddleyarbl View Post
On a quasi-related note, is there some construct to translate little blurbs of text for the reader? If these were full-fledged untranslated paragraphs of French, I'd run it through Google Translate and stick the result in a footnote. But, for these little Poirot exclamations, most of them are trivial and only some use words I don't recognize.
Leave it up to the reader/app itself. For example, PocketBook allows you to auto-translate text inline, similar to Google Translate on a webpage.

You could also press+hold, then send the highlighted text to a translation site too.

(In PocketBook, you can also choose which engine you want to use, like DeepL, Google Translate, Bing Translate, etc.)

Quote:
Originally Posted by enuddleyarbl View Post
I was thinking of commandeering <abbr title="...translation...">french phrase</abbr>, but though it works in Calibre, it doesn't as a kepub on my Forma.

[...]

Right now, the best I can come up with is a full-fledged footnote to properly set the translation off from the paragraph.
If it's only for your personal usage, then you could do a footnote.

But if it's an ebook for actual sale, DO NOT use those hackish <abbr> or <ruby> methods.

If your device doesn't have the Auto-Translate stuff, you could also do something like placing the translation right after the phrase, in a different font:

Code:
« Je parle français! » (I speak French!)
would be this HTML:

Code:
<span class="french" lang="fr" xml:lang="fr">« Je parle français! »</span> <span class="translated">(I speak French!)</span>
This would allow you to easily tweak your manually translated text, so you could do something like:

Code:
span.translated {
    font-weight: bold;
}
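
If there are many such phrases, the translated spans could be generated from a small glossary instead of typed by hand. This is a sketch only: it assumes every French phrase is already wrapped in exactly the span shown above, and the glossary contents and the name add_translations are hypothetical.

```python
import html
import re

# The French span exactly as written in the markup above, with its text captured.
FRENCH_SPAN = re.compile(
    r'(<span class="french" lang="fr" xml:lang="fr">)(.*?)(</span>)'
)

def add_translations(xhtml: str, glossary: dict[str, str]) -> str:
    """Append a translated span after each French span found in the glossary."""
    def repl(m: re.Match) -> str:
        phrase = html.unescape(m.group(2))
        translation = glossary.get(phrase)
        if translation is None:
            return m.group(0)  # no entry: leave the phrase untranslated
        return (f'{m.group(0)} '
                f'<span class="translated">({html.escape(translation)})</span>')
    return FRENCH_SPAN.sub(repl, xhtml)

glossary = {"« Je parle français! »": "I speak French!"}
source = '<p><span class="french" lang="fr" xml:lang="fr">« Je parle français! »</span></p>'
print(add_translations(source, glossary))
```

Note that running it twice on the same file would add the translation twice, so it is best run once on a copy of the book.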
Quote:
Originally Posted by enuddleyarbl View Post
But, for these little Poirot exclamations, most of them are trivial and only some use words I don't recognize.
In that case, you may be able to gather from context. If not, then I just treat it like "unknown babble" or completely made-up fantasy words.

Similar to when I run across Greek or Japanese or Chinese in my books. I just nod my head... then continue reading. (But not until I properly tag the language, of course!!!)

Last edited by Tex2002ans; 01-22-2023 at 03:48 PM.
01-22-2023, 05:01 PM   #8
enuddleyarbl
Code:
<i class="french" lang="fr" xml:lang="fr">
If I'm reading the various references I've found correctly, it looks like you might not need that class in the <i> tags. You could use the :lang pseudo-class to match the languages:

https://developer.mozilla.org/en-US/docs/Web/CSS/:lang

The CSS syntax is:
Code:
:lang(languagecode) {
  css declarations;
}
01-22-2023, 08:01 PM   #9
Tex2002ans
Yes, but .french = a simple CSS selector.

The :lang() selector is a less basic piece of CSS (it dates from CSS2, and Selectors Level 4 extends it)... and older renderers might not be able to handle that.

- - -

It also helps make the code much more human-readable.

Would you know what:

Code:
lang="et"
stands for off the top of your head? Nope.

But if you saw:

Code:
class="estonian" lang="et"
now that makes more sense!

Last edited by Tex2002ans; 01-22-2023 at 08:05 PM.
All times are GMT -4.