Old 01-21-2023, 08:30 PM   #3
isarl
Addict
isarl ought to be getting tired of karma fortunes by now.
Posts: 287
Karma: 2534928
Join Date: Nov 2022
Location: Canada
Device: Kobo Aura 2
Are you expecting to find many French words together, or are you looking for loanwords? It seems like loanword detection is an open research problem, but if you're up for writing a bit of code, I found a few options for doing general language detection:

If you are comfortable with Python, there is langdetect (a port of this Java library, if you prefer Java); spacy-langdetect, a similar option built to work with the spaCy NLP framework; and textblob (which appears to farm out the language detection to the Google Translate API).

Langdetect seems nice and simple, but you still need to figure out how to walk over the words in your book, so spaCy might be a better choice for that, as it comes with sentence segmentation.
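
To give a rough idea of the "walk over your book" part, here is a minimal sketch. The helper names (split_sentences, sentences_in_language, langdetect_detector) are made up for illustration, and the regex splitter is only a naive stand-in for real sentence segmentation such as spaCy's. It assumes the book is already available as plain text and that langdetect is installed (pip install langdetect); the detector is passed in as a function, so another library could be swapped in.

```python
import re
from typing import Callable, Iterator, List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentences_in_language(
    text: str, detect: Callable[[str], str], target: str = "fr"
) -> Iterator[str]:
    """Yield each sentence that the given detector labels as `target`."""
    for sentence in split_sentences(text):
        try:
            lang = detect(sentence)
        except Exception:
            # Detectors can fail on input with no usable letters
            # (e.g. "1234."); skip those sentences.
            continue
        if lang == target:
            yield sentence


def langdetect_detector() -> Callable[[str], str]:
    """Return langdetect's detect(); imported lazily so the rest runs without it."""
    from langdetect import DetectorFactory, detect

    DetectorFactory.seed = 0  # langdetect is non-deterministic unless seeded
    return detect
```

You would then call something like sentences_in_language(book_text, langdetect_detector()) and loop over the results. Keep in mind that detection on very short sentences tends to be unreliable, and a single loanword inside an English sentence will most likely not be flagged this way.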

Good luck!

Last edited by isarl; 01-21-2023 at 08:33 PM. Reason: added mention of sentence segmentation