MobileRead Forums

CalibUser 07-22-2015 10:16 AM

Spelling dictionary and plugins
 
I need to access the spelling dictionaries (default and user ones) in a plugin that I intend to produce.

How can I access the words in these dictionaries in a plugin?

Thanks.

KevinH 07-22-2015 10:36 AM

Hi,

Is it just the user words and Hunspell dictionary (.aff, .dic) files you want?
If so, you could read and parse them with a Python script from the user's Preferences location (or shared dictionary location on Linux).
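
As a rough illustration of the "read and parse" route, here is a minimal sketch of reading the stems from a Hunspell .dic file. It assumes the usual layout (a word count on the first line, then one entry per line, optionally followed by "/" and affix flags) and assumes UTF-8; the real encoding is declared in the matching .aff file. The function name and parameters are made up for this example. Note that a .dic file lists stems only, with inflected forms generated from the .aff rules, so a plain lookup in this set will miss many valid words.

```python
import io


def read_dic_words(dic_path, encoding='utf-8'):
    """Return the set of stems listed in a Hunspell .dic file.

    Simplified: ignores escaped slashes and space-separated
    morphological fields. The encoding is an assumption.
    """
    words = set()
    with io.open(dic_path, 'r', encoding=encoding) as f:
        next(f, None)  # first line holds the approximate word count
        for line in f:
            entry = line.strip()
            if not entry:
                continue
            # Drop tab-separated morphological fields, then the "/FLAGS" part.
            stem = entry.split('\t', 1)[0].split('/', 1)[0].strip()
            if stem:
                words.add(stem)
    return words
```
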

Or do you want to do actual spell checking?

If you want to do actual spell-checking, that would be much more difficult.
The plugin interface simply passes information about the currently open book location to a python environment and so is not a bi-directional call interface.

The python environment can manipulate the files and then create an XML response telling the Sigil C++ environment which files it needs to copy or change.

Spell checking inside the Sigil app is done via a Hunspell interface. The easiest way for you to use Hunspell inside a python plugin would be to either find a Hunspell Python interface package (see the Python Package Index) or a pure python spell-checker, and then get permission to include it in your plugin, OR use python "ctypes" calls to access the Hunspell dynamic library version that comes with Sigil.
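
For the "ctypes" route, a minimal Python 3 sketch might look like the following. It assumes a Hunspell shared library is actually available at a path you supply (lib_path is a placeholder, not a location Sigil is known to use) and it relies on the Hunspell C API functions Hunspell_create, Hunspell_spell and Hunspell_destroy. The wrapper names open_hunspell, is_known and close_hunspell are invented for this example.

```python
import ctypes
import os


def open_hunspell(lib_path, aff_path, dic_path):
    """Load a Hunspell shared library and create a checker handle.

    lib_path is a hypothetical full path to the shared library.
    Raises OSError if the library cannot be loaded.
    """
    lib = ctypes.CDLL(lib_path)
    # Declare the C signatures so ctypes converts arguments correctly.
    lib.Hunspell_create.argtypes = [ctypes.c_char_p, ctypes.c_char_p]
    lib.Hunspell_create.restype = ctypes.c_void_p
    lib.Hunspell_spell.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
    lib.Hunspell_spell.restype = ctypes.c_int
    lib.Hunspell_destroy.argtypes = [ctypes.c_void_p]
    lib.Hunspell_destroy.restype = None
    # C strings must be bytes on Python 3.
    handle = lib.Hunspell_create(os.fsencode(aff_path), os.fsencode(dic_path))
    return lib, handle


def is_known(lib, handle, word, encoding='utf-8'):
    """Return True if Hunspell accepts the word (encoding is an assumption)."""
    return lib.Hunspell_spell(handle, word.encode(encoding)) != 0


def close_hunspell(lib, handle):
    """Free the checker handle."""
    lib.Hunspell_destroy(handle)
```

Words should be encoded in whatever encoding the dictionary itself declares; UTF-8 is only a default here.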

KevinH

CalibUser 07-22-2015 01:38 PM

Thanks for your response. I want to write a plugin that will remove hyphens from words that should not be hyphenated. To do this, the plugin will scan through the epub, and when it finds a hyphenated word it will check whether the word without the hyphen exists in the dictionary. If it does, the plugin will remove the hyphen from the word in the epub.
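
The idea described above could be sketched roughly as follows, independent of where the word list comes from. This is only an illustration: the function name is made up, known_words is assumed to be a set of lowercase words, and the pattern only handles plain ASCII letters.

```python
import re

# Two runs of letters joined by a single hyphen (ASCII only, by assumption).
HYPHENATED = re.compile(r'\b([A-Za-z]+)-([A-Za-z]+)\b')


def remove_unwanted_hyphens(text, known_words):
    """Join hyphenated words whose joined form is in known_words."""
    def fix(match):
        joined = match.group(1) + match.group(2)
        if joined.lower() in known_words:
            return joined          # e.g. "won-derful" becomes "wonderful"
        return match.group(0)      # leave genuine hyphenations alone
    return HYPHENATED.sub(fix, text)
```

For example, remove_unwanted_hyphens('A won-derful, well-known day', {'wonderful'}) leaves "well-known" untouched because "wellknown" is not in the word set. Run on raw XHTML, this would also touch hyphens inside tags and attributes, so a real plugin would want to apply it to text nodes only.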

I've decided to use the python "ctypes" calls to do this job and I found a helpful site at http://thispageintentionally.blogspo...-hunspell.html.

This site uses the following code to load the library:

<code>
import os
# set up path strings to a dictionary
dpath = '/Users/dcortes1/Desktop/scratch'
daff = os.path.join(dpath, 'en_US.aff')
ddic = os.path.join(dpath, 'en_US.dic')
print( os.access(daff,os.R_OK), os.access(ddic,os.R_OK) )
# Find the library -- I know it is in /usr/local/lib but let's use
# the platform-independent way.
import ctypes.util as CU
libpath = CU.find_library( 'hunspell-1.3.0' )
# Get an object that represents the library
import ctypes as C
hunlib = C.CDLL( libpath )
</code>

Unfortunately, this produces the following error when I run it in a Sigil plugin:

Error: bad argument type for built-in operation

Where is the error in this code?

Thanks

eschwartz 07-22-2015 02:44 PM

FWIW, calibre's plugin environment is a lot more closely bound to the editor, since they both share the same python environment. In calibre, editor plugins can directly call the spellchecker.

In fact, you don't even need a plugin -- see: The power of function mode - using a spelling dictionary to fix mis-hyphenated words.

KevinH 07-22-2015 03:04 PM

Hi,
Is this with python 2.7 or python 3.4? There are changes to ctypes code needed to make it work with python 3.4 strings. See further along in your reference link as an example.
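
To illustrate the kind of change involved: on python 3, ctypes expects bytes rather than str wherever a C "char *" is passed, so text has to be encoded first. A minimal sketch, with a made-up helper name and UTF-8 assumed as the encoding:

```python
import ctypes


def to_c_string(text, encoding='utf-8'):
    """Encode text to bytes so it can be passed where C expects char*."""
    if isinstance(text, bytes):
        return text  # already bytes (always the case for str on python 2)
    return text.encode(encoding)


# On python 3, ctypes.c_char_p('hyphen') raises TypeError; bytes are accepted.
word = ctypes.c_char_p(to_c_string('hyphen'))
print(word.value)
```
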

I would debug this code (the full code from your example) outside of the plugin environment by getting it to work in a straight python program first, using the exact same python version (2.7 or 3.4) you want the plugin to work under. Be careful, as some Linux systems now install python3 as just python and have renamed the old python to python2.

Running python at the terminal prompt should tell you which is found first in your path and what version it is.
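
The same check can be made from inside a script, which is handy for confirming which interpreter is actually running your code. A small sketch (the helper name is just for illustration):

```python
import sys


def python_major_minor():
    """Return the running interpreter's (major, minor) version."""
    return sys.version_info[0], sys.version_info[1]


print(sys.executable)        # path of the interpreter running this script
print(python_major_minor())  # version as a (major, minor) tuple
```
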

Post your full standalone example here and I will test it on my MacOS machine to see exactly what error you are getting.

Just make sure you are using the correct python version your code expects.

DiapDealer 07-22-2015 04:45 PM

In addition to KevinH's advice, keep in mind that there are some platform-specific differences to the find_library and CDLL functions.

On OSX and Linux, find_library(name) expects the name parameter to be without a 'lib' prefix and without any suffixes like '.so' or '.dylib', or any appended version numbers. Windows has no shared library prefix, so if I recall, you'd need to use the whole filename minus the extension there.

In addition, find_library will return the full path to the shared library (if found) on Windows and OSX, while Linux will only return the file name portion.

So in your *nix example, "find_library('hunspell-1.3.0')" will likely be looking for a shared library with the name 'libhunspell-1.3.0.so.x'. If there is no such library on your system (where your system keeps its shared libraries), it's going to return None, and your code will then pass None to CDLL. Are you sure that's the exact version of hunspell's shared library that you have installed on your system?
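
One way to guard against that case is to check the result of find_library before handing it to CDLL. A minimal sketch, where the function name is made up and the library name you pass in depends entirely on how the library was installed on your system:

```python
import ctypes
import ctypes.util


def load_shared_library(name):
    """Return a CDLL for name, or None if find_library cannot locate it."""
    found = ctypes.util.find_library(name)
    if found is None:
        # Nothing matched; do not pass None on to CDLL.
        return None
    return ctypes.CDLL(found)
```

A caller can then test the return value for None and report a clear "library not found" message instead of an obscure error from deeper in ctypes.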

Doitsu 07-22-2015 06:58 PM

@DiapDealer: This may be a stupid question: where does the Sigil Windows installer install the Windows Hunspell dll that Sigil uses? (It's not in the same folder as the other .dll files.)

KevinH 07-22-2015 07:07 PM

Hi Doitsu,
It may be statically linked into the Sigil executable. I have not actually looked to check.
KevinH

BetterRed 07-22-2015 07:20 PM

whoops - wrong thread

DiapDealer 07-22-2015 07:38 PM

Hunspell is built and statically linked into the Sigil executable on Windows. It's the same on Linux and OSX unless a "use local libs" switch is used at build time. Same with PCRE.

Doitsu 07-23-2015 03:00 AM

This means that the OP would either need to bundle the Hunspell dll with the plugin or only use the custom dictionary.

@CalibUser: The user dictionary is a simple text file. For more information on its usage also see this related thread.
You may want to check out Beautiful Soup, which you can embed in a plugin. For a simple example, see this throwaway plugin.
For algorithm ideas also check out this Epub spell checker, which appears to be using lexicostatistics to identify OCR errors.
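
Since the user dictionary is a plain text file, loading it into a set for lookups could be as simple as the sketch below. It assumes one word per line and UTF-8 encoding, neither of which is confirmed here, and the function name and path argument are placeholders.

```python
import io


def load_user_words(path, encoding='utf-8'):
    """Read a plain-text word list (assumed one word per line) into a set."""
    with io.open(path, 'r', encoding=encoding) as f:
        return {line.strip() for line in f if line.strip()}
```

A word can then be tested with an ordinary membership check such as "word in load_user_words(path)".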

CalibUser 07-23-2015 08:00 AM

Thanks for all your replies.

It was the function in calibre for removing unwanted hyphens that made me decide to write one as a plugin for Sigil, partly as an exercise for learning Python and partly because I thought it would be a useful plugin for others to download from this site.

However, I'm wondering if I am being too ambitious as I have only just started to learn Python!

Following on from DiapDealer's first post, I tried to find the Hunspell library using Windows search, but it could not be found. When I saw the second post stating that 'Hunspell is built and statically linked into the Sigil executable on Windows', I realised that my code was looking for a library that doesn't exist, so I won't post my non-functional code here. I will need to explore the suggestions made by Doitsu, i.e. to bundle the Hunspell dll with the plugin or only use the custom dictionary. Thanks for the links, Doitsu. I will follow them up.

BTW - What is OP?

Doitsu 07-23-2015 08:36 AM

Quote:

Originally Posted by CalibUser (Post 3138815)
BTW - What is OP?

OP = Original Poster i.e. you. :)

Quote:

Originally Posted by CalibUser (Post 3138815)
However, I'm wondering if I am being too ambitious as I have only just started to learn Python!

There is no such thing as being too ambitious. :) The good thing about Python is that there are a gazillion ready-made libraries that you only need to import. (It's almost like using Lego bricks.)

And in the rare case that no ready-made library exists, usually DiapDealer and KevinH will come up with some helpful ideas. (I couldn't have finished my very simple plugins without their help.)

CalibUser 07-23-2015 08:40 AM

Thanks for your encouragement, Doitsu. I will definitely explore the suggestions posted here.

KevinH 07-23-2015 09:03 AM

Hi,
That said ... if spell checking inside a plugin is important, we can change the Sigil build to use a dynamic hunspell lib instead of a static one. Alternatively we could include our own dll with hooks back into needed routines such as spellchecking.

I will look into doing this for a future release.

KevinH


All times are GMT -4.