MobileRead Forums > E-Book Software > Sigil
07-08-2019, 06:19 AM   #1
elchamaco
Zealot
 
Posts: 128
Karma: 500
Join Date: Aug 2011
Device: kindle, boox
Export list of words in spellcheck

It would be great to be able to export the list of words in the Spellcheck tool.

07-08-2019, 07:55 AM   #2
KevinH
Sigil Developer
 
Posts: 8,094
Karma: 5450184
Join Date: Nov 2009
Device: many
You can already add words to your own wordlist. So what purpose would this serve?
Why would this feature be useful to the majority of users? Please explain.

07-08-2019, 08:06 AM   #3
Doitsu
Grand Sorcerer

Posts: 5,635
Karma: 23191067
Join Date: Dec 2010
Device: Kindle PW2
Quote:
Originally Posted by elchamaco View Post
It would be great to be able to export the list of words in the Spellcheck tool.
This functionality is already present:

1. Select Edit > Preferences > Open Preferences Location.
2. Double-click the user_dictionaries folder and create a blank text file, e.g. words.txt.
3. Select Tools > Spellcheck > Spellcheck...
4. Select words.txt under Add To Dictionary:.
5. Highlight all words in the table and click Add To Dictionary:.

07-08-2019, 11:07 AM   #4
elchamaco
Zealot
 
Posts: 128
Karma: 500
Join Date: Aug 2011
Device: kindle, boox
Quote:
Originally Posted by Doitsu View Post
This functionality is already present:

1. Select Edit > Preferences > Open Preferences Location.
2. Double-click the user_dictionaries folder and create a blank text file, e.g. words.txt.
3. Select Tools > Spellcheck > Spellcheck...
4. Select words.txt under Add To Dictionary:.
5. Highlight all words in the table and click Add To Dictionary:.
This works for me.

07-08-2019, 12:26 PM   #5
Tex2002ans
Wizard
 
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by KevinH View Post
You can already add words to your own wordlist. So what purpose would this serve?

Why would this feature be useful to the majority of users? Please explain.
Ahh, I thought he meant being able to export a list of words as CSV using Tools > Spellcheck > Spellcheck (F7).

Calibre allows you to do this indirectly if you do Tools > Reports > Words and press the Save button.

Being able to save as CSV is very helpful when wanting to work with outside tools. I sometimes use Calibre's list to be able to manipulate the CSV with LibreOffice Calc.

07-08-2019, 12:58 PM   #6
KevinH
Sigil Developer
 
Posts: 8,094
Karma: 5450184
Join Date: Nov 2009
Device: many
BTW, Calc, like Excel, will parse most text files if they are delimited in some way (need not be commas and quotes) or field-aligned.

07-08-2019, 07:02 PM   #7
BetterRed
null operator (he/him)
 
Posts: 20,933
Karma: 27620688
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Just a thought - one that's been in my head ever since Kovid developed his editor and this one was brought back to life.

Merge Sigil's and the Calibre Editor's Reports into a single tool usable from either program via wrapper plugins - a bit like DiapDealer's toolbox. It's 'annoying' having to dodge between them to get a report the other one offers - and it also opens up the danger of running two editors against the same source concurrently.

FTR: 1. Of the differences between the two editors, this is the only one that 'annoys' me. 2. I put a high value on being able to jump from a Report line into the code, so I wouldn't want to lose that in a combined tool.

BR

Last edited by BetterRed; 07-08-2019 at 07:04 PM.

07-08-2019, 07:31 PM   #8
DiapDealer
Grand Sorcerer

Posts: 27,901
Karma: 198500000
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
That's not likely to happen. Not because of any kind of rivalry or animosity--far from it actually. But because it's just a bad, bad idea. Too much potential for a We-Break-His-App-He-Breaks-Our-App kind of thing. Not to mention the problems with making two separate projects prerequisites for each other. Kovid's got his own system and he moves fast; he's not going to want to take time to check with us before he makes changes to his code that might be used by Sigil Reports (and vice versa).

Somebody might be able to write a Sigil plugin that utilizes calibre's python modules if they're available (and that's a BIG "might" considering python modules compiled with different versions of python), but it would never be able to interact with Sigil's built-in Reports feature. And it would always be a fragile thing that could break at any moment (through no fault of the plugin dev).

My toolbox works because all of the tools are mine. There's a dependency on calibre's plugin framework, but that's the same for all calibre plugins: calibre plugin/calibre plugin framework ... Sigil plugin/Sigil plugin framework. And ne'er (or at least very unlikely) the twain shall meet.

Last edited by DiapDealer; 07-08-2019 at 07:35 PM.

07-08-2019, 11:12 PM   #9
BetterRed
null operator (he/him)
 
Posts: 20,933
Karma: 27620688
Join Date: Mar 2012
Location: Sydney Australia
Device: none
Quote:
Originally Posted by DiapDealer View Post
That's not likely to happen. <snip>
I never really thought it would; that's why I've never mentioned it.

My first thought was something more like BV being replaced by PageEdit, i.e. the Report features of both would be replaced by something external (ePubReports) - but I imagine getting from a report line to the matching code might be a challenge - hence my FTR #2.

If Kovid had an "Open With" option in the calibre editor, could the PageEdit gadget be used from within it?

BR

Last edited by BetterRed; 07-09-2019 at 03:39 AM.

07-09-2019, 03:38 AM   #10
elchamaco
Zealot
 
Posts: 128
Karma: 500
Join Date: Aug 2011
Device: kindle, boox
Quote:
Originally Posted by Tex2002ans View Post
Ahh, I thought he meant being able to export a list of words as CSV using Tools > Spellcheck > Spellcheck (F7).

Calibre allows you to do this indirectly if you do Tools > Reports > Words and press the Save button.

Being able to save as CSV is very helpful when wanting to work with outside tools. I sometimes use Calibre's list to be able to manipulate the CSV with LibreOffice Calc.
That CSV export from Reports is something like what I was looking for, but for the missing words. Playing with the calibre editor, I see it has an option to copy all the words from Spellcheck to the clipboard and paste them into Excel. This also works for me.

07-09-2019, 04:42 AM   #11
Tex2002ans
Wizard
 
Posts: 2,297
Karma: 12126329
Join Date: Jul 2012
Device: Kobo Forma, Nook
Quote:
Originally Posted by KevinH View Post
BTW, Calc like excel will parse most text files if delimited in some way (need not be commas and quotes) or if field aligned.
Yep, tab-delimited is usually my favorite. Commas are just too common, and make manually reading the file in a text editor a chore.

Whenever importing CSVs into LibreOffice Calc, a nice window pops up giving you lots of import options.
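For anyone scripting this outside Sigil, here is a minimal Python sketch of writing a word list tab-delimited, as suggested above. It assumes you already have the words as a Python list (e.g. pasted from the Spellcheck list); the `to_tsv` name and the word/count layout are illustrative, not a format Sigil or calibre produce:

```python
import csv
import io
from collections import Counter
from typing import Iterable


def to_tsv(words: Iterable[str]) -> str:
    """Return a header row plus word<TAB>count lines, most frequent first."""
    counts = Counter(words)
    buf = io.StringIO()
    # Tab as delimiter: commas inside words need no quoting, and the
    # result stays readable in a plain text editor.
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["word", "count"])
    # Sort by descending count, then alphabetically for a stable order.
    for word, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([word, count])
    return buf.getvalue()


if __name__ == "__main__":
    # To save it, open the file with newline="" and encoding="utf-8".
    print(to_tsv(["colour", "sofritos", "colour", "forign", "a,b"]), end="")
```

Calc's text import dialog should pick up the tab separator; the comma in `a,b` is left unquoted because, with a tab delimiter, only tabs, double quotes or line breaks trigger quoting.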

Quote:
Originally Posted by elchamaco View Post
That CSV export from Reports is something like what I was looking for, but for the missing words. Playing with the calibre editor, I see it has an option to copy all the words from Spellcheck to the clipboard and paste them into Excel. This also works for me.
You can also use the Spellcheck Lists in non-standard ways. Like in this thread, I explained how to use it to find a list of "foreign-language" words:

https://www.mobileread.com/forums/sh...59#post3812859

and go marking them up with xml:lang.

I've also done something similar when trying to normalize a collection of various articles between American/British spellings. You could:
  1. Mark ebook as English (US).
  2. Export CSV of "misspelled words".
  3. Mark ebook as English (UK).
  4. Export CSV of "misspelled words".

Compare both CSVs together, look at differences, and you can see:
  • Words that appear in only one list are almost all the differently spelled words.
    • color <-> colour
  • Words that appear in both lists are almost all the actual misspelled/foreign words.
    • "forign" + "sofritos"
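The comparison step can be sketched with Python sets. This assumes the two exports have already been read into lists of words (for instance, the word column of each CSV); the `compare_exports` name and the sample words are illustrative:

```python
from typing import Iterable, Set, Tuple


def compare_exports(
    us_flagged: Iterable[str], uk_flagged: Iterable[str]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Split two "misspelled words" exports into three buckets.

    us_flagged: words flagged with the book marked English (US)
    uk_flagged: words flagged with the book marked English (UK)
    """
    us, uk = set(us_flagged), set(uk_flagged)
    only_us = us - uk  # rejected only by the US dictionary: likely British spellings
    only_uk = uk - us  # rejected only by the UK dictionary: likely American spellings
    both = us & uk     # rejected by both: likely real typos or foreign words
    return only_us, only_uk, both


if __name__ == "__main__":
    only_us, only_uk, both = compare_exports(
        ["colour", "forign", "sofritos"],  # checked as English (US)
        ["color", "forign", "sofritos"],   # checked as English (UK)
    )
    print("British spellings:", sorted(only_us))
    print("American spellings:", sorted(only_uk))
    print("Typos/foreign words:", sorted(both))
```

Set difference and intersection give exactly the "one list" versus "both lists" split described above; the buckets are only heuristics, so they still need a human look.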

Last edited by Tex2002ans; 07-09-2019 at 05:19 AM.

07-09-2019, 06:24 AM   #12
DiapDealer
Grand Sorcerer

Posts: 27,901
Karma: 198500000
Join Date: Jan 2010
Device: Nexus 7, Kindle Fire HD
Quote:
Originally Posted by BetterRed View Post
If Kovid had an "Open With" option in the calibre editor, could the PageEdit gadget be used from within it?
I suppose so, but that's a big IF.

And I retract my comment about your previous idea being a "bad, bad" one. Your logic for wanting it was sound. It's just not feasible/practical is all.

07-09-2019, 11:23 AM   #13
elchamaco
Zealot
 
Posts: 128
Karma: 500
Join Date: Aug 2011
Device: kindle, boox
Quote:
Originally Posted by Tex2002ans View Post
Yep, tab-delimited is usually my favorite. Commas are just too common, and make manually reading the file in a text editor a chore.

Whenever importing CSVs into LibreOffice Calc, a nice window pops up giving you lots of import options.



You can also use the Spellcheck Lists in non-standard ways. Like in this thread, I explained how to use it to find a list of "foreign-language" words:

https://www.mobileread.com/forums/sh...59#post3812859

and go marking them up with xml:lang.

I've also done something similar when trying to normalize a collection of various articles between American/British spellings. You could:
  1. Mark ebook as English (US).
  2. Export CSV of "misspelled words".
  3. Mark ebook as English (UK).
  4. Export CSV of "misspelled words".

Compare both CSVs together, look at differences, and you can see:
  • Words that appear in only one list are almost all the differently spelled words.
    • color <-> colour
  • Words that appear in both lists are almost all the actual misspelled/foreign words.
    • "forign" + "sofritos"
Yes, you can do a lot with it. I want to use it to improve dictionaries with missing words, and not only Hunspell ones, but StarDict/MOBI dictionaries too. I'll create a Hunspell dictionary from StarDict, then find missing words in different books to improve the main dictionary: main definitions and inflected forms.

Probably the best choice would be to create a script that checks all the words in an EPUB book against a Hunspell dictionary and exports the missing words, but to begin with, the manual method can work.
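A rough Python sketch of such a script, under loud assumptions: it compares against a plain word list (UTF-8, one word per line, e.g. an expanded dictionary) rather than running Hunspell's affix engine; it treats every .xhtml/.html/.htm file in the EPUB zip as content and ignores spine order; and its tokenizer is a simple letters-and-apostrophes regex. All names here are hypothetical:

```python
import re
import sys
import zipfile
from collections import Counter
from html.parser import HTMLParser
from typing import List, Set

# A "word": a run of letters, optionally joined by straight/curly apostrophes.
WORD_RE = re.compile(r"[^\W\d_]+(?:['\u2019][^\W\d_]+)*")
SKIP_TAGS = {"head", "script", "style"}  # text inside these is not book content
BLOCK_TAGS = {"p", "div", "br", "li", "td", "blockquote",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class TextExtractor(HTMLParser):
    """Collects the visible text of one (X)HTML file."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: List[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        if tag in BLOCK_TAGS:
            self.parts.append(" ")  # keep words in adjacent blocks apart

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        if tag in BLOCK_TAGS:
            self.parts.append(" ")

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)  # inline tags (<i>, <span>) don't split words


def epub_text(epub_path: str) -> str:
    """Concatenate the text of every (X)HTML file in the EPUB (zip) archive."""
    chunks: List[str] = []
    with zipfile.ZipFile(epub_path) as zf:
        for name in zf.namelist():
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = TextExtractor()
                parser.feed(zf.read(name).decode("utf-8", errors="replace"))
                parser.close()
                chunks.append("".join(parser.parts))
    return "\n".join(chunks)


def load_wordlist(path: str) -> Set[str]:
    """Assumed format: UTF-8, one word per line, no affix flags."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}


def missing_words(text: str, dictionary: Set[str]) -> Counter:
    """Count words found neither as-is nor lowercased in the word list."""
    missing: Counter = Counter()
    for word in WORD_RE.findall(text):
        if word not in dictionary and word.lower() not in dictionary:
            missing[word] += 1
    return missing


if __name__ == "__main__":
    if len(sys.argv) == 3:
        counts = missing_words(epub_text(sys.argv[1]), load_wordlist(sys.argv[2]))
        for word, n in counts.most_common():
            print(f"{word}\t{n}")  # tab-delimited, ready for Calc
    else:
        print("usage: missing_words.py BOOK.epub WORDLIST.txt")
```

Run it as `python missing_words.py book.epub wordlist.txt`; the output is sorted by frequency. Checking directly against .dic/.aff files would need a Hunspell binding or an expanded word list instead of `load_wordlist`.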

07-09-2019, 11:47 AM   #14
KevinH
Sigil Developer
 
Posts: 8,094
Karma: 5450184
Join Date: Nov 2009
Device: many
Please note that for Hunspell dictionaries that properly use affix detection and compression, you should not add unflagged words to the dictionary. The proper way to handle that for en is to expand the dictionary (by reversing the affix flag usage) to recreate a plain word list, add your new words (being sure to add all versions of each word with its prefixes and suffixes), and then re-crunch the wordlist.
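To make the "expand" step concrete, here is a heavily simplified Python sketch of turning a .dic/.aff pair back into a plain word list. It assumes single-character flags, handles only PFX/SFX rules, applies one affix at a time (no prefix+suffix cross products, no continuation classes), ignores options such as NEEDAFFIX or compounding, and reads files as UTF-8 regardless of the SET line. Hunspell's source distribution includes `unmunch`/`munch` tools for the real expand and re-crunch steps; this only illustrates how flags map to word forms, and `parse_aff`/`expand` are hypothetical names:

```python
import re
import sys
from collections import defaultdict
from typing import Dict, List, Tuple

Rule = Tuple[str, str, re.Pattern]  # (text to strip, text to add, condition)


def parse_aff(aff_text: str) -> Tuple[Dict[str, List[Rule]], Dict[str, List[Rule]]]:
    """Collect PFX/SFX rules keyed by their (single-character) flag."""
    prefixes: Dict[str, List[Rule]] = defaultdict(list)
    suffixes: Dict[str, List[Rule]] = defaultdict(list)
    for line in aff_text.splitlines():
        fields = line.split()
        # Rule lines: TYPE FLAG STRIP ADD CONDITION [...]; the 4-field
        # header line (e.g. "SFX D Y 4") is skipped by the length check.
        if len(fields) < 5 or fields[0] not in ("PFX", "SFX"):
            continue
        kind, flag, strip, add, cond = fields[:5]
        add = add.split("/")[0]  # drop continuation flags
        strip = "" if strip == "0" else strip
        add = "" if add == "0" else add
        # Conditions like "[^aeiou]y" or "." are treated as regexes anchored
        # at the word start (PFX) or word end (SFX).
        if kind == "PFX":
            prefixes[flag].append((strip, add, re.compile(cond)))
        else:
            suffixes[flag].append((strip, add, re.compile("(?:" + cond + ")$")))
    return prefixes, suffixes


def expand(dic_text: str, aff_text: str) -> List[str]:
    """Return every root word plus each single-affix form its flags allow."""
    prefixes, suffixes = parse_aff(aff_text)
    forms = set()
    for entry in dic_text.splitlines()[1:]:  # line 1 of a .dic is the word count
        if not entry.strip():
            continue
        word, _, flags = entry.split()[0].partition("/")
        forms.add(word)
        for flag in flags:
            for strip, add, cond in suffixes.get(flag, []):
                if word.endswith(strip) and cond.search(word):
                    forms.add(word[: len(word) - len(strip)] + add)
            for strip, add, cond in prefixes.get(flag, []):
                if word.startswith(strip) and cond.match(word):
                    forms.add(add + word[len(strip):])
    return sorted(forms)


if __name__ == "__main__":
    if len(sys.argv) == 3:
        with open(sys.argv[1], encoding="utf-8") as dic, open(sys.argv[2], encoding="utf-8") as aff:
            print("\n".join(expand(dic.read(), aff.read())))
    else:
        print("usage: expand_dic.py en_US.dic en_US.aff")
```

With a suffix class D of four rules (`0 d e`, `y ied [^aeiou]y`, `0 ed [^ey]`, `0 ed [aeiou]y`) and the entries create/D, carry/D and play/D, this yields carried, carry, create, created, play and played. After appending new words (and their inflections) to such a list, the re-crunch step is what restores the flags.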

This process seems to have been lost over the years as people do not understand the affix rules and affix compression.

For example, the en US dict that Sigil used to use had no affix compression at all. As the original author of MySpell (the predecessor of Hunspell) and one-time head of OpenOffice's lingucomponent project, I find it sad to see that information on how to properly create dictionaries that are not giant wordlists has been lost.

In addition, the role of a spellcheck dictionary is not the same as that of an online or real dictionary. Spellcheck dictionaries should be designed to focus on the "working set" of a language and NOT try to be all-encompassing, as this actually leads to fewer incorrect words being detected: common mistakes turn out to be real but rarely used words, or slang, or abbreviations, or whatnot.

You are better off creating additional user dictionaries that catch common words you use that are not covered by the spellcheck dictionaries, to expand your personal "working set" of the language.
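A minimal Python sketch for keeping such a personal list tidy, assuming the user dictionary is a UTF-8 text file with one word per line (like the words.txt created in the user_dictionaries folder in the steps earlier in the thread); the script and function names are hypothetical. Since a plain list carries no affix flags, add every inflected form you need (e.g. both sofrito and sofritos):

```python
import sys
from pathlib import Path
from typing import Iterable


def merge_into_user_dictionary(path: Path, new_words: Iterable[str]) -> int:
    """Add words to a one-word-per-line list, keeping it sorted and duplicate-free.

    Returns the number of words that were actually new.
    """
    known = set()
    if path.exists():
        known = {w.strip() for w in path.read_text(encoding="utf-8").splitlines() if w.strip()}
    before = len(known)
    known.update(w.strip() for w in new_words if w.strip())
    # Case-insensitive order is easier to scan by eye; ties fall back to exact text.
    ordered = sorted(known, key=lambda w: (w.casefold(), w))
    path.write_text("".join(w + "\n" for w in ordered), encoding="utf-8")
    return len(known) - before


if __name__ == "__main__":
    if len(sys.argv) >= 3:
        added = merge_into_user_dictionary(Path(sys.argv[1]), sys.argv[2:])
        print(f"added {added} word(s)")
    else:
        print("usage: merge_words.py words.txt WORD [WORD ...]")
```

Keeping the file sorted and de-duplicated makes it easy to review by hand, which matters if the goal is a small personal "working set" rather than a second giant wordlist.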

07-09-2019, 03:01 PM   #15
elibrarian
Imperfect Perfectionist

Posts: 524
Karma: 863576
Join Date: Dec 2011
Location: Ølstykke, Denmark
Device: none
Quote:
Originally Posted by elchamaco View Post
Probably the best choice would be to create a script that checks all the words in an EPUB book against a Hunspell dictionary and exports the missing words, but to begin with, the manual method can work.
You might find the "linguist" extension for LibreOffice Writer useful. One of the things it does is make a list of unrecognized words in the active document. It's rather old, but since it's written in Python and not LibreOffice Basic, it still works, and it's quite fast too:

https://extensions.libreoffice.org/e...linguist/1.5.1

Regards,

Kim


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.