Old 06-18-2022, 08:26 PM   #316
xxyzz
Evangelist
Posts: 442
Karma: 2666666
Join Date: Nov 2020
Device: none
Quote:
Originally Posted by j.p.s View Post
Does WordDumb have an option for the user to supply descriptions for x-ray entities?
It doesn't have this feature, but I can implement it by adding a new customize X-Ray dialog to the settings and saving the data to a JSON file.
Old 06-18-2022, 09:16 PM   #317
j.p.s
Grand Sorcerer
Posts: 5,802
Karma: 103362673
Join Date: Apr 2011
Device: pb360
Quote:
Originally Posted by xxyzz View Post
It doesn't have this feature, but I can implement it by adding a new customize X-Ray dialog to the settings and saving the data to a JSON file.
Thanks! That would be great.

Would it also be possible for the user to supply the JSON file and the dialog have an option to read it?

It would also be good for the JSON to have optional fields specifying whether an entity is a person or a term, and maybe a field for the source.
Old 06-18-2022, 10:02 PM   #318
xxyzz
Evangelist
Posts: 442
Karma: 2666666
Join Date: Nov 2020
Device: none
Quote:
Originally Posted by j.p.s View Post
Would it also be possible for the user to supply the JSON file and the dialog have an option to read it?
I'm planning to add a dialog like the current customize Word Wise dialog, with an editable text input box and add and delete buttons. You could also edit the JSON file in any text editor.

Quote:
Originally Posted by j.p.s View Post
It would also be good for the JSON to have optional fields specifying whether an entity is a person or a term, and maybe a field for the source.
Indeed, that could correct some X-Ray types, for example when spaCy marks a person as a term.

I'm also planning to use Wiktionary's data from https://kaikki.org to add Word Wise <ruby> tags and footnotes to EPUB books. Both features will take some time to implement.
Old 06-21-2022, 12:10 AM   #319
xxyzz
Evangelist
Posts: 442
Karma: 2666666
Join Date: Nov 2020
Device: none
Hi j.p.s, I have pushed the code to GitHub, please install the test version from GitHub Actions and see if everything works.

I didn't add the source to the new dialog because it's more complicated, and I haven't figured out how to make the table and window resize automatically.
Old 06-21-2022, 07:02 PM   #320
j.p.s
Grand Sorcerer
Posts: 5,802
Karma: 103362673
Join Date: Apr 2011
Device: pb360
Quote:
Originally Posted by xxyzz View Post
Hi j.p.s, I have pushed the code to GitHub, please install the test version from GitHub Actions and see if everything works.
Thank you, xxyzz!

I was able to download and install the artifact, bring up the dialog, and enter and save information. But when I run "Create X-Ray", the contents of .config/calibre/plugins/worddumb-custom-x-ray.json do not seem to affect the XRAY.entities.ASIN.asc file that gets created.
Quote:
I didn't add the source to the new dialog because it's more complicated, and I haven't figured out how to make the table and window resize automatically.
I can live without the source column, especially to avoid waiting to get things rolling.

I would also be happy to create the JSON file outside calibre/WordDumb. Of course, having the dialog is nice so that I don't have to guess the format, file name, and location.

Having all the entries in one file might not work so well for all books. Maybe eventually WordDumb could look for a worddumb-custom-x-ray.json file in the same directory as the book?
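A per-book file with a fallback to the shared one could be resolved along these lines. This is only a sketch: the function name `find_custom_x_ray` and the "book folder first, then config folder" order are assumptions for illustration, not something the plugin is confirmed to do.

```python
from pathlib import Path

FILE_NAME = "worddumb-custom-x-ray.json"


def find_custom_x_ray(book_path, config_dir):
    """Return the per-book file if present, else the shared one, else None.

    The lookup order (book folder before config folder) is an assumption.
    """
    for folder in (Path(book_path).parent, Path(config_dir)):
        candidate = folder / FILE_NAME
        if candidate.is_file():
            return candidate
    return None
```

Here `config_dir` would be something like `~/.config/calibre/plugins`, the location mentioned above.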
Old 06-21-2022, 07:37 PM   #321
xxyzz
Evangelist
Posts: 442
Karma: 2666666
Join Date: Nov 2020
Device: none
Quote:
Originally Posted by j.p.s View Post
But when I run "Create X-Ray", the contents of .config/calibre/plugins/worddumb-custom-x-ray.json do not seem to affect the XRAY.entities.ASIN.asc file that gets created.
Maybe the name of your customized X-Ray data doesn't match the name in the .asc file? I can use spaCy's Entity Ruler to make spaCy recognize those names.

Quote:
Originally Posted by j.p.s View Post
Maybe eventually WordDumb could look for a worddumb-custom-x-ray.json file in the same directory as the book?
I can add another dialog to select the book, then show the edit X-Ray dialog and save the JSON file in the book folder. You could also create and edit the JSON file without using the edit dialog. The `is person` column will be changed to an NER label.
Old 06-21-2022, 07:59 PM   #322
j.p.s
Grand Sorcerer
Posts: 5,802
Karma: 103362673
Join Date: Apr 2011
Device: pb360
Quote:
Originally Posted by xxyzz View Post
Maybe the name of your customized X-Ray data doesn't match the name in the .asc file? I can use spaCy's Entity Ruler to make spaCy recognize those names.
I had both entity names that are in the XRAY already and entities whose names are not. I was hoping that WordDumb would add those to a search list. I am using the default WordDumb setting for spaCy.

I think I understand better now: this new capability is for correcting entities that WordDumb already detects, not for helping WordDumb detect an entity.

I know you are having trouble with too many columns in the dialog, but I think it would help to make the first JSON column the entity id (assuming two runs of WordDumb with the exact same configuration settings would detect the same entities and give them the same id numbers).

Can WordDumb be run from the command line?

Quote:
I can add another dialog to select the book, then show the edit X-Ray dialog and save the JSON file in the book folder. You could also create and edit the JSON file without using the edit dialog. The `is person` column will be changed to an NER label.
Whatever you think is the best approach to get this capability.

Of course I am most interested to learn what I am doing wrong.

Should I use 1984 or some other book for practice so that we can duplicate each other's results?
Old 06-21-2022, 08:28 PM   #323
xxyzz
Evangelist
Posts: 442
Karma: 2666666
Join Date: Nov 2020
Device: none
The current feature only uses the customized description if spaCy can find the name in the book, and both names must be the same. If the customized X-Ray entity is a person but spaCy thinks it's not a person, the code will set the entity as a person.
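The matching rule described here can be sketched as follows. The dictionary layout and the function name `apply_custom_x_ray` are hypothetical and chosen only for illustration; the plugin's real data structures are not shown in this thread.

```python
def apply_custom_x_ray(detected, custom):
    """Overlay custom data on detected entities, matching on exact name.

    Both arguments map an entity name to a dict with the keys
    "is_person" (bool) and "description" (str). This layout is assumed.
    """
    result = {}
    for name, entity in detected.items():
        merged = dict(entity)
        override = custom.get(name)  # exact, case-sensitive match only
        if override is not None:
            if override["description"]:
                merged["description"] = override["description"]
            if override["is_person"]:
                merged["is_person"] = True  # promote a term to a person
        result[name] = merged
    return result


detected = {"Livy": {"is_person": False, "description": "detected summary"}}
custom = {
    "Livy": {"is_person": True, "description": "custom text"},
    "Not In Book": {"is_person": True, "description": "ignored"},
}
merged = apply_custom_x_ray(detected, custom)
```

Custom entries whose names were never detected are silently dropped, which matches the behaviour reported earlier in the thread. The sketch only promotes a term to a person, since the opposite direction is not described.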

With spaCy's Entity Ruler, I can let spaCy find these customized X-Ray entities if it couldn't find them before.

It's not the number of columns that I'm worried about; I'm just having trouble auto-resizing the table and the dialog window, especially when there is a combobox in the table.

WordDumb can't run in the command line now. I'm not sure whether some features will work in the terminal, for example: device detection.

I'm using this book for testing on GitHub, but this book has soft hyphens. You may want to remove them or convert to KFX to get better X-Ray quality.
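For text that has already been extracted from the book, soft hyphens can be removed with a plain string replacement. This is a minimal sketch with a hypothetical helper name; it handles the U+00AD character and the `&shy;` HTML entity only and does not rewrite the files inside an EPUB.

```python
SOFT_HYPHEN = "\u00ad"  # Unicode SOFT HYPHEN, normally invisible


def strip_soft_hyphens(text: str) -> str:
    """Remove soft hyphen characters and the &shy; entity from text."""
    return text.replace(SOFT_HYPHEN, "").replace("&shy;", "")


cleaned = strip_soft_hyphens("Win\u00adston Sm&shy;ith")
```

Without this step a name split by a soft hyphen is a different string from the unsplit name, which is presumably why entity detection suffers.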
Old 06-21-2022, 08:42 PM   #324
j.p.s
Grand Sorcerer
Posts: 5,802
Karma: 103362673
Join Date: Apr 2011
Device: pb360
OK, it sounds like spaCy's Entity Ruler is the way to go.

The reason I wanted to have the type column was to correct spaCy's mistakes. It looks like my best option is to use WordDumb to make the XRAY file, then use SQL to fix name, type, and description.

I will start using the book from standardebooks.org and remove the soft hyphens.

I think XRAY has provisions for aliases (nicknames and other variations). Do spaCy and WordDumb have something similar?
Old 06-21-2022, 08:55 PM   #325
xxyzz
Evangelist
Posts: 442
Karma: 2666666
Join Date: Nov 2020
Device: none
You don't have to modify the SQLite file; Entity Ruler will tell spaCy the X-Ray entity's name and type (NER label). The .asc file will use the customized description if the name is the same.

I'm using RapidFuzz and Wikipedia (normalized titles or redirects) to merge similar X-Ray entities, and I try to use the full name (one that contains white space or an interpunct) for a person X-Ray entity.
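The general idea of merging near-duplicate names can be illustrated with the standard library. Note the substitution: this sketch uses `difflib.SequenceMatcher` instead of RapidFuzz so that it runs without extra packages, and the 0.8 threshold and the "longest name becomes canonical" rule are assumptions, not the plugin's actual settings.

```python
from difflib import SequenceMatcher


def merge_similar(names, threshold=0.8):
    """Group names whose similarity to a canonical name meets the threshold.

    Longer names are visited first, so they become the canonical form.
    """
    groups = {}  # canonical name -> list of variants, canonical included
    for name in sorted(names, key=len, reverse=True):
        for canonical in groups:
            ratio = SequenceMatcher(None, name.lower(), canonical.lower()).ratio()
            if ratio >= threshold:
                groups[canonical].append(name)
                break
        else:  # no existing group was close enough
            groups[name] = [name]
    return groups


groups = merge_similar(["Winston Smith", "Julia", "Winston-Smith"])
```

A purely string-based score like this does not catch variants that differ a lot in length, which is where explicit aliases are more reliable.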

Entity Ruler also has this feature. I can add another aliases column to the table (enter multiple aliases separated by ",", or just one alias) and assign the same id to all aliases.
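As a rough illustration of the "same id for all aliases" idea, the helper below builds pattern dictionaries in the shape spaCy's Entity Ruler accepts (`label`, `pattern`, and an optional `id`). The helper name `build_ruler_patterns` and the input tuple layout are hypothetical, and using the entity name as the id is an assumption.

```python
def build_ruler_patterns(entries):
    """Turn (name, label, aliases) tuples into Entity Ruler pattern dicts.

    Every alias shares the id of its main name, so all matches can be
    traced back to one entity.
    """
    patterns = []
    for name, label, aliases in entries:
        for surface in [name, *aliases]:
            patterns.append({"label": label, "pattern": surface, "id": name})
    return patterns


patterns = build_ruler_patterns([("Livy", "PERSON", ["Livy ii-iii"])])
```

With spaCy v3, such a list can be handed to the component created by `nlp.add_pipe("entity_ruler")` through its `add_patterns` method, and matched spans then expose the shared id as `ent.ent_id_`. Whether the plugin wires it up this way is not shown in the thread.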

Last edited by xxyzz; 06-21-2022 at 09:15 PM.
Old 06-22-2022, 04:30 AM   #326
xxyzz
Evangelist
Posts: 442
Karma: 2666666
Join Date: Nov 2020
Device: none
I have pushed the changes to GitHub. I added a new customize X-Ray menu that opens the edit dialog for the selected books.
Old 06-24-2022, 01:54 PM   #327
j.p.s
Grand Sorcerer
Posts: 5,802
Karma: 103362673
Join Date: Apr 2011
Device: pb360
Quote:
Originally Posted by xxyzz View Post
I'm using this book for testing on GitHub, but this book has soft hyphens. You may want to remove them or convert to KFX to get better X-Ray quality.
Are you using the above book as is for testing, or did you remove the soft hyphens?

I installed the "Hyphenate This!" plugin, downloaded the above book, clicked the "Remove soft hyphens.." option, and then clicked "OK", but after it finished the book had not changed.

Can you attach the exact book that you use for testing so that we can see the same things happen when testing?
Old 06-24-2022, 07:18 PM   #328
xxyzz
Evangelist
Posts: 442
Karma: 2666666
Join Date: Nov 2020
Device: none
I didn't remove the soft hyphens from the test files (https://github.com/xxyzz/WordDumb/fi...1564/books.zip) because I didn't know the book had them back then. But the KFX book in the zip file doesn't have soft hyphens; maybe Kindle Previewer removed them during the conversion process.
Old 06-24-2022, 10:08 PM   #329
j.p.s
Grand Sorcerer
Posts: 5,802
Karma: 103362673
Join Date: Apr 2011
Device: pb360
Quote:
Originally Posted by xxyzz View Post
The current feature only uses the customized description if spaCy can find the name in the book, and both names must be the same. If the customized X-Ray entity is a person but spaCy thinks it's not a person, the code will set the entity as a person.
My problem was that I did not know the names had to match. I was confused because I had asked for the name column so that names could be corrected. For example, WordDumb made a name "Livy ii-iii" and about 8 other similar variations when the correct name is "Livy". When there is only one incorrect form of the name and no correct form, it is easy to fix with SQL, but combining several incorrect forms is significant work. I had hoped that WordDumb could fix that.
Quote:
With spaCy's Entity Ruler, I can let spaCy find these customized X-Ray entities if it couldn't find them before.

It's not the number of columns that I'm worried about; I'm just having trouble auto-resizing the table and the dialog window, especially when there is a combobox in the table.
Can you add the functionality first, reading a file that I would be happy to generate myself in a text editor, and then work on getting the dialog to work?

Quote:
WordDumb can't run in the command line now. I'm not sure whether some features will work in the terminal, for example: device detection.
Would it be possible to make a stripped-down version that only generates X-Ray and does not interact with an e-reader at all?

Quote:
I'm using this book for testing on GitHub, but this book has soft hyphens. You may want to remove them or convert to KFX to get better X-Ray quality.
I won't be doing anything with KFX unless there is no other way, and maybe not then.
Old 06-24-2022, 11:07 PM   #330
xxyzz
Evangelist
Posts: 442
Karma: 2666666
Join Date: Nov 2020
Device: none
Have you tried the latest plugin from GitHub Actions? You can add person names and their aliases in the new customize X-Ray menu. You can also create the "worddumb-custom-x-ray.json" file in a text editor and put it in the book folder. You could add "Livy" and see if it helps, or add all the name variants as aliases of "Livy".

Example file (each entry lists, in order: entity name, NER label, comma-separated aliases, and description; leave the description empty to use the Wikipedia summary):
Code:
[
  [
    "name a",
    "PERSON",
    "name-a,name-A",
    "name a desc"
  ],
  [
    "name b",
    "PERSON",
    "",
    ""
  ]
]
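A file in this four-field layout could be read from Python roughly as follows. This is a sketch, not the plugin's loader: `CustomEntity` and `parse_custom_x_ray` are hypothetical names. Keep in mind that `json.loads` rejects `#` annotations, so the file on disk must contain plain JSON only.

```python
import json
from typing import NamedTuple


class CustomEntity(NamedTuple):
    name: str
    label: str          # NER label such as "PERSON"
    aliases: list[str]  # split from the comma-separated third field
    description: str    # empty string means no custom description


def parse_custom_x_ray(text: str) -> list[CustomEntity]:
    """Parse the list-of-lists layout into typed entries."""
    entities = []
    for row in json.loads(text):
        if len(row) != 4:
            raise ValueError(f"expected 4 fields, got {len(row)}: {row!r}")
        name, label, aliases, description = row
        entities.append(
            CustomEntity(
                name=name,
                label=label,
                aliases=[a.strip() for a in aliases.split(",") if a.strip()],
                description=description,
            )
        )
    return entities


example = (
    '[["name a", "PERSON", "name-a,name-A", "name a desc"],'
    ' ["name b", "PERSON", "", ""]]'
)
entities = parse_custom_x_ray(example)
```

Checking the field count up front gives a clearer error than an unpacking failure when an entry is hand-edited incorrectly.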
It's possible to make the plugin run in a terminal and only create files; the test file already does that.

Last edited by xxyzz; 06-24-2022 at 11:15 PM.
Tags
worddumb, x-ray