Old 06-25-2022, 12:25 PM   #331
j.p.s
Grand Sorcerer
 
Posts: 5,278
Karma: 98804578
Join Date: Apr 2011
Device: pb360
I have not tried the latest plugin yet, but it sounds like it is exactly what I need. Thank you!

Can you post an example of running the plugin from a terminal?
Old 06-25-2022, 09:45 PM   #332
xxyzz
Evangelist
 
Posts: 411
Karma: 2666666
Join Date: Nov 2020
Device: none
I just pushed the CLI support commit to GitHub. You can run this command to create files:

Code:
calibre-debug -r WordDumb -- book_path
Use `-h` to check other options:
Code:
calibre-debug -r WordDumb -- -h
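For anyone who wants to process several books in one go, here is a minimal Python sketch that wraps the command above. It assumes `calibre-debug` is on the PATH. The helper names (`build_command`, `run_on_folder`) and the extension list are made up for illustration and are not part of the plugin; only the `calibre-debug -r WordDumb -- book_path` form from this post is used.

```python
import subprocess
from pathlib import Path

# Hypothetical extension filter for illustration only;
# check which formats the plugin actually supports.
BOOK_SUFFIXES = {".azw3", ".epub", ".mobi", ".kfx"}


def build_command(book_path):
    """Return the argument list for running WordDumb on one book file."""
    return ["calibre-debug", "-r", "WordDumb", "--", str(book_path)]


def run_on_folder(folder):
    """Run the plugin on every matching book file directly inside `folder`."""
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() in BOOK_SUFFIXES:
            # check=False: keep going even if one book fails.
            result = subprocess.run(build_command(path), check=False)
            if result.returncode != 0:
                print(f"WordDumb failed for {path} (exit code {result.returncode})")
```

Calling `run_on_folder("/path/to/books")` would then run the plugin once per file. Passing the arguments as a list avoids quoting problems with spaces in file names.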
Old 06-25-2022, 09:49 PM   #333
j.p.s
Grand Sorcerer
 
Posts: 5,278
Karma: 98804578
Join Date: Apr 2011
Device: pb360
Quote:
Originally Posted by xxyzz View Post
Have you tried the latest plugin from GitHub Action?
I tried it and it worked.

Thanks!
Old 06-25-2022, 09:52 PM   #334
j.p.s
Grand Sorcerer
 
Posts: 5,278
Karma: 98804578
Join Date: Apr 2011
Device: pb360
Quote:
Originally Posted by xxyzz View Post
I just pushed the CLI support commit to GitHub. You can run this command to create files:

Code:
calibre-debug -r WordDumb -- book_path
Use `-h` to check other options:
Code:
calibre-debug -r WordDumb -- -h
Thanks! I will give it a try.

Can the book path be someplace not in calibre's library? (Not added to calibre)
Old 06-25-2022, 09:56 PM   #335
xxyzz
Evangelist
 
Posts: 411
Karma: 2666666
Join Date: Nov 2020
Device: none
Quote:
Originally Posted by j.p.s View Post
Thanks! I will give it a try.

Can the book path be someplace not in calibre's library? (Not added to calibre)
Yes, the book file can be anywhere. But the language and format of the book must be supported.

Old 06-26-2022, 05:57 PM   #336
j.p.s
Grand Sorcerer
 
Posts: 5,278
Karma: 98804578
Join Date: Apr 2011
Device: pb360
Quote:
Originally Posted by xxyzz View Post
I just pushed the CLI support commit to GitHub.
Quote:
Originally Posted by xxyzz View Post
Yes, the book file can be anywhere. But the language and format of the book must be supported.
It worked great. Using the CLI was much easier for me.

Unrelated to the CLI, but I can also report that for a couple of books about Roman history, it did a better job on several of the most important (major historical) characters than the X-ray files supplied by Amazon for those books.

Some people are designated as terms and some terms are designated as people.

Also, one of the changes you made in the past week or so fixed a crash that I was going to report after the new functionality was in place.
Old 06-26-2022, 07:43 PM   #337
xxyzz
Evangelist
 
Posts: 411
Karma: 2666666
Join Date: Nov 2020
Device: none
Quote:
Originally Posted by j.p.s View Post
it did a better job on several of the most important (major historical) characters than the X-ray files supplied by Amazon for those books.
I thought Amazon's files were created manually...

Quote:
Originally Posted by j.p.s View Post
Some people are designated as terms and some terms are designated as people.
If you have a CUDA-compatible GPU, you could try spaCy's transformer model (https://spacy.io/usage#gpu); it has higher NER accuracy than the CPU model. You would also have to install the dependencies manually and change the code a bit (a few lines, maybe).

Ultimately, you could train your own model for specific kinds of books.
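As a rough idea of what the spaCy side of that could look like, here is a minimal sketch. It is not the plugin's code: the helper names are made up, and it assumes spaCy plus the `en_core_web_trf` and `en_core_web_lg` model packages are installed separately. `spacy.prefer_gpu()` has to be called before the pipeline is loaded.

```python
def pick_model(gpu_available):
    """Choose a spaCy English pipeline name depending on GPU availability."""
    # en_core_web_trf is the transformer pipeline; en_core_web_lg is the large CPU one.
    return "en_core_web_trf" if gpu_available else "en_core_web_lg"


def load_pipeline():
    """Load the chosen pipeline; needs spaCy and the model package installed."""
    import spacy  # imported lazily so pick_model() works without spaCy

    gpu_available = spacy.prefer_gpu()  # True if a GPU was activated
    return spacy.load(pick_model(gpu_available))


def people_in(text, nlp):
    """Return the distinct PERSON entity strings found in `text`, sorted."""
    return sorted({ent.text for ent in nlp(text).ents if ent.label_ == "PERSON"})
```

With both models installed, `people_in(chapter_text, load_pipeline())` would list the names the model tags as people, which makes it easy to compare the two models on the same passage.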
Old 06-26-2022, 08:43 PM   #338
j.p.s
Grand Sorcerer
 
Posts: 5,278
Karma: 98804578
Join Date: Apr 2011
Device: pb360
Quote:
Originally Posted by xxyzz View Post
I thought Amazon's files were created manually...
They supposedly are made at least in part manually, but see my thread "How many X-ray mistaken identities can you find in your books?"
https://www.mobileread.com/forums/sh...d.php?t=309190
and "Easily fix egregious X-ray errors"
https://www.mobileread.com/forums/sh...d.php?t=303347
especially post #6, which is one of the books I tested WordDumb on.
Quote:
If you have a CUDA-compatible GPU, you could try spaCy's transformer model (https://spacy.io/usage#gpu); it has higher NER accuracy than the CPU model. You would also have to install the dependencies manually and change the code a bit (a few lines, maybe).

Ultimately, you could train your own model for specific kinds of books.
I use the open source NVIDIA driver, so I can't use CUDA.

I'm almost out of time for this for a while, but I think I am almost ready to use it for a few books that don't have X-ray instead of testing it against books that do have X-ray.
Old 06-27-2022, 01:15 AM   #339
xxyzz
Evangelist
 
Posts: 411
Karma: 2666666
Join Date: Nov 2020
Device: none
Quote:
Originally Posted by j.p.s View Post
They supposedly are made at least in part manually, but see my thread "How many X-ray mistaken identities can you find in your books?"
https://www.mobileread.com/forums/sh...d.php?t=309190
and "Easily fix egregious X-ray errors"
https://www.mobileread.com/forums/sh...d.php?t=303347
especially post #6, which is one of the books I tested WordDumb on.
But the SQL files you shared only work for specific book files: if the publisher changes just one character, all of the following X-Ray offsets change. Maybe you could share the corrections as a customized X-Ray JSON file, so that all formats of the same book can use it and others can improve it further.
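To illustrate the offset problem, here is a small self-contained Python sketch. The sentences and the `find_offsets` helper are invented for the example and have nothing to do with the real X-Ray database layout; the point is only that positions recorded for one edition stop matching once a single character is inserted, while looking the name up again still works.

```python
def find_offsets(text, name):
    """Return the start offset of every occurrence of `name` in `text`."""
    offsets = []
    start = text.find(name)
    while start != -1:
        offsets.append(start)
        start = text.find(name, start + 1)
    return offsets


name = "Caesar"
original = "Caesar crossed the Rubicon. Then Caesar entered Rome."
# Same text with one comma inserted after "Then".
revised = "Caesar crossed the Rubicon. Then, Caesar entered Rome."

# Offsets recorded for the original edition.
stored = find_offsets(original, name)

# Check whether each stored offset still points at the name in the revised edition.
still_valid = [revised[o:o + len(name)] == name for o in stored]

# Searching by name again recovers the correct positions.
recomputed = find_offsets(revised, name)
```

Every offset after the inserted character is off by one, which is why corrections keyed by entity name carry over between editions and formats while offset-based ones do not.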
Old 06-27-2022, 03:12 PM   #340
j.p.s
Grand Sorcerer
 
Posts: 5,278
Karma: 98804578
Join Date: Apr 2011
Device: pb360
Quote:
Originally Posted by xxyzz View Post
But the SQL files you shared only work for specific book files: if the publisher changes just one character, all of the following X-Ray offsets change. Maybe you could share the corrections as a customized X-Ray JSON file, so that all formats of the same book can use it and others can improve it further.
I stopped working on that because it was too much work and no one showed any interest.

I am only referring to those threads as an example of how many bad errors are in almost every X-ray file supplied by Amazon. I think the purpose of most X-ray generating utilities is to let people make their own X-ray files for books that do not have them, but I think WordDumb has the potential to make better X-ray files than Amazon.
Old 06-28-2022, 01:17 AM   #341
xxyzz
Evangelist
 
Posts: 411
Karma: 2666666
Join Date: Nov 2020
Device: none
I added a source column to the customize X-Ray table. `None` means the description is a quote from the book, `1` means Wikipedia, and `2` means Fandom.
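For anyone reading those values from a script, a minimal Python sketch of the mapping follows. The `source_label` helper is hypothetical, and the JSON round trip is only an assumption about how the value might be stored; the three meanings themselves are the ones listed above.

```python
import json

# Mapping from the post: None = quote from the book, 1 = Wikipedia, 2 = Fandom.
SOURCE_LABELS = {None: "quote from the book", 1: "Wikipedia", 2: "Fandom"}


def source_label(value):
    """Translate a source column value into a readable label."""
    return SOURCE_LABELS.get(value, f"unknown source ({value!r})")


# Assumption: if the value is saved as JSON, null is read back as Python's None.
values = json.loads("[null, 1, 2]")
labels = [source_label(v) for v in values]
```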
Old 06-28-2022, 09:57 AM   #342
dizaam
Junior Member
 
Posts: 1
Karma: 10
Join Date: Jun 2022
Device: Kindle
stuck installing

Hello, I'm stuck when running WordDumb: it hangs on installing en_core_web_lg.
Old 06-28-2022, 10:11 AM   #343
xxyzz
Evangelist
 
Posts: 411
Karma: 2666666
Join Date: Nov 2020
Device: none
Quote:
Originally Posted by dizaam View Post
Hello, I'm stuck when running WordDumb: it hangs on installing en_core_web_lg.
The size of that model is 382 MB (https://spacy.io/models/en#en_core_web_lg), so the download can take a while. You could select the medium model (31 MB) if you don't want to wait.
Old 07-03-2022, 11:59 PM   #344
xxyzz
Evangelist
 
Posts: 411
Karma: 2666666
Join Date: Nov 2020
Device: none
v3.22.0

New features
  • Use customize X-Ray entities to fix NER errors
  • Add command line interface
  • Include POS type to customize Word Wise table
  • Add GPE entity founding date to X-Ray description
  • Add X-Ray entity minimal occurrences option

Bug fixes
  • Fix upper case image file not found error
  • Only copy Word Wise dictionary file from device
  • Fix PyQt6 Enum errors
Old 07-05-2022, 07:12 PM   #345
j.p.s
Grand Sorcerer
 
Posts: 5,278
Karma: 98804578
Join Date: Apr 2011
Device: pb360
Bug: newline in X-ray entity label

When an occurrence (the first?) of an X-ray entity with two or more words has a newline between two of the words in the text of the book, the entity label in the generated X-ray file also has a newline between those words. This can be seen in the attached example.png.

WordDumb might also make two entities, one with a newline embedded in the label and one without. It is possible to work around this by making an entry in the book's worddumb-custom-x-ray.json file with an alias that has "\n" in place of the space, as can be seen in workaround.txt.

Also attached is NewlineBug.tar.gz which has an azw3 of the public domain book, a worddumb-custom-x-ray.json for both cases, a temporary worddumb.json plugin configuration file, the 2 generated XRAY.*.asc files, and a script to generate the XRAY files and tsv dumps of the entity and entity_description tables for both cases.
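One possible way to avoid the duplicate labels is to collapse whitespace before an entity label is stored or compared. The Python sketch below only illustrates that idea; it is not taken from the plugin's source, and both helper names are made up.

```python
def normalize_label(label):
    """Collapse any run of whitespace (including newlines) into one space."""
    return " ".join(label.split())


def merge_labels(labels):
    """Deduplicate labels that differ only in whitespace, keeping first-seen order."""
    seen = {}
    for label in labels:
        seen.setdefault(normalize_label(label), None)
    return list(seen)
```

With this, "Julius\nCaesar" and "Julius Caesar" end up as a single entity, so the "\n" alias workaround would no longer be needed.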
Attached Thumbnails: example.png (72.6 KB), workaround.png (69.8 KB)
Attached Files: NewlineBug.tar.gz (7.36 MB)

Tags
worddumb, x-ray


All times are GMT -4.

