Old 08-23-2018, 12:47 PM   #211
Shark69
Zealot
Posts: 136
Karma: 493152
Join Date: Mar 2012
Location: Spain
Device: Kindle Oasis 2
I am sorry. I realized that the files did not work. In addition, we cannot attach copyrighted files, so I deleted them. There were two main problems. The first is that the plugin is not very good at recovering character aliases from the X-Ray file, so the list has to be cleaned up manually. The second, and more important, is the quality of the ebook: it is quite poor, and the plugin does not work well with files that have strange data in the HTML tags. The problem could be solved with a few hours of work to regenerate a cleaner ebook, but I don't know if it is worth it.
Old 08-23-2018, 12:52 PM   #212
KloudZ
Junior Member
Posts: 5
Karma: 10
Join Date: Feb 2017
Device: Kindle Voyage
Quote:
Originally Posted by Shark69 View Post
I am sorry. I realized that the files did not work. In addition, we cannot attach copyrighted files, so I deleted them. There were two main problems. The first is that the plugin is not very good at recovering character aliases from the X-Ray file, so the list has to be cleaned up manually. The second, and more important, is the quality of the ebook: it is quite poor, and the plugin does not work well with files that have strange data in the HTML tags. The problem could be solved with a few hours of work to regenerate a cleaner ebook, but I don't know if it is worth it.
Sounds complicated. Thanks anyway!
Old 08-31-2018, 04:18 PM   #213
Bulu009
Junior Member
Posts: 3
Karma: 10
Join Date: Aug 2018
Device: Kindle Paperwhite 3
I have the following error. I am on KDE Neon 5.13.4 and Calibre 3.30. Please help me.



Starting job: Creating Files

09-01-2018 01:38:18 Initializing...
09-01-2018 01:38:18 Echo Burning - Lee Child
09-01-2018 01:38:18 Parsing Goodreads data...
Job: "Creating Files" failed with error:
Traceback (most recent call last):
File "site-packages/calibre/gui2/threaded_jobs.py", line 84, in start_work
File "calibre_plugins.xray_creator.lib.xray_creator", line 284, in create_files_event
File "calibre_plugins.xray_creator.lib.book", line 223, in create_files_event
File "calibre_plugins.xray_creator.lib.book", line 443, in _parse_goodreads_data
File "calibre_plugins.xray_creator.lib.goodreads_parser", line 40, in parse
File "calibre_plugins.xray_creator.lib.goodreads_parser", line 50, in _get_xray
File "calibre_plugins.xray_creator.lib.goodreads_parser", line 254, in _get_quotes
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201c' in position 0: ordinal not in range(256)

Called with args: (,) {u'abort': , u'notifications': , u'log': }
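
For anyone reading the traceback: the last frame is _get_quotes, and the message says a left curly quote (u'\u201c') could not be encoded as latin-1. The plugin's code at that line is not shown in this thread, so the snippet below is only a minimal sketch of that class of failure, not the plugin's actual logic. It shows that the character has no latin-1 representation, while UTF-8 handles it.

```python
# -*- coding: utf-8 -*-
# Minimal reproduction of the error class in the traceback above.
# This is an illustration only; it is not taken from the plugin source.

quote = u'\u201c'  # LEFT DOUBLE QUOTATION MARK, the character named in the error

try:
    quote.encode('latin-1')   # latin-1 only covers code points 0..255
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# UTF-8 can represent every Unicode code point, so this succeeds.
utf8_bytes = quote.encode('utf-8')
```

If the quotes text really is being pushed through a latin-1 encode somewhere, encoding as UTF-8 (or keeping the text as unicode) would avoid the failure, but that is a guess until someone checks the source.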
Old 08-31-2018, 11:30 PM   #214
szarroug3
Zealot
Posts: 104
Karma: 10000
Join Date: Apr 2016
Device: Kindle PW2
Hey everyone, so sorry I've been AWOL. I just don't have the time to work on this plugin anymore. I did just make a super minor fix that may resolve some issues for you guys where some text was put into <i> tags instead of <p> tags. I'm not sure it'll actually help many of you, but it might.
Old 09-01-2018, 04:44 AM   #215
Shark69
Zealot
Thanks. FYI, the new version has test files inside.
Old 09-01-2018, 12:26 PM   #216
Shark69
Zealot
Is this correct?
PARAGRAPH_PAT = re.compile(r'<(p|i|h\d) .*?>.+?(?:<\/\1)', re.I)
Because of the space after the tag name, paragraphs like
<p>Hello!</p>
are not caught, I think.
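
A quick way to check this is to run the posted pattern against a bare tag and a tag with attributes. The sketch below does that. RELAXED_PAT is a hypothetical variant written for this illustration (it is not from the plugin); it makes the attribute part optional by replacing the mandatory space with a word boundary plus [^>]*.

```python
import re

# Pattern exactly as posted: note the literal space after the tag-name group.
PARAGRAPH_PAT = re.compile(r'<(p|i|h\d) .*?>.+?(?:<\/\1)', re.I)

# Hypothetical relaxed variant (an assumption, not the plugin's code):
# \b ends the tag name, [^>]* allows zero or more attribute characters.
RELAXED_PAT = re.compile(r'<(p|i|h\d)\b[^>]*>.+?</\1', re.I)

samples = ['<p>Hello!</p>', '<p class="x">Hello!</p>']

# For each sample: (text, matched by posted pattern, matched by relaxed pattern)
results = [
    (s, bool(PARAGRAPH_PAT.search(s)), bool(RELAXED_PAT.search(s)))
    for s in samples
]
```

With the posted pattern only the tag carrying an attribute matches, which supports the suspicion above; the relaxed variant matches both.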
Old 09-02-2018, 12:06 AM   #217
szarroug3
Zealot
Whoa, you are fast, haha. I got rid of the test files and completely refactored the book parsing algorithm. It now uses regex throughout instead of a mix of regex and some other algorithms. It's much more accurate as far as I can tell, and this way I don't have to encode/decode the HTML, which should make it work better for books with non-ASCII characters.
Old 09-02-2018, 12:07 AM   #218
szarroug3
Zealot
Dammit, I still have an old .pyc file in there!
Old 09-02-2018, 12:20 AM   #219
szarroug3
Zealot
Okay, I removed the test files for realsies this time... or did I?!
Old 09-02-2018, 12:32 AM   #220
szarroug3
Zealot
So I just noticed it won't catch instances where there's a non-whitespace, non-alphabetic character right before the word, i.e. "Armansky won't be caught because the Kindle would highlight the " as well as the word. Guess that's more regex work for me, haha.

Edit: Now that I think about it, I think quotes are the only valid case for this. I'm not going to attempt to catch typos like a missing space after a period or comma. Nothing else coming before the word would make sense other than a quote of some type, so I guess I'll just make it check for those as well.

Edit2: I guess catching everything back to the previous whitespace is another easy option. That way, if it is a typo, people can still use it. Decisions, decisions.

Edit3: I decided to go with matching anything that's attached to the front of the word, back to the previous whitespace. If someone gives me a good reason to change this, I will.
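
One way the Edit3 behaviour could look in regex form is sketched below. This is an illustration under assumptions, not the plugin's actual pattern: find_with_prefix is a hypothetical helper, and the approach is to start each match at a whitespace boundary and lazily consume any non-whitespace characters glued to the front of the word.

```python
import re

def find_with_prefix(word, text):
    """Return matches of `word` including anything attached in front of it.

    (?<!\\S) anchors the match at the start of the text or right after
    whitespace; \\S*? then lazily absorbs leading punctuation such as quotes.
    Hypothetical helper for illustration only.
    """
    pattern = re.compile(r'(?<!\S)\S*?' + re.escape(word))
    return [m.group(0) for m in pattern.finditer(text)]

hits = find_with_prefix('Armansky', 'He said, "Armansky left. Armansky stayed.')
```

The first hit keeps the opening quote, so the highlighted span lines up with what the Kindle selects, and the second hit is the bare word.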

Last edited by szarroug3; 09-02-2018 at 01:04 AM.
Old 09-02-2018, 04:51 AM   #221
Shark69
Zealot
Thanks for the new version. I have to study it more carefully because a lot has changed in the parsing, but I've found a problem that did not exist in the prior version: the count field in the entity table of the SQLite ASC file is no longer updated. Another thing: names beginning or ending with a non-ASCII character, such as "René" or "Ángel", are not found.
Old 09-05-2018, 02:24 AM   #222
szarroug3
Zealot
Quote:
Originally Posted by Shark69 View Post
Thanks for the new version. I have to study it more carefully because a lot has changed in the parsing, but I've found a problem that did not exist in the prior version: the count field in the entity table of the SQLite ASC file is no longer updated. Another thing: names beginning or ending with a non-ASCII character, such as "René" or "Ángel", are not found.
Fixed the count thing along with a few other unrelated minor things. Not sure what's wrong with the words starting with non-ASCII characters. I'll try to look tomorrow.

I did run a quick test using regexr. Looks like it works to me. Are you sure that the name is written correctly in the config and that it uses that same character in the book itself?

Last edited by szarroug3; 09-05-2018 at 02:37 AM.
Old 09-05-2018, 01:19 PM   #223
Shark69
Zealot
Sure. I can provide you with an example: the JSON file, the test ebook file, and the generated ASC file. Please look at the pictures.
And thanks, of course.

From json:

Quote:
"René": {"description": "Jefe de las tropas francesas. ", "aliases": ["René"]},

Last edited by Shark69; 09-05-2018 at 01:27 PM.
Old 09-06-2018, 10:19 PM   #224
szarroug3
Zealot
Okay, so I've figured out what's wrong, but I can't figure out how to fix it. In the regex pattern I wrote, I use \b around the word I'm looking for. Turns out that this doesn't work when the first or last character in the word is non-ASCII.

There are three different positions that qualify as word boundaries:
  • Before the first character in the string, if the first character is a word character.
  • After the last character in the string, if the last character is a word character.
  • Between two characters in the string, where one is a word character and the other is not a word character.

Basically, since the non-ascii character doesn't count as a "word" character, it doesn't fulfill any of these requirements.

I'm still working on it.
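
Some background that may help here: in Python 2's re module, \w and \b treat only [a-zA-Z0-9_] as word characters unless re.UNICODE is set and both pattern and text are unicode strings. The sketch below reproduces the symptom under that assumption. It is written for Python 3, where str patterns are Unicode-aware by default, so re.ASCII is used to imitate the ASCII-only behaviour. The sample sentence and the find helper are made up for this illustration.

```python
import re

text = 'Ayer llegaron René y Ángel.'

def find(name, flags=0):
    # Same idea as described in the post: the name wrapped in \b ... \b.
    return re.findall(r'\b' + re.escape(name) + r'\b', text, flags)

# ASCII-only word characters: 'é' and 'Á' are not word characters, so there is
# no word boundary after 'é' (next to a space) or before 'Á' (after a space).
ascii_hits = [find(n, re.ASCII) for n in ('René', 'Ángel')]

# Unicode-aware word characters: both names are found.
unicode_hits = [find(n) for n in ('René', 'Ángel')]
```

If that diagnosis applies, passing re.UNICODE in Python 2 (with unicode pattern and text) may be enough to keep \b, though this has not been checked against the plugin.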

Last edited by szarroug3; 09-06-2018 at 10:31 PM.
Old 09-07-2018, 02:00 PM   #225
Shark69
Zealot
As an alternative, and talking about the code before the refactoring (because I know it better), I'd suggest processing the text with four regexes:

For aliases inside the paragraph:
word_pat = re.compile(r'(?=([^a-zA-Z0-9_]' + r'[^a-zA-Z0-9_]|[^a-zA-Z0-9_]'.join(escaped_word_list) + r'[^a-zA-Z0-9_]))', re.I)

For aliases at the beginning of a paragraph:
word_pat = re.compile(r'(?=(^' + r'[^a-zA-Z0-9_]|^'.join(escaped_word_list) + r'[^a-zA-Z0-9_]))', re.I)

For aliases at the end of a paragraph:
word_pat = re.compile(r'(?=([^a-zA-Z0-9_]' + r'$|[^a-zA-Z0-9_]'.join(escaped_word_list) + r'$))', re.I)

And finally, for aliases that make up the whole paragraph:
word_pat = re.compile(r'(?=(^' + r'$|^'.join(escaped_word_list) + r'$))', re.I)

I've tested this successfully.
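
The four cases above might also be folded into one pattern by using negative lookarounds in place of \b, since a lookaround is satisfied at the start and end of the paragraph as well as next to a separator. This is a different technique from the four patterns as posted, offered only as a sketch. build_alias_pattern and the sample paragraphs are hypothetical, and like the posted patterns it treats any character outside [a-zA-Z0-9_] as a separator.

```python
import re

WORD_CHARS = r'a-zA-Z0-9_'

def build_alias_pattern(aliases):
    """Compile one pattern matching any alias not glued to an ASCII word char.

    Hypothetical helper: longer aliases are tried first so that a short alias
    does not shadow a longer one that starts with the same text.
    """
    ordered = sorted(aliases, key=len, reverse=True)
    alternatives = '|'.join(re.escape(a) for a in ordered)
    return re.compile(
        r'(?<![' + WORD_CHARS + r'])(?:' + alternatives + r')(?![' + WORD_CHARS + r'])',
        re.I,
    )

pattern = build_alias_pattern(['René', 'Ángel'])

paragraphs = [
    'René',                    # alias is the whole paragraph
    'Ángel habló con René.',   # alias at the start and inside
    'Llegó René',              # alias at the end
    'Irene y Renée',           # no alias: 'René' is followed by a word char
]
matches = [pattern.findall(p) for p in paragraphs]
```

Because the lookarounds consume no characters, the match is just the alias itself, which may make position bookkeeping simpler than with patterns that include the neighbouring separator.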

Last edited by Shark69; 09-07-2018 at 02:13 PM.