08-23-2018, 12:47 PM | #211 |
Zealot
Posts: 136
Karma: 493152
Join Date: Mar 2012
Location: Spain
Device: Kindle Oasis 2
|
I am sorry. I realized that the files did not work. In addition, we cannot attach copyrighted files, so I deleted them. There were two main problems. The first is that the plugin is not very good at recovering character aliases from the X-Ray file, so the list has to be cleaned up by hand. The second, and more important, is the quality of the ebook: it is quite poor, and the plugin does not work well with files that have strange data in the HTML tags. The problem could be solved with a few hours of work to regenerate a cleaner ebook, but I don't know if it is worth it.
|
08-23-2018, 12:52 PM | #212 | |
Junior Member
Posts: 5
Karma: 10
Join Date: Feb 2017
Device: Kindle Voyage
|
Quote:
|
|
08-31-2018, 04:18 PM | #213 |
Junior Member
Posts: 3
Karma: 10
Join Date: Aug 2018
Device: Kindle Paperwhite 3
|
I have the following error. I am on KDE Neon 5.13.4 and Calibre 3.30. Please help me.
Starting job: Creating Files
09-01-2018 01:38:18 Initializing...
09-01-2018 01:38:18 Echo Burning - Lee Child
09-01-2018 01:38:18 Parsing Goodreads data...
Job: "Creating Files" failed with error:
Traceback (most recent call last):
  File "site-packages/calibre/gui2/threaded_jobs.py", line 84, in start_work
  File "calibre_plugins.xray_creator.lib.xray_creator", line 284, in create_files_event
  File "calibre_plugins.xray_creator.lib.book", line 223, in create_files_event
  File "calibre_plugins.xray_creator.lib.book", line 443, in _parse_goodreads_data
  File "calibre_plugins.xray_creator.lib.goodreads_parser", line 40, in parse
  File "calibre_plugins.xray_creator.lib.goodreads_parser", line 50, in _get_xray
  File "calibre_plugins.xray_creator.lib.goodreads_parser", line 254, in _get_quotes
UnicodeEncodeError: 'latin-1' codec can't encode character u'\u201c' in position 0: ordinal not in range(256)
Called with args: (,) {u'abort': , u'notifications': , u'log': } |
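The error above is the kind raised when text containing a curly quote (U+201C) is encoded as latin-1, which only covers code points below 256. The following is a minimal sketch of that failure in isolation, written for Python 3. It is not the plugin's code: the traceback only shows that the error surfaces in `_get_quotes`, and the sample string and variable names here are made up for illustration.

```python
# Hypothetical sample text: a quote wrapped in curly quotation marks,
# similar to what a Goodreads quote might contain.
quote = '\u201cHello\u201d'

try:
    # latin-1 has no slot for U+201C, so this raises UnicodeEncodeError.
    quote.encode('latin-1')
    latin1_ok = True
except UnicodeEncodeError as exc:
    latin1_ok = False
    error_text = str(exc)

# UTF-8 can represent every Unicode code point, so this succeeds.
utf8_bytes = quote.encode('utf-8')

print(latin1_ok)
print(error_text)
print(utf8_bytes)
```

If the plugin encodes quote text somewhere along that call path, switching that step to UTF-8 would be one possible way around the error, but that is an assumption about code not shown in this thread.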
08-31-2018, 11:30 PM | #214 |
Zealot
Posts: 104
Karma: 10000
Join Date: Apr 2016
Device: Kindle PW2
|
Hey everyone, so sorry I've been AWOL. I just don't have the time to work on this plugin anymore. I did just make a super minor fix that may have resolved some issues for you guys, where some text was put into <i> tags instead of <p> tags. I'm not sure it'll actually help many of you, but it might.
|
09-01-2018, 04:44 AM | #215 |
Zealot
Posts: 136
Karma: 493152
Join Date: Mar 2012
Location: Spain
Device: Kindle Oasis 2
|
Thanks... New version has test files inside... FYI
|
09-01-2018, 12:26 PM | #216 |
Zealot
Posts: 136
Karma: 493152
Join Date: Mar 2012
Location: Spain
Device: Kindle Oasis 2
|
Is this correct?
PARAGRAPH_PAT = re.compile(r'<(p|i|h\d) .*?>.+?(?:<\/\1)', re.I)
Because of the space after the tag name, paragraphs like <p>Hello!</p> are not caught, I think. |
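The concern about the mandatory space can be checked directly. The sketch below compares the pattern as posted with a relaxed variant in which the attribute part is optional. `RELAXED` is a hypothetical alternative written for this comparison, not the fix that was actually shipped in the plugin.

```python
import re

# Pattern exactly as posted: a literal space is required after the tag name.
ORIGINAL = re.compile(r'<(p|i|h\d) .*?>.+?(?:<\/\1)', re.I)

# Hypothetical relaxed variant: attributes (whitespace plus anything up to
# the closing '>') are optional, so bare tags such as <p> also match.
RELAXED = re.compile(r'<(p|i|h\d)(?:\s[^>]*)?>.+?</\1', re.I)

samples = [
    '<p>Hello!</p>',            # bare tag, no attributes
    '<p class="x">Hello!</p>',  # tag with an attribute
    '<h2>Title</h2>',           # bare heading tag
]

original_hits = [bool(ORIGINAL.search(s)) for s in samples]
relaxed_hits = [bool(RELAXED.search(s)) for s in samples]

print(original_hits)
print(relaxed_hits)
```

Requiring whitespace before any attributes keeps the relaxed variant from treating tags such as <pre> or <img> as <p> or <i>.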
09-02-2018, 12:06 AM | #217 |
Zealot
Posts: 104
Karma: 10000
Join Date: Apr 2016
Device: Kindle PW2
|
Woah you are fast haha. I got rid of the test files and completely refactored the book parsing algorithm.. Now it uses all regex instead of a mix of regex and some other algorithms. It's much more accurate as far as I can tell and this way, I don't have to encode/decode the html which should make it work better for books with non-ascii characters.
|
09-02-2018, 12:07 AM | #218 |
Zealot
Posts: 104
Karma: 10000
Join Date: Apr 2016
Device: Kindle PW2
|
dammit, i still have an old pyc file in there!
|
09-02-2018, 12:20 AM | #219 |
Zealot
Posts: 104
Karma: 10000
Join Date: Apr 2016
Device: Kindle PW2
|
okay i removed the test files for realsies this time.. or did i?!
|
09-02-2018, 12:32 AM | #220 |
Zealot
Posts: 104
Karma: 10000
Join Date: Apr 2016
Device: Kindle PW2
|
So I just noticed it won't catch instances where there's a non-whitespace, non-alphabetic character right before the word, e.g. "Armansky won't be caught because the Kindle would highlight the " as well as the word. Guess that's more regex work for me haha
Edit: Now that I think about it, I think quotes are the only valid case for this. I'm not going to attempt to catch typos like forgetting a space after a period or comma. Nothing else coming before the word would make sense other than a quote of some type, so I guess I'll just make it check for those as well.
Edit2: I guess catching everything until the previous whitespace is another easy option. That way, if it is a typo, people can still use it. Decisions, decisions.
Edit3: I decided to go with matching anything that's connected to the word, back to the previous whitespace. If someone gives me a good reason to change this, I will.
Last edited by szarroug3; 09-02-2018 at 01:04 AM. |
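One way to express the Edit3 choice (take whatever is attached to the front of the word, back to the previous whitespace) is sketched below in Python 3. The function name `prefix_aware_pattern` and the sample sentences are invented for illustration, and the plugin's real pattern may well look different.

```python
import re

def prefix_aware_pattern(word):
    """Build a pattern that also captures characters glued to the front of word.

    (?<!\\S) anchors the match at the start of the text or right after
    whitespace; \\S* then absorbs any leading non-whitespace characters,
    such as an opening quotation mark.
    """
    return re.compile(r'(?<!\S)\S*' + re.escape(word) + r'\b')

pattern = prefix_aware_pattern('Armansky')

quoted = pattern.search('He said, "Armansky is late.').group(0)
plain = pattern.search('Then Armansky left.').group(0)

print(quoted)
print(plain)
```

A side effect of this approach is that the word is also found at the end of a longer token, which is the trade-off accepted in Edit3 so that typos still produce a match.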
09-02-2018, 04:51 AM | #221 |
Zealot
Posts: 136
Karma: 493152
Join Date: Mar 2012
Location: Spain
Device: Kindle Oasis 2
|
Thanks for the new version. I have to study it more carefully because a lot has changed in the parsing, but I've found a problem that did not exist in the prior version: the count field in the entity table of the SQLite asc file is no longer updated. Another thing: names beginning or ending with a non-ASCII character, such as "René" or "Ángel", are not found.
|
09-05-2018, 02:24 AM | #222 | |
Zealot
Posts: 104
Karma: 10000
Join Date: Apr 2016
Device: Kindle PW2
|
Quote:
I did run a quick test using regexr. Looks like it works to me. Are you sure that the name is written correctly in the config and that it uses that same character in the book itself?
Last edited by szarroug3; 09-05-2018 at 02:37 AM. |
|
09-05-2018, 01:19 PM | #223 | |
Zealot
Posts: 136
Karma: 493152
Join Date: Mar 2012
Location: Spain
Device: Kindle Oasis 2
|
Sure.... I can provide you with an example... json file, test ebook file and asc file generated. Look at the pictures, please...
And thanks... of course
From json:
Quote:
]
Last edited by Shark69; 09-05-2018 at 01:27 PM. |
|
09-06-2018, 10:19 PM | #224 |
Zealot
Posts: 104
Karma: 10000
Join Date: Apr 2016
Device: Kindle PW2
|
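Okay, so I've figured out what's wrong but I can't figure out how to fix it. In the regex pattern I wrote, I use \b around the word I'm looking for. Turns out that this doesn't work when the first or last character in the word is non-ASCII.
There are three different positions that qualify as word boundaries:
Before the first character in the string, if the first character is a word character.
After the last character in the string, if the last character is a word character.
Between two characters in the string, where one is a word character and the other is not a word character.
Basically, since the non-ASCII character doesn't count as a "word" character, it doesn't fulfill any of these requirements. I'm still working on it.
Last edited by szarroug3; 09-06-2018 at 10:31 PM. |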
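The \b behaviour described above can be reproduced in a few lines. The sketch below is written for Python 3, where the `re.ASCII` flag restricts \w and \b to ASCII letters, digits and underscore, which matches the default in the Python 2 that calibre 3.x runs on. Without that flag, Python 3 treats "é" and "Á" as word characters. The sample sentence and the helper `find` are made up for illustration.

```python
import re

text = 'Ayer René habló con Ángel.'

def find(name, flags=0):
    # Wrap the escaped name in \b on both sides, as described in the post.
    return re.findall(r'\b' + re.escape(name) + r'\b', text, flags)

# ASCII-only word characters: there is no boundary next to 'é' or 'Á'.
ascii_rene = find('René', re.ASCII)
ascii_angel = find('Ángel', re.ASCII)

# Unicode-aware word characters (Python 3 default for str patterns).
unicode_rene = find('René')
unicode_angel = find('Ángel')

print(ascii_rene, ascii_angel)
print(unicode_rene, unicode_angel)
```

In Python 2 the comparable switch is the `re.UNICODE` flag combined with unicode strings for both pattern and text. Whether that fits the plugin's refactored parsing is an assumption, since that code is not shown here.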
09-07-2018, 02:00 PM | #225 |
Zealot
Posts: 136
Karma: 493152
Join Date: Mar 2012
Location: Spain
Device: Kindle Oasis 2
|
As an alternative, and talking about the code before the refactoring (because I know it better), I'd like to suggest processing the text with four regexes:
For aliases inside the paragraph:
word_pat = re.compile(r'(?=([^a-zA-Z0-9_]' + r'[^a-zA-Z0-9_]|[^a-zA-Z0-9_]'.join(escaped_word_list) + r'[^a-zA-Z0-9_]))', re.I)
For aliases at the beginning of the paragraph:
word_pat = re.compile(r'(?=(^' + r'[^a-zA-Z0-9_]|^'.join(escaped_word_list) + r'[^a-zA-Z0-9_]))', re.I)
For aliases at the end of the paragraph:
word_pat = re.compile(r'(?=([^a-zA-Z0-9_]' + r'$|[^a-zA-Z0-9_]'.join(escaped_word_list) + r'$))', re.I)
And then for aliases that make up the whole paragraph:
word_pat = re.compile(r'(?=(^' + r'$|^'.join(escaped_word_list) + r'$))', re.I)
I've tested it successfully.
Last edited by Shark69; 09-07-2018 at 02:13 PM. |
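The four cases above (inside, beginning, end, whole paragraph) can also be folded into a single pattern by replacing the explicit delimiter characters with negative lookarounds, which succeed at the start and end of the text as well. This is a sketch of that idea in Python 3, not code from the plugin. `build_alias_pattern` and the sample paragraphs are hypothetical, and unlike the four patterns above it returns only the alias, without the neighbouring delimiter characters.

```python
import re

# Same character class as in the four patterns above.
WORD_CHAR = r'[a-zA-Z0-9_]'

def build_alias_pattern(aliases):
    # Longest aliases first so a longer alias wins over a shorter prefix.
    escaped = [re.escape(a) for a in sorted(aliases, key=len, reverse=True)]
    return re.compile(
        r'(?<!' + WORD_CHAR + r')'        # no word character right before
        + r'(?:' + '|'.join(escaped) + r')'
        + r'(?!' + WORD_CHAR + r')',      # no word character right after
        re.I,
    )

pattern = build_alias_pattern(['René', 'Ángel'])

paragraphs = [
    'René',                       # alias is the whole paragraph
    'Ángel came in.',             # alias at the beginning
    'She met René',               # alias at the end
    'Then Ángel and René left.',  # aliases inside the paragraph
    'Renéx',                      # alias glued to another letter
]

hits = [pattern.findall(p) for p in paragraphs]
print(hits)
```

Because the lookarounds only test for ASCII word characters, a name that starts or ends with "é" or "Á" is still delimited correctly by spaces, punctuation, or the paragraph edges.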