Quote:
Originally Posted by Shark69
Hi szarroug3
Thanks for the plugin, great work!
I'm using it by building the input JSON manually, and it generally works very well.
But I've noticed two strange behaviours, and I'd like to know if there is any way to fix them.
The problems are:
When the character name begins or ends with an accented letter (non-English names such as Ángel or Bernabé), the parser does not detect the name.
The other one: I don't know why, but in some books the X-Ray popup window with the character description does not appear when you highlight the word. If I manually click X-Ray, I can find the character description.
I suppose it could be due to a formatting problem with the book or a bug in the parser. Can anyone give me some advice?
Thanks.
|
Any idea how to fix the problem when the character name begins with, ends with, or contains a non-English character? I think the problem is related to the regular expression in book_parser.py, in this line:
word_pat = re.compile(r'(\b' + r'\b|\b'.join(escaped_word_list) + r'\b)', re.I)
and then:
entity_id = self._process_match(match, codec, excerpt_id, word_loc)
The \b anchor does not seem to work correctly with this kind of character...
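Here is a minimal sketch of what I think is happening, written for Python 3 and run outside the plugin. It assumes the parser matches against UTF-8 encoded bytes, which I have not confirmed in book_parser.py; the names and the sample sentence are made up for illustration. With a bytes pattern, \b only treats ASCII letters, digits and underscore as word characters, so there is no word boundary next to the bytes of Á or é.

```python
import re

# Hypothetical names and text, only for illustration.
names = ['Ángel', 'Bernabé', 'Tom']
text = 'Ayer Ángel y Bernabé vieron a Tom.'

# Case 1: bytes pattern on UTF-8 bytes (assumed to mirror the plugin).
# \b is ASCII-only here, so the bytes of Á and é are non-word characters
# and no boundary exists between them and a neighbouring space.
escaped_bytes = [re.escape(n.encode('utf-8')) for n in names]
byte_pat = re.compile(rb'(\b' + rb'\b|\b'.join(escaped_bytes) + rb'\b)', re.I)
byte_hits = [m.group(0).decode('utf-8')
             for m in byte_pat.finditer(text.encode('utf-8'))]

# Case 2: str pattern on decoded text.
# For str patterns in Python 3, \b is Unicode-aware by default.
escaped_text = [re.escape(n) for n in names]
text_pat = re.compile(r'(\b' + r'\b|\b'.join(escaped_text) + r'\b)', re.I)
text_hits = [m.group(0) for m in text_pat.finditer(text)]

print(byte_hits)  # ['Tom']
print(text_hits)  # ['Ángel', 'Bernabé', 'Tom']
```

If this is really the cause, decoding the text before matching might be enough. On Python 2 the equivalent would be matching unicode strings with the re.UNICODE flag added next to re.I.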
I also think it is somehow related to the UTF-8 byte length of these characters, for words that contain them.
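To show what I mean about the length, here is a small Python 3 sketch. The sample sentence is made up, and whether the parser stores byte offsets or character offsets is only my assumption. Each accented letter takes two bytes in UTF-8, so any position after one of them differs depending on how you count.

```python
name = 'Ángel'
char_len = len(name)                  # counted in characters
byte_len = len(name.encode('utf-8'))  # counted in UTF-8 bytes

# Hypothetical sentence with two accented letters before "Tom".
text = 'Ayer Ángel y Bernabé vieron a Tom.'
data = text.encode('utf-8')

char_offset = text.index('Tom')   # position in characters
byte_offset = data.index(b'Tom')  # position in bytes

print(char_len, byte_len)        # 5 6
print(char_offset, byte_offset)  # 30 32
```

So if one part of the code counts characters and another counts bytes, the location of every word after an accented letter would be shifted by one per accent.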
Can anyone help me?