01-28-2021, 10:06 PM   #5298
ownedbycats
Custom User Title
ownedbycats ought to be getting tired of karma fortunes by now.
 
 
Posts: 11,050
Karma: 75568269
Join Date: Oct 2018
Location: Canada
Device: Kobo Libra H2O, formerly Aura HD
I did a bit of investigating into the .eml issue. It's caused by two characters from the Unicode Latin-1 Supplement block in the message body (in this case, the chapter title is "Déjà-vu"). Honestly, that didn't occur to me until I went mucking around in the geturls.py code.
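For reference, both accented letters in "Déjà-vu" fall inside the Latin-1 Supplement block (U+0080 to U+00FF), and each takes two bytes in UTF-8, so the body is no longer plain ASCII. A quick Python check (the helper name is just something I made up for illustration, not anything in FanFicFare):

```python
import unicodedata

def latin1_supplement_chars(text):
    """Return (code point, character name, UTF-8 bytes) for each character
    in the Latin-1 Supplement block (U+0080 to U+00FF)."""
    found = []
    for ch in text:
        if 0x80 <= ord(ch) <= 0xFF:
            # U+0080..U+009F are unnamed control characters, hence the "?" default.
            name = unicodedata.name(ch, "?")
            found.append(("U+%04X" % ord(ch), name, ch.encode("utf-8")))
    return found

title = "Déjà-vu"
for cp, name, raw in latin1_supplement_chars(title):
    print(cp, name, raw)
# U+00E9 LATIN SMALL LETTER E WITH ACUTE b'\xc3\xa9'
# U+00E0 LATIN SMALL LETTER A WITH GRAVE b'\xc3\xa0'
print(title.isascii())  # False
```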

I confirmed this by saving a copy of the .eml and editing it to remove those characters. I also emailed myself two messages: one containing just the URL, and one containing the URL and the chapter title. The former worked as expected; the latter pasted the file path instead.
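The content side of that comparison can be sketched in Python without a mail client. This builds the two test bodies plus an accent-stripped copy like the hand-edited .eml, and reports which ones carry non-ASCII text. The story URL is a made-up placeholder, and this doesn't reproduce the drag-and-drop behaviour itself, only the difference between the messages.

```python
import unicodedata

# Placeholder URL for illustration, not a real story.
STORY_URL = "https://www.fanfiction.net/s/0000000/1/"
TITLE = "Déjà-vu"

def strip_accents(text):
    """ASCII-only copy: decompose accented letters (NFKD), then drop
    whatever still isn't ASCII, i.e. the combining accent marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")

bodies = {
    "url_only": STORY_URL + "\n",                      # worked
    "url_and_title": TITLE + "\n" + STORY_URL + "\n",  # pasted the file path
}
# Equivalent of the edited .eml with the two characters removed.
bodies["url_and_title_stripped"] = strip_accents(bodies["url_and_title"])

for name, body in bodies.items():
    print(name, "ASCII only" if body.isascii() else "contains non-ASCII")
```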

ffnet uses Content-Type: text/plain; charset="utf-8" for its messages.
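With that header, a parser is supposed to undo any transfer encoding and then decode the body bytes as UTF-8. A minimal sketch using Python's standard email package; since only the Content-Type line is quoted above, the quoted-printable transfer encoding and the placeholder URL here are assumptions for illustration:

```python
from email import message_from_bytes, policy

# Hypothetical .eml: Content-Type as ffnet declares it, body quoted-printable encoded.
RAW_EML = (
    b'MIME-Version: 1.0\r\n'
    b'Content-Type: text/plain; charset="utf-8"\r\n'
    b'Content-Transfer-Encoding: quoted-printable\r\n'
    b'\r\n'
    b'Chapter: D=C3=A9j=C3=A0-vu\r\n'
    b'https://www.fanfiction.net/s/0000000/1/\r\n'
)

msg = message_from_bytes(RAW_EML, policy=policy.default)
charset = msg.get_content_charset()  # declared charset, lowercased
# get_content() undoes the transfer encoding and decodes with the declared charset.
body = msg.get_content()
# The same raw bytes decoded as ASCII come out mangled instead.
raw_body = msg.get_payload(decode=True)
mangled = raw_body.decode("ascii", errors="replace")

print(charset)        # utf-8
print(ascii(body))
print(ascii(mangled))
```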

I haven't tested whether the issue is limited to the Latin-1 Supplement block. I did notice this in geturls.py:

Code:
def get_urls_from_text(data,configuration=None,normalize=False,email=False):
    urls = collections.OrderedDict()
    try:
        # py3 can have issues with extended chars in txt emails
        data = ensure_str(data,errors='replace')
    except UnicodeDecodeError:
        data = data.decode('utf8') ## for when called outside calibre.
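Here is my reading of what that try block does on Python 3, assuming ensure_str behaves like six's version: str comes back unchanged, and bytes are decoded with the given encoding (UTF-8 by default) and error handler. `to_text` is a hypothetical stand-in written for illustration, not FanFicFare or six code. With errors='replace', a bad byte becomes U+FFFD instead of raising, so on Python 3 the except UnicodeDecodeError branch shouldn't normally be reached.

```python
def to_text(data, encoding="utf-8", errors="replace"):
    """Rough Python 3 stand-in for six.ensure_str(data, errors=errors).

    Hypothetical helper for illustration only.
    """
    if isinstance(data, str):
        return data  # already text: returned unchanged
    if isinstance(data, bytes):
        # With errors="replace", undecodable bytes become U+FFFD
        # instead of raising UnicodeDecodeError.
        return data.decode(encoding, errors)
    raise TypeError("not expecting type %r" % type(data).__name__)

title = "Déjà-vu"
print(ascii(to_text(title)))                    # 'D\xe9j\xe0-vu'
print(ascii(to_text(title.encode("utf-8"))))    # 'D\xe9j\xe0-vu'
print(ascii(to_text(title.encode("latin-1"))))  # 'D\ufffdj\ufffd-vu'
```

The trade-off with errors='replace' is that bad input gets silently substituted rather than flagged, so a mis-declared charset would show up as U+FFFD characters, not an exception.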
I hope this helps a bit.

Last edited by ownedbycats; 01-28-2021 at 10:22 PM.