Old 06-02-2019, 07:13 AM   #1
siebert
Developer
Posts: 155
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 9 WiFi / Google Pixel 6a (Android)
Recipe with raw string literal fails in calibre 3.44

Hi,

After the update from 3.39.1 to 3.44 I noticed that one of my recipes failed to load.

The reason is that the recipe contains a Windows path stored as a raw string literal, e.g. r"C:\Users\foo\Dropbox".

I can work around the bug by writing "C:\\Users\\foo\\Dropbox" instead, but since this is a regression I think it should be fixed in calibre.
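
For readers looking for a spelling that does not depend on how the literal is interpreted, here is a minimal sketch of three equivalent ways to write the same path without a backslash directly in front of "U". The variable names are illustrative only, and `ntpath` is used so the snippet behaves the same on any platform.

```python
import ntpath  # Windows path rules, importable on every platform

# 1. Doubled backslashes in an ordinary (non-raw) literal.
escaped = "C:\\Users\\foo\\Dropbox"

# 2. Build the path from components instead of embedding separators.
joined = ntpath.join("C:\\", "Users", "foo", "Dropbox")

# 3. Forward slashes, normalised to backslashes afterwards.
normalised = ntpath.normpath("C:/Users/foo/Dropbox")

print(escaped)
print(joined == escaped, normalised == escaped)
```

None of these literals contains the two-character sequence backslash + "u"/"U" in the source text, so they should not be affected by a switch between byte and unicode literals.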

Here is a small bogus recipe which shows this issue when loaded in calibre 3.44 (32-bit Windows version):

Code:
from calibre.web.feeds.news import AutomaticNewsRecipe


OUTPUT_PATH = r"C:\Users\foo\Dropbox"

class BasicUserRecipe(AutomaticNewsRecipe):
    title = u'Test based on Planet Python'
    language = 'en'
    __author__ = 'Jelle van der Waa'
    oldest_article = 10
    max_articles_per_feed = 100
    feeds = [(u'Planet Python', u'http://planetpython.org/rss20.xml')]
The error reported by Calibre is:

Code:
calibre, version 3.44.0
ERROR: Invalid recipe: Failed to compile the recipe, with syntax error: (unicode error) 'rawunicodeescape' codec can't decode bytes in position 2-3: truncated \uXXXX (<string>, line 4)
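
The codec named in that message can be exercised directly, which may help to see what is going on. The following is only a sketch: it feeds the same characters to Python's `raw_unicode_escape` codec by hand and does not reproduce how calibre loads recipes. The helper name is made up for illustration.

```python
def decode_like_raw_unicode_literal(data):
    """Decode bytes with the codec named in the error message.

    Hypothetical helper: it only treats backslash-u and backslash-U
    as escapes and leaves every other backslash untouched.
    """
    return data.decode("raw_unicode_escape")


# The bytes C:\Users\foo\Dropbox (backslashes doubled only for this literal).
path_bytes = b"C:\\Users\\foo\\Dropbox"

try:
    decode_like_raw_unicode_literal(path_bytes)
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print("decode failed:", exc)

# A path with no backslash directly before "u" or "U" decodes unchanged.
harmless = decode_like_raw_unicode_literal(b"C:\\Temp\\foo")
print(harmless)
```

The backslash followed by "U" is read as the start of an escape that needs hex digits after it, and "sers" does not supply them.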
siebert is offline   Reply With Quote
Old 06-02-2019, 07:26 AM   #2
kovidgoyal
creator of calibre
Posts: 45,347
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
It's not a bug. Recipe loading has been changed to use unicode literals to prepare for Python 3. And in Python 2 you cannot have a \U in a unicode string, even a raw one, that is not of the form \UXXXXXXXX (eight hex digits).
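
As a rough illustration of what "use unicode literals" can mean for code that is compiled by a host application: the built-in `compile()` accepts `__future__` flags, so a loader can apply `unicode_literals` without the recipe containing the import itself. This is a sketch under that assumption, not calibre's actual loader; `load_source` and `RECIPE_SOURCE` are hypothetical names.

```python
import __future__

# Source text of a one-line "recipe": OUTPUT_PATH = r"C:\Users\foo\Dropbox"
RECIPE_SOURCE = 'OUTPUT_PATH = r"C:\\Users\\foo\\Dropbox"\n'


def load_source(source, filename="<string>"):
    """Compile and run source as if it started with
    `from __future__ import unicode_literals` (hypothetical loader)."""
    flags = __future__.unicode_literals.compiler_flag
    code = compile(source, filename, "exec", flags)
    namespace = {}
    exec(code, namespace)
    return namespace


namespace = load_source(RECIPE_SOURCE)
print(repr(namespace["OUTPUT_PATH"]))
```

On Python 3 the flag changes nothing and the raw literal loads as written. On Python 2.7 the same flag turns the literal into a raw unicode literal, which is where the \U rule described above applies and the compile step is expected to fail.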
Old 06-02-2019, 12:30 PM   #3
siebert
Developer
Posts: 155
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 9 WiFi / Google Pixel 6a (Android)
I don't quite understand why it is necessary to change this in python 2, as for python 3 r"C:\Users\foo\Dropbox" works just fine.
Old 06-02-2019, 10:31 PM   #4
kovidgoyal
creator of calibre
Posts: 45,347
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Because the Python 3 port is being done incrementally: the codebase is being made as Python 3-like as possible, to catch as many bugs as possible before pulling the trigger. That means making as many strings unicode as possible. Exposing end users to as few regressions as possible when making the move is one of my core principles of porting. If I cannot do the port in that fashion, it simply will not happen.
Old 06-12-2019, 10:10 PM   #5
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Quote:
Originally Posted by siebert View Post
I don't quite understand why it is necessary to change this in python 2, as for python 3 r"C:\Users\foo\Dropbox" works just fine.
Because writing code that works properly in both python2 and python3 has found lots of actual, legitimate bugs (some of which were always a problem even on python2), and the downsides of using unicode_literals (something which is recommended for all new code on principle) are nonexistent. I guess you could try reporting a bug against Python 2.7 recommending that it behave the same as Python 3...

"regression" does not mean "it behaves differently", it means "supported functionality was lost or became broken". This has not happened here -- your ambiguous bytestring has never had a formally documented contract of correctness, and it was arguably always wrong, so "writing python2 compatible code" has always meant doing the most correct thing and being compatible with unicode strings.

Given that unicode_literals is more helpful to people who contribute to calibre development than it is obstructionist to people who don't contribute to calibre development, it seems perfectly reasonable to carry on as-is.

But if you have suggestions backed by code, that suit your needs while at the same time fitting in with the development ideologies represented by the last 9 months of calibre's git history and the various discussions occurring on Github, then by all means, suggest away -- we shall be all ears.
Old 06-30-2019, 07:12 AM   #6
siebert
Developer
Posts: 155
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 9 WiFi / Google Pixel 6a (Android)
Quote:
Originally Posted by eschwartz View Post
"regression" does not mean "it behaves differently", it means "supported functionality was lost or became broken". This has not happened here -- your ambiguous bytestring has never had a formally documented contract of correctness, and it was arguably always wrong
I'm a bit confused about the supported statement... supported by whom? Python 2.7 supports it according to the official docs (https://docs.python.org/2/reference/...tring-literals), if you mean supported by Calibre, where would I find the "documented contracts of correctness" of Calibre regarding this case?

From my point of view, calibre "magically" changed my valid Python 2.7 raw string literal to an illegal raw unicode string literal, which breaks my recipe.
Old 06-30-2019, 04:41 PM   #7
eschwartz
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
Quote:
Originally Posted by siebert View Post
I'm a bit confused about the supported statement... supported by whom? Python 2.7 supports it according to the official docs (https://docs.python.org/2/reference/...tring-literals),
Supported by calibre.

Quote:
if you mean supported by Calibre, where would I find the "documented contracts of correctness" of Calibre regarding this case?
There is none, that's the point. Instead, you (try to) rely on python itself, and the entire creation and evolution of Python 3.x is a rather strong testimony that this is wrong. They deprecated significant parts of the programming language semantics, en masse, just to stop people from doing this.

Aside: it might be more advisable, at least for Windows, to explicitly work with bytes and decode them as UTF-8. Windows paths are weird and fragile things. But I don't know why your recipe hardcodes a path to C:\, so...
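
A minimal sketch of that suggestion, assuming the path is plain ASCII/UTF-8: a raw bytes literal (`br"..."`) keeps its backslashes as written and is not turned into a unicode literal by `unicode_literals`, so it can be decoded explicitly afterwards. The variable names are illustrative only.

```python
# Raw *bytes* literal: backslashes are kept as-is and no \u / \U
# processing is applied, with or without unicode_literals.
RAW_PATH_BYTES = br"C:\Users\foo\Dropbox"

# Decode explicitly to get a text (unicode) path.
OUTPUT_PATH = RAW_PATH_BYTES.decode("utf-8")

print(OUTPUT_PATH)
```

The `br` prefix is accepted by both Python 2.7 and Python 3, so this spelling keeps the readability of a raw literal while still ending up with a unicode path.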

Quote:
From my point of view, calibre "magically" changed my valid Python 2.7 raw string literal to an illegal raw unicode string literal
Report a bug to the Python Software Foundation that python2 unicode string literals don't act like python3 unicode string literals and therefore introduce behavior that surprised you.

Quote:
which breaks my recipe.
Which fixes everyone else's recipes and a lot more code besides.

I'm not sure what else you want. Clearly, no other recipes had this issue. And no one can possibly know what issues you will be having with your recipes that hardcode paths to C:\, since no one else is doing any such thing and the one person who *is* (you) is not discussing his needs or use cases, nor contributing to the ongoing development of the recipe subsystem -- so who exactly is supposed to *know* that you're having an issue?

This is the inevitable problem faced by lone wolf coders.

Last edited by eschwartz; 06-30-2019 at 04:43 PM.
Old 06-30-2019, 06:12 PM   #8
siebert
Developer
Posts: 155
Karma: 280
Join Date: Nov 2010
Device: Kindle 3 (Keyboard) 3G / iPad 9 WiFi / Google Pixel 6a (Android)
Quote:
Originally Posted by eschwartz View Post
Aside: it might be more advisable, at least for Windows, to explicitly work with bytes and decode them as UTF-8. Windows paths are weird and fragile things. But I don't know why your recipe hardcodes a path to C:\, so...
And I don't know why you are so obsessed with Windows paths. A regular expression for example could have the same issue or any other raw string literal which happens to contain a "\u".

Quote:
Report a bug to the Python Software Foundation that python2 unicode string literals don't act like python3 unicode string literals and therefore introduce behavior that surprised you.
I don't see how this should be a Python issue. Python didn't change my valid raw string literal into a broken unicode raw string literal, calibre did by importing unicode_literals.

Quote:
nor contributing to the ongoing development of the recipe subsystem -- so who exactly is supposed to *know* that you're having an issue?
I don't know why you think one has to contribute before one can report issues, but my first recipe included in calibre is from 2010. And anyone reading this thread is supposed to know...
Old 06-30-2019, 08:34 PM   #9
kovidgoyal
creator of calibre
Posts: 45,347
Karma: 27182818
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
Once again, strings are being changed to unicode literals throughout calibre's codebase. This is the best incremental way to minimize regressions when moving to python3.

And I will note that calibre has not used bytestrings for paths for almost a decade. The original Python 2 decision to use bytestrings for paths was probably inspired by Linux's bone-headed decision to have paths be bags of bytes in unknown encodings. I reversed that Python 2 mistake a long time ago. So using them in recipes is unsupported. You should have been using unicode literals in recipes, anyway.