Old 12-24-2021, 03:51 PM   #1
TheRealJohnAdams
Enthusiast
 
Posts: 42
Karma: 57668
Join Date: May 2020
Device: Kindle Oasis, Kobo Forma, Onyx Page
Metadata Plugboards no longer preserve non-breaking spaces

Hi,
I have a Kobo Forma (and, just recently, a Kobo Libra 2 as well) and am having trouble with the Authors field. I need to replace the Authors field ampersands with commas. This is simple enough, but it cannot be done without using the metadata plugboard (either by using an inline template in the plugboard or by referencing a template column). And any plugboard operation on the "authors" field destroys non-breaking spaces, which seems to be a relatively new behavior. (It did not do this in early 2021, at least.)

For background: The Kobo does not have an author_sort field. When an author has a multi-word name (e.g. F. Scott Fitzgerald), it assumes the last word is the surname and sorts accordingly. This usually works well. But it fails for institutional authors (e.g. the United States Conference of Catholic Bishops, which is sorted as "Bishops, United ...") and for authors with suffixes (e.g. Pope Benedict XVI, which is sorted as "XVI, Pope Benedict").

The solution is to separate the offending words with no-break spaces (U+00A0, which I'll represent as "_" in this post), causing the Kobo to treat several words as a single word. For example, "Pope Benedict_XVI" sorts, as it should, as "Benedict_XVI, Pope". Ditto for "United_States_Conference...".
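To make the trick concrete, here is a rough Python model of the sorting behavior described above. It is only a sketch of the heuristic as observed, not Kobo's actual code: the function name `kobo_style_sort_key` is made up for illustration, and it assumes the device splits names only on the ordinary space (U+0020).

```python
NBSP = "\u00a0"  # no-break space, shown as "_" in the post


def kobo_style_sort_key(author: str) -> str:
    """Hypothetical model: treat the last space-separated word as the surname."""
    # Split on the last ordinary space only; a no-break space never splits.
    head, sep, last = author.rpartition(" ")
    if not sep:
        # Single-word name: nothing to reorder.
        return author
    return f"{last}, {head}"


plain = kobo_style_sort_key("Pope Benedict XVI")
joined = kobo_style_sort_key("Pope Benedict" + NBSP + "XVI")
```

Under that assumption, the plain name comes out with "XVI" as the surname, while the no-break-space version keeps "Benedict_XVI" together as a single sortable word.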

I have done this using the "Manage Authors" dialog. In other words, Pope Benedict XVI is stored as "Pope Benedict_XVI" in my calibre library. And when I send this metadata to my Kobo "directly," without using a metadata plugboard for the authors field, it works as expected. The Kobo receives "author: Pope Benedict_XVI" and sorts as "Benedict_XVI, Pope". However, when I use any plugboard operation on the authors field, even if it is just "source template: {authors}; destination field: authors" (which shouldn't change the output at all), the no-break space is converted to a regular space.

I'm attaching an excerpt from my debug logs. Note that unless your text editor highlights non-breaking spaces automatically, you won't see anything amiss. The key bit appears to be the following, which begins at line 250.

Code:
DEBUG:   41.0 _update_metadata: plugboard= [['{authors}', 'authors'], ['{title}', 'title']]
DEBUG:   41.0 _update_metadata: applying plugboard
DEBUG:   41.0 _update_metadata: newmi.title= Holy Week: From the Entrance into Jerusalem to the Resurrection
DEBUG:   41.0 _update_metadata: newmi.authors= ['Pope Benedict XVI']
Before these lines, calibre "understands" that the author is "Pope Benedict_XVI". But after the "applying plugboard" step, the no-break space is a regular space.

Thank you for your help with this frustrating issue.
Attached Files
File Type: txt debug_excerpt.txt (55.9 KB, 80 views)
Old 12-24-2021, 04:42 PM   #2
chaley
Grand Sorcerer
 
Posts: 12,447
Karma: 8012886
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
My guess, and this is just a guess, is that the behavior started when you upgraded to calibre 5, at which point the Python regular expression system changed.

Calibre has *forever* (since at least 2010) replaced sequences of spaces in template results with a single space. I think, but don't know, that Python 3 includes non-breaking spaces in the set of space characters while Python 2 did not. I did not test this theory.
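For anyone who wants to check the Python 3 half of that theory, here is a small standalone sketch. It is not calibre's actual template code; `collapse_spaces` is a hypothetical helper that only shows how `\s` treats U+00A0 in Python 3, with and without the `re.ASCII` flag.

```python
import re

NBSP = "\u00a0"  # no-break space
author = "Pope Benedict" + NBSP + "XVI"


def collapse_spaces(text: str, flags: int = 0) -> str:
    """Hypothetical helper: replace each run of whitespace with one space."""
    return re.sub(r"\s+", " ", text, flags=flags)


# Default str patterns in Python 3 are Unicode-aware, so \s matches U+00A0.
unicode_result = collapse_spaces(author)

# With re.ASCII, \s is limited to [ \t\n\r\f\v], so U+00A0 is left alone.
ascii_result = collapse_spaces(author, re.ASCII)
```

In Python 3 the first call turns the no-break space into a regular space, and the second leaves it untouched, which is consistent with the symptom in the debug log.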

You can avoid removing sequences of spaces by using General Program Mode templates. For example, the template
Code:
program: $authors
doesn't remove internal non-breaking spaces. The template
Code:
{authors}
does remove them. The reason: internal spacing in GPM templates is under the control of the template writer while spacing in non-GPM templates can vary depending on template evaluation.

EDIT: Just to avoid some questions: the current behavior won't change back to what it was.
Old 12-24-2021, 04:55 PM   #3
TheRealJohnAdams
Enthusiast
 
Posts: 42
Karma: 57668
Join Date: May 2020
Device: Kindle Oasis, Kobo Forma, Onyx Page
Thanks so much. Switching to program mode did fix my problem. In case anyone else comes here with the same goal I had, my "authors" plugboard template is as follows:
Code:
program: re($authors, ' &', ',')
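For readers more at home in Python than in the template language, the substitution amounts to roughly the following. This is an illustrative sketch only, not how calibre evaluates the template; `ampersands_to_commas` is a made-up name, and the second author in the sample is just example data.

```python
import re

NBSP = "\u00a0"  # no-break space


def ampersands_to_commas(authors: str) -> str:
    """Hypothetical equivalent of re($authors, ' &', ',')."""
    # Replace each " &" separator with ","; nothing else is touched,
    # so no-break spaces inside a name survive.
    return re.sub(" &", ",", authors)


sample = "Pope Benedict" + NBSP + "XVI & F. Scott Fitzgerald"
converted = ampersands_to_commas(sample)
```

Because the pattern only matches a regular space followed by an ampersand, the no-break space inside "Benedict_XVI" is passed through unchanged.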