Plugin to transform database to upper case

Xwang · 10-13-2012, 08:43 AM

Hi all,
I have the Windows portable build of calibre installed on an NTFS-formatted external USB drive, and I access the same library from the Linux version when I'm running Linux. I need to be able to add books from both Linux and Windows, since I alternate between the two operating systems (Windows at work, Linux at home).

Since alternating between Windows and Linux causes problems due to differences in how the two OSes handle NTFS (see PS below), I would like to create one plugin (or more if necessary) to convert the existing database to upper case and keep it that way when books are added.
At the same time, the plugin(s) should avoid creating file paths longer than 256 characters.

To convert the existing database, I've thought of creating a plugin that, for each book in the database, changes the author name(s) and title to upper case and appends a specific string ('_MYTEMP') to both (the latter is needed to force the operating system to change the file and directory names even if it is case insensitive). After the changes are saved, it removes the specific string from the names and title and saves the changes again.
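The two-step idea described above can be illustrated directly at the filesystem level. This is only a minimal sketch of the trick, not how calibre itself renames book folders; the helper name `rename_to_upper` is hypothetical, and the `_MYTEMP` marker is the one proposed in the post.

```python
import os
import tempfile

TEMP_SUFFIX = "_MYTEMP"  # marker from the post; any string unlikely to occur in a real name works


def rename_to_upper(path):
    """Rename the last component of `path` to upper case via a temporary name.

    Hypothetical helper for illustration only; calibre does its own renaming
    when metadata changes.
    """
    parent, name = os.path.split(path)
    target = os.path.join(parent, name.upper())
    if name == name.upper():
        return target  # already upper case, nothing to do
    # Step 1: move to a name that differs by more than letter case.
    temp = target + TEMP_SUFFIX
    os.rename(path, temp)
    # Step 2: move from the temporary name to the final upper-case name.
    os.rename(temp, target)
    return target


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        original = os.path.join(root, "Federal Aviation Administration")
        os.mkdir(original)
        print(os.path.basename(rename_to_upper(original)))
        print(os.listdir(root))
```

Going through an intermediate name means neither rename is a case-only change, so the result should not depend on whether the filesystem is case sensitive.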

So I expect that, at the end of the run, the original file tree

Code:

Federal Aviation Administration
├── FAA Helicopter Flying Handbook - 8083-21 (292)
│   ├── cover.jpg
│   ├── FAA Helicopter Flying Handbook - 8083-21 - Federal Aviation Administration.pdf
│   └── metadata.opf
├── Pilot's Handbook of Aeronautical Knowled (291)
│   ├── cover.jpg
│   ├── metadata.opf
│   └── Pilot's Handbook of Aeronautical Knowled - Federal Aviation Administration.pdf
└── Special Federal Aviation Regulations SFA (293)
    ├── cover.jpg
    ├── metadata.opf
    └── Special Federal Aviation Regulations SFA - Federal Aviation Administration.pdf

is changed as follows, regardless of the OS in use

Code:

FEDERAL AVIATION ADMINISTRATION
├── FAA HELICOPTER FLYING HANDBOOK - 8083-21 (292)
│   ├── cover.jpg
│   ├── FAA HELICOPTER FLYING HANDBOOK - 8083-21 - FEDERAL AVIATION ADMINISTRATION.PDF
│   └── METADATA.OPF
├── PILOT'S HANDBOOK OF AERONAUTICAL KNOWLED (291)
│   ├── cover.jpg
│   ├── metadata.opf
│   └── PILOT'S HANDBOOK OF AERONAUTICAL KNOWLED - FEDERAL AVIATION ADMINISTRATION.PDF
└── SPECIAL FEDERAL AVIATION REGULATIONS SFA (293)
    ├── cover.jpg
    ├── metadata.opf
    └── SPECIAL FEDERAL AVIATION REGULATIONS SFA - FEDERAL AVIATION ADMINISTRATION.PDF

Then it would be nice to have a plugin that does the same on save, to keep the library upper case (if this second plugin is difficult, maybe I can modify the first one to check whether the book title and author are already upper case before modifying them).

Obviously the plugin(s) should work on any file type.
So the (initial) questions are:
1) Which type of plugin should I write? A FileTypePlugin or a Metadata one?
2) How can I loop over all the books?

Thank you,
Xwang

PS: The biggest difference is that Linux can create multiple files whose names differ only in case, and such files are not visible under Windows. The other problem is that Windows has a maximum path length of 256 characters, which Linux does not have, so I can end up with some books that are not readable under Windows.

PS2: I prefer to have this implemented as a plugin because I don't have much time to maintain a personal source branch that would need to be realigned with upstream every time it changes.
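Both problems mentioned in the PS can be detected by walking the library folder. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the 256-character limit is simply the figure quoted in the post, and path length is measured relative to the library root, so the drive or mount prefix still has to be added on top.

```python
import os
import tempfile
from collections import defaultdict

MAX_PATH = 256  # figure quoted in the post; treated here as an assumption


def find_case_collisions(root):
    """Return (directory, names) pairs for siblings that differ only by case."""
    collisions = []
    for dirpath, dirnames, filenames in os.walk(root):
        by_folded = defaultdict(list)
        for name in dirnames + filenames:
            by_folded[name.lower()].append(name)
        for names in by_folded.values():
            if len(names) > 1:
                collisions.append((dirpath, sorted(names)))
    return collisions


def find_long_paths(root, limit=MAX_PATH):
    """Return file paths, relative to `root`, longer than `limit` characters."""
    too_long = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            if len(rel) > limit:
                too_long.append(rel)
    return too_long


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as library:
        open(os.path.join(library, "Aaa.pdf"), "w").close()
        open(os.path.join(library, "AAA.pdf"), "w").close()
        print(find_case_collisions(library))
        print(find_long_paths(library, limit=5))
```

Collisions can only show up when the scan runs on a case-sensitive filesystem view, which is why it makes sense to run it from the Linux side.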

JSWolf · 10-19-2012, 09:10 AM

There isn't going to be enough call for someone to write such a plugin. It's too limited, and not enough people would use it to make it worthwhile.

JimmXinu · 10-19-2012, 11:47 AM

That doesn't stop Xwang from writing one for his own use, though.

First, Xwang, have you proven that changing the author/title to uppercase like that solves your problem? You tried it manually with a smaller set of books, that is?

Assuming so, I suggest a UI plugin that searches for titles/authors with lower case and updates the metadata on command.

One place you could start is with the Extract ISBN plugin. It's the simplest plugin I know of that modifies metadata. You don't need the whole background processing part, but the technique used to update isbn can probably be adapted to update title/authors instead.

Another possible way to do it is this:

Code:

# Runs inside a UI plugin, where self.gui is calibre's main window.
db = self.gui.current_db
# IDs of books matching the search; None means no search restriction.
bookids = db.search_getting_ids('title:"~[a-z]" or author:"~[a-z]"', None)

for bookid in bookids:
    mi = db.get_metadata(bookid, index_is_id=True)
    mi.title = mi.title.upper()
    mi.authors = [auth.upper() for auth in mi.authors]
    db.set_metadata(bookid, mi)

# Refresh the cached data for the changed books.
db.refresh_ids(bookids)

I haven't tested it, so I doubt it would work exactly as is, but it's a starting point.

Xwang · 10-19-2012, 05:20 PM

First of all, thank you for your help.
I'm pretty sure that converting the db to upper case and keeping it that way is sufficient to solve my problem; however, it has to be done in two steps:
first I have to convert titles and authors to upper case while appending a special string to both; then I have to remove the special string.
I'm not a Python expert, but I suppose that adding the string and removing it is not a problem, so I can use your code as a base and run the for loop twice.
I have two questions:
1) Does bookid change when the title or authors are changed?
2) What exactly does "db.search_getting_ids('title:"~[a-z]" or author:"~[a-z]"', None)" do?

My idea is something like this:

Code:

SUFFIX = u'_T#@§'  # temporary marker; unicode literal so len() counts characters

db = self.gui.current_db
bookids = db.search_getting_ids('title:"~[a-z]" or author:"~[a-z]"', None)

# Pass 1: upper-case and append the marker so the names on disk really change.
for bookid in bookids:
    mi = db.get_metadata(bookid, index_is_id=True)
    mi.title = mi.title.upper() + SUFFIX
    mi.authors = [auth.upper() + SUFFIX for auth in mi.authors]
    db.set_metadata(bookid, mi)

# Pass 2: strip the marker again.
for bookid in bookids:
    mi = db.get_metadata(bookid, index_is_id=True)
    mi.title = mi.title[:-len(SUFFIX)]
    mi.authors = [auth[:-len(SUFFIX)] for auth in mi.authors]
    db.set_metadata(bookid, mi)

db.refresh_ids(bookids)

This should work if bookid does not change; otherwise I have to run a new search between the two for loops.

Xwang

JimmXinu · 10-19-2012, 05:45 PM

Glad to help.

Quote:

Originally Posted by Xwang

I've two questions:
1) does bookid change when title or authors are changed?
2) what does "db.search_getting_ids('title:"~[a-z]" or author:"~[a-z]"', None)"
exactly do?

bookid doesn't change.

"db.search_getting_ids('title:"~[a-z]" or author:"~[a-z]"', None)" searches the library, giving you back ids (rather than row numbers in the current view) for the search 'title:"~[a-z]" or author:"~[a-z]"' with no search restriction (the None).

'title:"~[a-z]" or author:"~[a-z]"' is a search of two regular expressions saying 'any book with title containing letters a-z (not A-Z)' or 'any book with author(s) containing letters a-z (not A-Z)'

Rather than loop twice, you might just change the name twice in the same loop.

Xwang · 10-19-2012, 05:56 PM

Here is the code with a single loop.

Code:

SUFFIX = u'_T#@§'  # temporary marker; unicode literal so len() counts characters

db = self.gui.current_db
bookids = db.search_getting_ids('title:"~[a-z]" or author:"~[a-z]"', None)

for bookid in bookids:
    # Step 1: upper-case and append the marker so the names on disk really change.
    mi = db.get_metadata(bookid, index_is_id=True)
    mi.title = mi.title.upper() + SUFFIX
    mi.authors = [auth.upper() + SUFFIX for auth in mi.authors]
    db.set_metadata(bookid, mi)

    # Step 2: re-read the metadata and strip the marker again.
    mi = db.get_metadata(bookid, index_is_id=True)
    mi.title = mi.title[:-len(SUFFIX)]
    mi.authors = [auth[:-len(SUFFIX)] for auth in mi.authors]
    db.set_metadata(bookid, mi)

db.refresh_ids(bookids)

Are the changes saved to disk when db.set_metadata(bookid,mi) is executed?
Which module should I import?
Xwang

JimmXinu · 10-19-2012, 06:02 PM

Quote:

Originally Posted by Xwang

Are the changes saved to disk when db.set_metadata(bookid,mi) is executed?
Which module should I import?
Xwang

I believe so.

As for which module to import, you need to set up a whole plugin; this is just the core snippet.

There's official documentation, but I learned even more from examining the code for existing plugins.

Xwang · 10-19-2012, 06:05 PM

Ok!
Tomorrow I'll study the Extract ISBN plugin you suggested previously.
Xwang

Xwang · 10-21-2012, 08:45 AM

I'm trying to get the plugin running, but when I try to import it into calibre I get the following error:

Code:

Traceback (most recent call last):
  File "/usr/lib/calibre/calibre/gui2/preferences/plugins.py", line 316, in add_plugin
    self.check_for_add_to_toolbars(plugin)
  File "/usr/lib/calibre/calibre/gui2/preferences/plugins.py", line 406, in check_for_add_to_toolbars
    plugin_action = plugin.load_actual_plugin(self.gui)
  File "/usr/lib/calibre/calibre/customize/__init__.py", line 543, in load_actual_plugin
    ac = getattr(importlib.import_module(mod), cls)(gui,
AttributeError: 'module' object has no attribute 'UpperizeDBAction'

What's the problem?

Moreover, since I'm using some code from the Extract ISBN plugin (namely the common_utils.py file), I've kept its original copyright notice. Should I also add something to my own code to make clear that I'm using someone else's code?

Thank you,
Xwang

JimmXinu · 10-21-2012, 11:44 AM

Actually, that's not the first error I get:

Code:

calibre, version 0.9.3
ERROR: Unhandled exception: <b>SyntaxError</b>:invalid syntax (calibre_plugins.upperize_db.action, line 39)

Traceback (most recent call last):
  File "site-packages\calibre\gui2\preferences\plugins.py", line 316, in add_plugin
  File "site-packages\calibre\gui2\preferences\plugins.py", line 406, in check_for_add_to_toolbars
  File "site-packages\calibre\customize\__init__.py", line 543, in load_actual_plugin
  File "importlib\__init__.py", line 37, in import_module
  File "site-packages\calibre\customize\zipplugin.py", line 147, in load_module
  File "calibre_plugins.upperize_db.action", line 39
    def upperizedb(self)
                       ^
SyntaxError: invalid syntax

But that's a simple missing ':' on line 39. After fixing that, I get your error--from the GUI.

If you're running calibre as 'calibre-debug -g' from CLI (which I always do), you also see this error on the console:

Code:

Traceback (most recent call last):
  File "site-packages\calibre\gui2\ui.py", line 127, in __init__
  File "site-packages\calibre\gui2\ui.py", line 141, in init_iaction
  File "site-packages\calibre\customize\__init__.py", line 543, in load_actual_plugin
  File "importlib\__init__.py", line 37, in import_module
  File "site-packages\calibre\customize\zipplugin.py", line 150, in load_module
  File "calibre_plugins.upperize_db.action", line 23, in <module>
NameError: name 'InterfaceAction' is not defined

Now we know the real problem: You need to import InterfaceAction in action.py. It's commented out.
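The first kind of mistake (the missing ':') can be caught before the plugin is zipped and loaded by compiling the source with Python's built-in `compile()`. This is a minimal sketch; the helper name `check_syntax` is hypothetical. It would not catch the second problem, because a missing import only surfaces as a NameError when the module is actually executed.

```python
def check_syntax(source, filename="action.py"):
    """Return None if `source` compiles, else a short description of the error."""
    try:
        compile(source, filename, "exec")
    except SyntaxError as err:
        return "%s, line %s: %s" % (filename, err.lineno, err.msg)
    return None


if __name__ == "__main__":
    broken = "def upperizedb(self)\n    pass\n"
    fixed = "def upperizedb(self):\n    pass\n"
    print(check_syntax(broken))
    print(check_syntax(fixed))
```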

Xwang · 10-21-2012, 03:51 PM

Thank you for your help!
Now the plugin works, but I've discovered that the extension also needs to be upper-cased.
Is there any way to access it, so that I can upper-case it in the same way as titles and authors?
Xwang

JimmXinu · 10-21-2012, 04:14 PM

Well, that is why I asked if you'd already tested doing it manually to make sure it worked...

Why do you need the extensions upcased? You described the problem as conflicts between files such as Aaa and AAA being different files on linux, but the same file on Windows.

Xwang · 10-21-2012, 04:50 PM

I fear that the lower-case extension is treated as part of the title, because if I run the plugin after the db has already been upper-cased, I see that it changes all the folders again. It seems that this line:

Code:

bookids = db.search_getting_ids('title:"~[a-z]" or author:"~[a-z]"', None)

returns all the books in the library.

However, I've searched the code a bit and discovered that file extensions are forced to lower case (see the function format_abspath in database2.py):

Code:

    def format_abspath(self, index, format, index_is_id=False):
        '''
        Return absolute path to the ebook file of format `format`

        WARNING: This method will return a dummy path for a network backend DB,
        so do not rely on it, use format(..., as_path=True) instead.

        Currently used only in calibredb list, the viewer and the catalogs (via
        get_data_as_dict()).

        Apart from the viewer, I don't believe any of the others do any file
        I/O with the results of this call.
        '''
        id = index if index_is_id else self.id(index)
        try:
            name = self.format_filename_cache[id][format.upper()]
        except:
            return None
        if name:
            path = os.path.join(self.library_path, self.path(id, index_is_id=True))
            format = ('.' + format.lower()) if format else ''
            fmt_path = os.path.join(path, name+format)
            if os.path.exists(fmt_path):
                return fmt_path
            try:
                candidates = glob.glob(os.path.join(path, '*'+format))
            except: # If path contains strange characters this throws an exc
                candidates = []
            if format and candidates and os.path.exists(candidates[0]):
                try:
                    shutil.copyfile(candidates[0], fmt_path)
                except:
                    # This can happen if candidates[0] or fmt_path is too long,
                    # which can happen if the user copied the library from a
                    # non windows machine to a windows machine.
                    return None
                return fmt_path

If that is indeed the case, the problem becomes changing the search line in my plugin so that it returns only the books/authors that are not already upper case.
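One possible explanation for the search returning every book, not verified here against calibre's source, is that the regular expression is matched case-insensitively, in which case `[a-z]` also matches upper-case letters. Whatever the cause, the check can be done in plain Python after fetching the metadata. The sketch below assumes only that titles and author names are ordinary strings; the helper names are hypothetical.

```python
import re


def needs_upper(text):
    """True if `text` would change when upper-cased."""
    return text != text.upper()


def book_needs_upper(title, authors):
    """True if the title or any author name is not already upper case."""
    return needs_upper(title) or any(needs_upper(name) for name in authors)


if __name__ == "__main__":
    # With IGNORECASE, [a-z] matches an all-caps title as well.
    print(bool(re.search("[a-z]", "FAA HANDBOOK", re.IGNORECASE)))
    print(bool(re.search("[a-z]", "FAA HANDBOOK")))
    print(book_needs_upper("FAA HANDBOOK - 8083-21", ["Federal Aviation Administration"]))
```

Inside the plugin loop, a book for which `book_needs_upper(mi.title, mi.authors)` is false could simply be skipped, so a second run would leave an already converted library alone.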

Moreover, after some more tests, I've discovered that if an author has more than one book, the books are correctly upper-cased but the author name remains unchanged.

Finally, opening the metadata page in calibre, I see a situation like the one in the attached screenshot, where the author sort and title sort values are still not upper case (highlighted in red). In the screenshot I had manually forced the author sort, so it now appears in upper case.
Xwang

JimmXinu · 10-21-2012, 04:59 PM

In addition to mi.title and mi.authors, try doing the upper steps on mi.title_sort and mi.author_sort? That might do it.

As for authors with more than one book, there's also some author metadata kept outside the books. You might try something like this in addition to (or instead of) setting the authors on each book's mi object.

Code:

autid=db.get_author_id(authorname)
db.rename_author(autid, authorname.upper())
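The first suggestion (upper-casing the sort fields too) could look roughly like the sketch below. It assumes the metadata object exposes `title`, `title_sort`, `authors` and `author_sort` as plain attributes, as the posts imply; the `SimpleNamespace` object is only a stand-in for calibre's metadata class, and `upperize_metadata` is a hypothetical helper name.

```python
from types import SimpleNamespace

TEXT_FIELDS = ("title", "title_sort", "author_sort")


def upperize_metadata(mi):
    """Upper-case title, authors and both sort fields on `mi`, in place."""
    for field in TEXT_FIELDS:
        value = getattr(mi, field, None)
        if value:  # skip fields that are missing or empty
            setattr(mi, field, value.upper())
    mi.authors = [author.upper() for author in (mi.authors or [])]
    return mi


if __name__ == "__main__":
    # Stand-in object, not calibre's real metadata class.
    book = SimpleNamespace(
        title="Pilot's Handbook of Aeronautical Knowledge",
        title_sort="Pilot's Handbook of Aeronautical Knowledge",
        authors=["Federal Aviation Administration"],
        author_sort="Administration, Federal Aviation",
    )
    print(upperize_metadata(book))
```

In the plugin, the result would still have to be written back with `db.set_metadata(bookid, mi)` as in the earlier snippets.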

Xwang · 10-21-2012, 06:02 PM

Well, I don't know how to thank you for your help.
I've done as you suggested and now the db is correctly upper-cased.
I attach the latest version of the plugin in case you would like to have a look at it.
The only open issue at the moment is that it still renames the whole db if I run it twice.
To solve this, maybe I can add a boolean field to the db that is set to yes when a book is upper-cased by the plugin.
The plugin logic would then check that value and modify a book only if it is not set to yes.
I've already added the field to my test db with the name 'is_upper_case_db'; it will be yes only if the book has already been upper-cased.
The question is: how can I read that field to decide whether a book needs to be upper-cased?

Xwang
