Old 09-16-2010, 01:40 AM   #1
gwk
Junior Member
gwk began at the beginning.
 
Posts: 8
Karma: 10
Join Date: Sep 2010
Device: irex iliad, htc hd2
Cleaning metadata (titles and authors)

Hi,

After importing a large number of books into Calibre, I reach a point where I would like to clean up titles and authors. Examples are:
- replace underscores with spaces
- remove series information from the title
- remove author information from the title

I know about the regular expression option in the import, but that would require exporting first and re-importing. Moreover, it does not help when the metadata is imported from within the ebook itself.

Is there anything available within calibre to search and replace within titles?
Old 09-16-2010, 09:00 AM   #2
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by gwk View Post
Is there anything available within calibre to search and replace within titles?
No.

You can regex search on your titles and display only those that need fixing. It's possible to write chunks of Python code to do what you want and then execute that code from the command line with access to the full Calibre set of library tools, but that's probably a lot harder than just finding/fixing.
Old 09-16-2010, 11:00 AM   #3
theducks
Well trained by Cats
theducks ought to be getting tired of karma fortunes by now.
 
 
Posts: 29,804
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by gwk View Post
Hi,

After importing a large number of books into Calibre, I reach a point where I would like to clean up titles and authors. Examples are:
- replace underscores with spaces
- remove series information from the title
- remove author information from the title

I know about the regular expression option in the import, but that would require exporting first and re-importing. Moreover, it does not help when the metadata is imported from within the ebook itself.

Is there anything available within calibre to search and replace within titles?
Works for any field except Title:

Sort your book list (click the column) as needed.
Select the messed-up or non-normalised records.
Click Edit metadata and fix the value once.

For a single field on single books, there is a right-click set "Case" to: tool.
Remember, any changes made in "Bulk" mode apply to ALL selected records.
Old 09-16-2010, 11:14 AM   #4
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by theducks View Post
  • Sort your book list (click the column) as needed.
  • Select the messed-up or non-normalised records.
  • Click Edit metadata and fix the value once.
In steps 1 and 2, the regex search can be very useful. If, for example, you want all titles that have an underscore or open paren, use:

title:"~[_\(]"

Surely, your messed up titles with underscores, series and authors will have some characteristic you can find most of them with, even if it's just space-hyphen-space.
(I know: Don't call me Shirley!)
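
To see what that search expression picks out, here is a minimal Python sketch that applies the same character class to a few titles. The titles are invented for illustration, and the sketch only mimics the matching; it does not query a calibre library.

```python
import re

# Same character class as in the search above: an underscore or an open paren.
PATTERN = re.compile(r"[_\(]")

# Hypothetical titles, made up for this example.
titles = [
    "The_Game_of_Rat_and_Dragon",
    "Scanners Live in Vain (Instrumentality 1)",
    "Norstrilia",
]

# Keep only the titles in which the pattern occurs somewhere.
needs_fixing = [t for t in titles if PATTERN.search(t)]
print(needs_fixing)
```

Only the first two titles are kept, since the third contains neither character.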
Old 09-16-2010, 11:19 AM   #5
theducks
Well trained by Cats
theducks ought to be getting tired of karma fortunes by now.
 
 
Posts: 29,804
Karma: 54830978
Join Date: Aug 2009
Location: The Central Coast of California
Device: Kobo Libra2,Kobo Aura2v1, K4NT(Fixed: New Bat.), Galaxy Tab A
Quote:
Originally Posted by Starson17 View Post
In steps 1 and 2, the regex search can be very useful. If, for example, you want all titles that have an underscore or open paren, use:

title:"~[_\(]"

Surely, your messed up titles with underscores, series and authors will have some characteristic you can find most of them with, even if it's just space-hyphen-space.
(I know: Don't call me Shirley!)
Airplane.. My kind of humor

Doesn't the Regex method only work While importing? Not after the damage has been done?
Old 09-16-2010, 12:08 PM   #6
chaley
Grand Sorcerer
chaley ought to be getting tired of karma fortunes by now.
 
Posts: 11,742
Karma: 6997045
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Quote:
Originally Posted by theducks View Post
Doesn't the Regex method only work While importing? Not after the damage has been done?
You can use regexps in searches in the GUI. That is what Starson17 was indicating when he wrote title:"~[_\(]". Put that string into the search box and you will get a list of titles containing underscores or open parentheses.
Old 09-16-2010, 12:12 PM   #7
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by theducks View Post
Airplane.. My kind of humor

Doesn't the Regex method only work While importing? Not after the damage has been done?
Regex can be used in several places. Charles built regex into the search system. It will help find the problem files when they are otherwise hard to find. For example, you could look for titles that have numbers, or titles that have parentheses or brackets, or a hyphen surrounded by spaces, etc.

You'd still need to select them all and individually edit the title, but bulk select and individual edit opens them singly in sequence, which is a lot faster.
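
The kinds of checks listed above (numbers, parentheses or brackets, a hyphen surrounded by spaces) could be written as Python regular expressions roughly like this. It is only a sketch: the check names and the flags helper are made up for illustration and are not part of calibre.

```python
import re

# One pattern per symptom mentioned above; the names are illustrative.
CHECKS = {
    "digit": re.compile(r"\d"),
    "paren_or_bracket": re.compile(r"[()\[\]]"),
    "spaced_hyphen": re.compile(r" - "),
}

def flags(title):
    """Return the names of every check the title trips."""
    return [name for name, rx in CHECKS.items() if rx.search(title)]

print(flags("[Series 2] Some Title"))
print(flags("Some Title - Some Author"))
print(flags("Some Title"))
```

A title that trips none of the checks returns an empty list and can be left alone.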
Old 09-16-2010, 12:16 PM   #8
speakingtohe
Wizard
speakingtohe ought to be getting tired of karma fortunes by now.
 
Posts: 4,812
Karma: 26912940
Join Date: Apr 2010
Device: sony PRS-T1 and T3, Kobo Mini and Aura HD, Tablet
I did most of my author editing in versions before 7.2, when it was much easier to spot author discrepancies in the tags column authors and just bulk edit.
The most common author errors were commas caused by the author sort value ending up in the author field, and spacing in initials (i.e. C.H. and C. H.).

I think Starson17's method would work for the commas etc. anyway.
Typing
author:"~[,]"
in the search bar will return all authors with commas.
It won't fix them, but it makes them easier to bulk edit etc.
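
For the two author problems described above, a fix on plain strings might look like the sketch below. The function name fix_author and both rules are assumptions for illustration: swapping around a single comma will mangle names such as "Smith, Jr.", so the results would need checking before anything is written back.

```python
import re

def fix_author(name):
    """Normalise one author string (illustrative rules only)."""
    name = name.strip()
    # "Last, First" -> "First Last", but only when there is exactly one comma.
    if name.count(",") == 1:
        last, first = (part.strip() for part in name.split(","))
        name = f"{first} {last}"
    # Add a space after an initial's period when a letter follows: "C.H." -> "C. H."
    name = re.sub(r"\b([A-Z])\.(?=[A-Za-z])", r"\1. ", name)
    return name

print(fix_author("Smith, Cordwainer"))
print(fix_author("C.H. Cherryh"))
```

Names that are already in "First Last" form with spaced initials pass through unchanged.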

Thank you Starson17 for that example

I found custom columns very helpful.
I have one yes/no column and one comment column.
I checked a screen of books for various factors such as flow and page numbers, formatting etc., bulk edited them as much as possible to reflect that in the column, and then used bulk edit to indicate they were done. Saved me a lot of pain.

I seem to remember someone saying copy to library could use regexes to modify title and author names, but I could be wrong.
Old 09-16-2010, 12:21 PM   #9
Starson17
Wizard
Starson17 can program the VCR without an owner's manual.
 
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by speakingtohe View Post
Thank you Starson17 for that example
You're welcome.

Quote:
I seem to remember someone saying copy to library could use regexes to modify title and author names, but I could be wrong.
I've worked on that code and I don't think it will. What it can do is ignore certain things, like punctuation, leading indefinite articles and excess spaces in the title when comparing a new book to an existing entry in the library (with the autosort preference on). If there's a title/author match, the new book formats get dropped into the existing book record, if possible.
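
The comparison described above can be pictured with a small Python sketch. This is not calibre's code: the function name match_key, the article list, and the lowercasing step are all assumptions made to illustrate the idea of ignoring punctuation, a leading indefinite article and excess spaces.

```python
import re

# Assumption: the thread only says "leading indefinite articles",
# so the exact list used by calibre is not known here.
LEADING_ARTICLES = ("a", "an")

def match_key(title):
    """Reduce a title to a key used only for comparison, never for display."""
    key = title.lower()                     # assumption: compare case-insensitively
    key = re.sub(r"[^\w\s]", " ", key)      # punctuation -> space
    key = re.sub(r"\s+", " ", key).strip()  # collapse excess spaces
    words = key.split(" ")
    if len(words) > 1 and words[0] in LEADING_ARTICLES:
        words = words[1:]
    return " ".join(words)

print(match_key("A  Planet Named Shayol!") == match_key("Planet Named Shayol"))
```

Two titles with the same key would be treated as the same book for the purpose of merging formats; the stored title itself is left untouched.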
Old 09-16-2010, 04:02 PM   #10
gwk
Junior Member
gwk began at the beginning.
 
Posts: 8
Karma: 10
Join Date: Sep 2010
Device: irex iliad, htc hd2
Okay, using single-edit for multiple books will save me a lot of clicks.
Thanks for the suggestion.

But you still have to remove the underscores book after book by hand.
The only other alternative is to export, correct the situation, and import.

But it seems to me that adding a replace (and possibly also a regexp replace) for the title to the bulk-edit dialog box would make calibre far more powerful. Not sure how difficult this would be.

The last alternative is of course the Python script. Does anybody have an example?
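
As a starting point for such a script, here is a minimal sketch of the three clean-ups from the first post, applied to plain strings. It deliberately does not touch the calibre library: reading and writing the actual records is left out because no API for that is shown in this thread. The function name clean_title and the assumption that series information sits in parentheses or brackets are illustrative only.

```python
import re

def clean_title(title, author=None):
    """Apply the three clean-ups from the first post to one title string."""
    # 1) Underscores become spaces.
    title = title.replace("_", " ")
    # 2) Drop series info, assumed here to sit inside (...) or [...].
    title = re.sub(r"\s*[\(\[][^\)\]]*[\)\]]\s*", " ", title)
    # 3) Drop a trailing " - Author" when the author's name is known.
    if author:
        title = re.sub(r"\s+-\s+" + re.escape(author) + r"\s*$", "",
                       title, flags=re.IGNORECASE)
    # Tidy any whitespace left behind by the steps above.
    return re.sub(r"\s+", " ", title).strip()

print(clean_title("The_Game_of_Rat_and_Dragon"))
print(clean_title("Norstrilia (Instrumentality 2) - Cordwainer Smith",
                  author="Cordwainer Smith"))
```

Running it over a list of (title, author) pairs and printing old and new values side by side would be a sensible dry run before changing anything for real.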
Old 09-16-2010, 04:40 PM   #11
chaley
Grand Sorcerer
chaley ought to be getting tired of karma fortunes by now.
 
Posts: 11,742
Karma: 6997045
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Quote:
Originally Posted by gwk View Post
But it seems to me that adding a replace (and possibly also a regexp replace) for the title to the bulk-edit dialog box would make calibre far more powerful. Not sure how difficult this would be.
<pontification>
From a development standpoint, it wouldn't be hard to provide a facility that permits regexp matching and replacement for an arbitrary field. Use the (python) force, Luke! Python already has a facility for regexp search and replace, and all we would need to do is give a user a way to enter the strings. All you would need to do is understand python backreference syntax and semantics, and voila! you have it.
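
As a rough illustration of the Python facility mentioned above, the sketch below uses re.sub with backreferences (\1, \2) and returns a preview of what would change instead of changing anything. The preview helper and the sample author strings are hypothetical; nothing here is calibre code.

```python
import re

def preview(pattern, replacement, values):
    """Return (old, new) pairs for the values the substitution would change."""
    rx = re.compile(pattern)
    changes = []
    for old in values:
        new = rx.sub(replacement, old)
        if new != old:
            changes.append((old, new))
    return changes

authors = ["Smith, Cordwainer", "Cordwainer Smith"]
# \1 and \2 refer back to the first and second parenthesised groups.
changes = preview(r"^([^,]+),\s*(.+)$", r"\2 \1", authors)
print(changes)
```

Looking at the (old, new) pairs before applying them is one cheap way to catch an expression that matches far more than intended.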

The problem is not really the development time. It is, instead, how difficult the general case is, coupled with the inevitable need for support that such a feature would raise. Perhaps we can hire help desk people with the money we make from license fees. Oh....

Three examples:
1) The vast majority of calibre users have no clue what regular expressions are, much less what backreferences are. Where will the documentation and tutorials come from? What about support? Examples?
2) People would scream for an 'undo bulk edit' function, because the regexp turned all their fields into happy faces.
3) This feature is by necessity a 'power user' feature, but that (rightly) won't stop people from asking for help in solving a particular problem. Those explanation cycles must come from somewhere.

Making something 'easy to use' and 'intuitive' is extremely hard and time consuming. Choices need to be made between doing nothing, doing something that can be done in the time available, or doing it right. My opinion: the more calibre grows in popularity, the more we must choose to do nothing, because the middle path leads to perdition.
</pontification>
Old 09-16-2010, 04:47 PM   #12
speakingtohe
Wizard
speakingtohe ought to be getting tired of karma fortunes by now.
 
Posts: 4,812
Karma: 26912940
Join Date: Apr 2010
Device: sony PRS-T1 and T3, Kobo Mini and Aura HD, Tablet
I would just use Starson17's example search term or a variation thereof and do it in the Gui.

A tad tedious if you have several thousand books and are in a hurry, but I just tried it and 10 books took less than 20 seconds even with the rather spongy Gui editing on my poor sad duo core laptop.

Even at 1000 or more books, doing it the 'easy' way might take much more time and effort.
Helen
Old 09-16-2010, 04:51 PM   #13
speakingtohe
Wizard
speakingtohe ought to be getting tired of karma fortunes by now.
 
Posts: 4,812
Karma: 26912940
Join Date: Apr 2010
Device: sony PRS-T1 and T3, Kobo Mini and Aura HD, Tablet
Quote:
2) People would scream for an 'undo bulk edit' function, because the regexp turned all their fields into happy faces.
My neighbours complained when I screamed so I sucked it up and put on a happy face myself.
Once my face unfroze from the screaming.
Old 09-17-2010, 09:50 AM   #14
pckopp
Enthusiast
pckopp began at the beginning.
 
Posts: 32
Karma: 44
Join Date: Jul 2010
Location: Seneca, SC
Device: Kindle, eReader
I'm one of the users for whom RE is a barely understood concept, so can I just ask a question related to adding books?

My current expression: (?P<title>.+) - (?P<author>[^_]+)

works great. But sometimes I have a filename that includes the series info at the beginning in brackets. Like this:

[Instrumentality Of Mankind] Golden the Ship Was Oh! Oh! Oh! - Cordwainer Smith.epub

I would like the text between the brackets to be in the Series box.

Thanks!
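
One possible way to extend the expression above is to put an optional bracketed group in front of it, as in the sketch below. It is tested here only with Python's re module on the filename without its extension; whether the import dialog accepts a group named series in the same way is an assumption to verify in calibre itself.

```python
import re

PATTERN = re.compile(
    r"(?:\[(?P<series>[^\]]+)\]\s*)?"     # optional "[Series] " prefix
    r"(?P<title>.+) - (?P<author>[^_]+)"  # the original expression, unchanged
)

def parse(filename_stem):
    """Return the captured fields as a dict, or None if nothing matches."""
    m = PATTERN.match(filename_stem)
    return m.groupdict() if m else None

print(parse("[Instrumentality Of Mankind] Golden the Ship Was Oh! Oh! Oh! - Cordwainer Smith"))
print(parse("Norstrilia - Cordwainer Smith"))
```

Filenames without a bracketed prefix still match, with the series field simply left empty (None in Python).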
Old 09-17-2010, 10:02 AM   #15
chaley
Grand Sorcerer
chaley ought to be getting tired of karma fortunes by now.
 
Posts: 11,742
Karma: 6997045
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Teaser ...

I de-pontificated myself and have been playing with bulk edit. Something like this might actually appear ...
Attached thumbnail: Clipboard03.jpg
Tags
metadata, replace, search

