Tips on Bulk Edit Metadata

deschiff · 10-14-2010, 02:47 PM

Hi all;
I've loaded the contents of the Project Gutenberg 042010-DVD, over 30,000 files, into a Calibre library. Now I have the PG numbers in the title column. I want to keep the PG IDs in a custom column and get the title and author from the text or HTML files. Any tips, tricks, or suggestions would be greatly appreciated.

Manichean · 10-14-2010, 05:02 PM

Build your custom column, use search & replace to get everything from the title column into your custom column. Done.

Edit: I ought to elaborate, I think. Use regular expression search mode, title field as source, your custom column as target. Search pattern is

Code:

(.*)

replacement pattern is

Code:

\1
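
For anyone wondering why this pair of patterns copies the whole field: `(.*)` captures everything in the source field as group 1, and `\1` writes that group into the target. Here is a rough Python sketch of the same idea, outside calibre. The function name `copy_field` and the column name `#pgid` are made up for illustration; calibre's own dialog does the work for you.

```python
import re

SEARCH = r"(.*)"    # capture the entire source value as group 1
REPLACE = r"\1"     # emit group 1 unchanged


def copy_field(source_value):
    """Return what the search/replace pair would write to the target field."""
    match = re.match(SEARCH, source_value)
    return match.expand(REPLACE) if match else ""


# Hypothetical record: the title currently holds the PG number.
book = {"title": "12345", "#pgid": ""}
book["#pgid"] = copy_field(book["title"])
print(book)
```

The title itself is left alone, so it can be replaced with the real title later.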

deschiff · 10-15-2010, 09:06 AM

Thanks, Manichean!
That worked like a charm! I was trying to do it without regular expressions, so I didn't see the extra field that lets the output go somewhere other than the source field.

If anyone can advise me on the best scripting language to learn so I can get the title and author from the txt and html files, I would appreciate it greatly.
Are sed and grep the best way, or should I invest the time to learn Python?

Manichean · 10-15-2010, 09:18 AM

I don't know much about scripting languages, but I'm currently learning Python. It's a great language for non-time-critical tasks, I think. If you already know a way to do it in a shell script, I'd suggest using that, unless you really want to learn a scripting language.

chaley · 10-15-2010, 09:32 AM

If you feel up to it, you might consider perl. Text matching and regular expressions are integral to the language, making it good for text hacking.

Python has the advantage of being like 'normal' languages such as C or Java, but with some nice string manipulation thrown in. It wouldn't be my first choice for hacking text, but it isn't a bad choice by any means. Python also has the advantage of being able to use calibre's libraries, so you could directly set the fields in the database. That by itself could make it the best choice around.
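
To give a feel for the string handling, here is a minimal sketch that pulls the title and author out of a text file. It assumes (this is not stated in the thread) that each Project Gutenberg text carries `Title:` and `Author:` lines near the top. The function name `extract_header` and the 5000-character cutoff are invented for the example, and it does not touch calibre's libraries at all.

```python
import re

# Assumption: the header contains lines such as "Title: ..." and "Author: ...".
HEADER_FIELD = re.compile(r"^(Title|Author):[ \t]*(.+)$", re.MULTILINE)


def extract_header(text, max_chars=5000):
    """Return a dict with 'title' and/or 'author' found near the top of text."""
    fields = {}
    for name, value in HEADER_FIELD.findall(text[:max_chars]):
        # Keep the first occurrence only; strip stray "\r" from DOS line endings.
        fields.setdefault(name.lower(), value.strip())
    return fields


sample = (
    "The Project Gutenberg EBook of Moby Dick\r\n"
    "\r\n"
    "Title: Moby Dick; or The Whale\r\n"
    "\r\n"
    "Author: Herman Melville\r\n"
)
print(extract_header(sample))
```

Files that lack these header lines simply come back with an empty dict, so they can be listed and fixed by hand.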

sed/grep/awk/etc can work too, if you can work out the patterns and chaining. This option would be best if the title & author information is easy to locate (standard bracketing text). You probably would need to generate a mess of calibre command line scripts as output, but that is also true for perl.
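
The "mess of calibre command line scripts" could look roughly like the output of the sketch below, whichever language does the text matching. It assumes a calibre version whose `calibredb set_metadata` accepts `--field name:value` (check `calibredb set_metadata --help` on your install). The book id 42 and the title and author values are placeholders, and `set_metadata_command` is a made-up helper name.

```python
import shlex


def set_metadata_command(book_id, title, author):
    """Build one shell-quoted calibredb command line for a single book."""
    args = [
        "calibredb", "set_metadata",
        "--field", f"title:{title}",
        "--field", f"authors:{author}",
        str(book_id),  # calibre's numeric id for the book (placeholder here)
    ]
    # shlex.join (Python 3.8+) quotes spaces, semicolons and similar characters.
    return shlex.join(args)


print(set_metadata_command(42, "Moby Dick; or The Whale", "Herman Melville"))
```

Writing one such line per book into a shell script gives you something to review before anything in the library is changed.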

Have fun.

deschiff · 10-15-2010, 10:54 AM

Thanks for the fast replies.
Looks like it will be worth my time to learn Python. I saw Perl extension libs for Python, if worse comes to worst.

Thanks again.
Dave.
