Old 05-01-2009, 09:42 AM   #1
artbatista
Groupie
Posts: 193
Karma: 1032826
Join Date: Mar 2008
Location: Miami, FL, USA
Device: iPhone 4, iPad 2
Need help with metadata by filename

My books are named as follows:

AuthorFirst AuthorLast - [Series_Name Index] - Book Title.EXT

Example:

Alex Archer - [Rogue Angel 03] - The Spider Stone.mobi

I have tried, but I have been unable to come up with a regular expression that picks up the series name and index when the book is imported.

Can someone here suggest a possible expression to do this job?

Thank you in advance.

Art
Old 05-01-2009, 12:54 PM   #2
kovidgoyal
creator of calibre
Posts: 43,779
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
(?P<author>.+?) - \[(?P<series>.+?) (?P<series_index>[0-9]+)\] - (?P<title>.+)
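
For anyone who wants to check this outside calibre, here is a minimal Python sketch that runs the expression above against the example filename from the first post. It assumes the expression is applied to the filename with the extension already removed, which is an assumption about calibre's behaviour; the variable names are only for illustration.

```python
import re

# Expression from the post above, unchanged.
PATTERN = re.compile(
    r"(?P<author>.+?) - \[(?P<series>.+?) (?P<series_index>[0-9]+)\] - (?P<title>.+)"
)

# Example filename from the first post, extension stripped by hand
# (assumption: calibre matches against the name without the extension).
name = "Alex Archer - [Rogue Angel 03] - The Spider Stone"

match = PATTERN.search(name)
# groupdict() maps each named group to the text it captured.
fields = match.groupdict() if match else {}
print(fields)
```

Note that the series index comes back as the text "03"; turning it into a number is left to whatever consumes the result.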
Old 05-11-2009, 12:07 AM   #3
artbatista
Quote:
Originally Posted by kovidgoyal View Post
(?P<author>.+?) - \[(?P<series>.+?) (?P<series_index>[0-9]+)\] - (?P<title>.+)

Thank you! That works.

Art
Old 08-12-2009, 08:32 PM   #4
oldcrow74
Member
Posts: 16
Karma: 10
Join Date: May 2009
Device: sony prs-700bc
I have a similar need. I have filenames that have multiple hyphens. I want everything after the first hyphen to be considered the title. When I use this string:

(?P<author>.+) - (?P<title>[^_]+)

everything after the rightmost hyphen is considered the title. I need it to be the leftmost.

Thanks.
Bob
Old 08-12-2009, 10:26 PM   #5
ilovejedd
hopeless n00b
Posts: 5,111
Karma: 19597086
Join Date: Jan 2009
Location: in the middle of nowhere
Device: PW4, PW3, Libra H2O, iPad 10.5, iPad 11, iPad 12.9
The regular expression kovid posted should work fine, too. You just need to enclose the series info and one of the hyphens inside a ()? so that this part of the filename becomes optional.
Old 08-12-2009, 10:39 PM   #6
oldcrow74
Quote:
Originally Posted by ilovejedd View Post
The regular expression kovid posted should work fine, too. Just need to enclose the series info and one of the hyphens inside a ()?
Could you spell that out for me? I have no idea where you want me to put the ()?
Old 08-12-2009, 11:28 PM   #7
ilovejedd
Either

(?P<author>.+?) - (\[(?P<series>.+?) (?P<series_index>[0-9]+)\] - )?(?P<title>.+)

or

(?P<author>.+?)( - \[(?P<series>.+?) (?P<series_index>[0-9]+)\])? - (?P<title>.+)

would work. If you don't have any filenames with series information, then the following might be simpler:

(?P<author>.+?) - (?P<title>[^_]+)
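
To spell out what the optional group does, here is a small Python sketch using the first expression above. The helper name parse_name and the second filename are made up for illustration, and it assumes the extension has already been stripped from the filename.

```python
import re

# First expression from the post above, unchanged: the "[Series NN] - " part
# sits inside a group followed by ?, so it may be present or absent.
PATTERN = re.compile(
    r"(?P<author>.+?) - (\[(?P<series>.+?) (?P<series_index>[0-9]+)\] - )?(?P<title>.+)"
)

def parse_name(name):
    """Return the named fields for a filename (hypothetical helper), or None."""
    match = PATTERN.search(name)
    return match.groupdict() if match else None

# Filename with series information (from the first post).
with_series = parse_name("Alex Archer - [Rogue Angel 03] - The Spider Stone")
# Made-up filename without series information but with a second hyphen.
without_series = parse_name("Jane Doe - Some Title - A Subtitle")

print(with_series)
print(without_series)
```

When the bracketed part is missing, the series and series_index groups simply come back as None and everything after the first " - " ends up in the title.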
Old 08-13-2009, 12:11 PM   #8
oldcrow74
Quote:
Originally Posted by ilovejedd View Post
Either

(?P<author>.+?) - (\[(?P<series>.+?) (?P<series_index>[0-9]+)\] - )?(?P<title>.+)

or

(?P<author>.+?)( - \[(?P<series>.+?) (?P<series_index>[0-9]+)\])? - (?P<title>.+)

would work. If you don't have any filenames with series information, then the following might be simpler:

(?P<author>.+?) - (?P<title>[^_]+)
No, none of these work. The first 2 don't parse out the author and title at all. The "simple" one is close, but always truncates the last word of the title.

Maybe I'm not making myself clear. This should be a very simple string operation. Except, apparently, in Python. I want everything to the left of the leftmost hyphen to be the author. Everything to the right of the leftmost hyphen, including other hyphens, is the title. For example, if the filename is

Aaaaaaa, Bbbbb - Qqqqqqq Rrrrr - Sssssssss Ttttttttt.pdf


the author is Aaaaaaa, Bbbbb

the title is Qqqqqqq Rrrrr - Sssssssss Ttttttttt

Thanks again,
Bob
Old 08-13-2009, 12:21 PM   #9
kovidgoyal
Code:
(?P<author>[^-]+?) - (?P<title>.+)
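
A note on spelling, since it is easy to trip over: Python's re module writes named groups as (?P<name>...), with the P. The sketch below checks, under the assumption that the expression is handed to Python's re module as-is, that the spelling without the P is rejected while the one with the P compiles. The helper name compiles is made up for illustration.

```python
import re

def compiles(pattern):
    """Return True if Python's re module accepts the pattern (hypothetical helper)."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# Named groups without the P are not valid in Python's re module.
without_p = compiles(r"(?<author>[^-]+?) - (?<title>.+)")
# With the P, the same expression compiles.
with_p = compiles(r"(?P<author>[^-]+?) - (?P<title>.+)")

print(without_p, with_p)
```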
Old 08-13-2009, 12:23 PM   #10
Jellby
frumious Bandersnatch
Posts: 7,514
Karma: 18512745
Join Date: Jan 2008
Location: Spaniard in Sweden
Device: Cybook Orizon, Kobo Aura
I think the problem is this (from the regular expressions reference):

The '*', '+', and '?' qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn’t desired; if the RE <.*> is matched against '<H1>title</H1>', it will match the entire string, and not just '<H1>'. Adding '?' after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using .*? in the previous expression will match only '<H1>'.

So, just use:

(?P<author>[^_]+?) - (?P<title>.+)
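
Here is a short Python sketch of the greedy versus non-greedy difference, using the sample filename from earlier in the thread with the extension removed by hand. The two expressions are simplified variants written only for this comparison and differ solely in the ? after the first +.

```python
import re

# Sample filename from earlier in the thread, extension stripped by hand.
name = "Aaaaaaa, Bbbbb - Qqqqqqq Rrrrr - Sssssssss Ttttttttt"

# Greedy: .+ grabs as much as it can, so the split lands on the LAST " - ".
greedy = re.search(r"(?P<author>.+) - (?P<title>.+)", name).groupdict()

# Non-greedy: .+? grabs as little as it can, so the split lands on the FIRST " - ".
lazy = re.search(r"(?P<author>.+?) - (?P<title>.+)", name).groupdict()

print(greedy)
print(lazy)
```

The greedy version puts "Aaaaaaa, Bbbbb - Qqqqqqq Rrrrr" in the author, which is the rightmost-hyphen behaviour described above; the non-greedy version gives the leftmost split.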
Old 08-13-2009, 12:45 PM   #11
oldcrow74
Quote:
Originally Posted by Jellby View Post
I think the problem is this (from the regular expressions reference):

The '*', '+', and '?' qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn’t desired; if the RE <.*> is matched against '<H1>title</H1>', it will match the entire string, and not just '<H1>'. Adding '?' after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using .*? in the previous expression will match only '<H1>'.

So, just use:

(?P<author>[^_]+?) - (?P<title>.+)
My thanks and apologies. The expression

(?P<author>.+?) - (\[(?P<series>.+?) (?P<series_index>[0-9]+)\] - )?(?P<title>.+)

did work after all. I must've missed a character when I copied and pasted. Thank you.

However, the expression

(?P<author>[^_]+?) - (?P<title>.+)

does not work. It still drops the last word of the title.

Thanks again. You guys rule.
Old 08-13-2009, 04:29 PM   #12
oldcrow74
Quote:
Originally Posted by oldcrow74 View Post
My thanks and apologies. The expression

(?P<author>.+?) - (\[(?P<series>.+?) (?P<series_index>[0-9]+)\] - )?(?P<title>.+)

did work after all. I must've missed a character when I copied and pasted. Thank you.

However, the expression

(?P<author>[^_]+?) - (?P<title>.+)

does not work. It still drops the last word of the title.

Thanks again. You guys rule.
An update. I actually found that this expression was still buggy for my purposes. I tried taking out the series stuff as such:

(?P<author>.+?) - (\[\] - )?(?P<title>.+)

and now it seems to work exactly the way I wanted it.

One thing I've noticed is that an expression may work when you test it in the Preferences/Advanced dialog but behave differently in real life. The expression above, which I thought worked perfectly, only worked in test mode. It still garbled some of my file names when I added files to the library. I would say there's a bug somewhere.
Old 08-14-2009, 12:19 AM   #13
Dopedangel
Wizard
Posts: 1,759
Karma: 30063305
Join Date: Dec 2006
Location: Singapore
Device: Boyue
I used BookSorter from here:
http://iterati.org/ebookTools/BookSorter/Default.aspx
to rename all my files to the pattern
author - series 00 - Title
It made calibre more accurate, as it removed all the unwanted details from the filenames.
Old 08-14-2009, 05:38 AM   #14
oldcrow74
Quote:
Originally Posted by Dopedangel View Post
I used booksorter from here
http://iterati.org/ebookTools/BookSorter/Default.aspx
to rename all my files
author - series 00 - Title
it made calibre more accurate as it removed all the unwanted details from the filenames
But the point is that I don't want to change the filename, nor should I need to. This is a very simple string handling requirement. Calibre shouldn't be written in such a way, or use such a horribly arcane "language" as Python, that we need to externally massage the data to get standardized filenames parsed correctly.
Old 08-14-2009, 07:06 AM   #15
markbond1007
Connoisseur
Posts: 57
Karma: 122
Join Date: Jul 2008
Device: CyBook Gen3, Sony PRS-600
Python is hardly arcane, and these recipes are not exactly Python anyway. They are regular expressions, which work in basically the same way across many programming languages (PHP and Perl spring to mind).

Mark