Old 10-23-2014, 11:59 PM   #1
kite
A new user's collection of Regular Expressions (regex) for 'Add books > Control the a

'Add books > Control the adding of books'

Collected by a regex illiterate.
(regex = regular expression)
In part so I can find them easily, and in part so others can find them easily.
Please point out errors and I'll try to correct them.


I've tried to show where I found each regex. Where the source is unknown, it is probably Starson17.

Reading "calibre User Manual> Tutorials> All about using regular expressions in calibre" will hopefully give fellow new users a general idea of what is going on and why a particular expression doesn't work for some file names.
Post numbers 9 and 10 at "understandng the sample add books regex" https://www.mobileread.com/forums/sho...d.php?t=121353 bring it down to the specifics of how a simple regex for "Author - Title.pdf" works.
If you've read the manual, looked in this thread and still can't "Control the adding of books" to your satisfaction then the library management forum regulars are very helpful. And knowledgeable. Thank Heavens.



Authorname - Booktitle.pdf
(?P<author>[^_]+) - (?P<title>.+)
works on
William Shakespeare - Let's Dance Under the Waterfall.pdf
and
First-Name3 Sur-Hyphenated-Name3 & Firstname2 Surname2 - Let's Dance Under th....e Waterfall.pdf
notes: A file extension that calibre recognises (.pdf .epub .html .zip etc) is necessary for the test panel to work correctly. So "Authorname - Booktitle.epub" works but "Authorname - Booktitle" and "Authorname - Booktitle.nfo" don't. The file extension .txt is used in all further filename examples.
Untick the checkbox next to "Read metadata from file contents rather than file name" at the top of "The Add Process" page to allow your regex to work on books added.
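
If you want to poke at a pattern outside calibre's test panel, here is a minimal Python sketch. It assumes calibre applies the pattern roughly the way Python's re.search would, to the file name with the extension removed. That is my assumption, not something confirmed above, and the helper name parse_filename is made up for illustration.

```python
import re

# The "Authorname - Booktitle" pattern from above.
AUTHOR_TITLE = r"(?P<author>[^_]+) - (?P<title>.+)"

def parse_filename(pattern, filename):
    """Hypothetical helper: drop the extension, then search with the pattern."""
    stem = filename.rpartition(".")[0]  # assumption: the extension is not matched
    match = re.search(pattern, stem)
    return match.groupdict() if match else None

one_author = parse_filename(
    AUTHOR_TITLE, "William Shakespeare - Let's Dance Under the Waterfall.pdf")
two_authors = parse_filename(
    AUTHOR_TITLE,
    "First-Name3 Sur-Hyphenated-Name3 & Firstname2 Surname2 - Some Title.pdf")

print(one_author)
print(two_authors)
```

The hyphens inside the names don't split anything, because the pattern only breaks on a hyphen with a space on each side.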




Booktitle - Authorname.txt
(?P<title>.+) - (?P<author>[^_]+)


Authorname. Booktitle.txt
(?P<author>[^_]+)\. (?P<title>.+)


Authorname.Booktitle.txt
(?P<author>[^_]+)\.(?P<title>.+)


Booktitle. Authorname.txt
(?P<title>[^_]+)\.(?P<author>.+)
note: "Firstname.Surname" and "Surname.Firstname" don't work as author names.
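
Why dotted author names fail here can be seen in a small Python sketch. The title group is greedy, so it runs up to the last dot and leaves only the final word for the author. The example file names are invented, and plain re.search on the name without its extension is my stand-in for whatever calibre does internally.

```python
import re

# The "Booktitle.Authorname" pattern from above.
TITLE_DOT_AUTHOR = r"(?P<title>[^_]+)\.(?P<author>.+)"

def parse_stem(pattern, stem):
    """Hypothetical helper: search a file name that has no extension."""
    match = re.search(pattern, stem)
    return match.groupdict() if match else None

# One dot: the split lands where you expect.
simple = parse_stem(TITLE_DOT_AUTHOR, "Booktitle.Authorname")

# A dot inside the author name: the greedy title group swallows the first name.
dotted = parse_stem(TITLE_DOT_AUTHOR, "Let's Dance.William.Shakespeare")

print(simple)
print(dotted)
```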


Authorname AnyDotless von Fancy. Title one. T.i.t.l.e 2. Books I-III.Title Three.txt
(?P<author>.+?)\. (?P<title>.+)
works on
Dionysius of Halicarnassus. Roman Antiquities, IV. Books VI.49-VII.pdf
"It considers all characters before the first dot followed by a space as author and the rest as title"
(by JustForFun https://www.mobileread.com/forums/sho...d.php?t=246859)
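
The difference between this lazy `.+?` and the greedy `[^_]+` used in the earlier "Authorname. Booktitle" pattern shows up clearly on the same file name. A minimal Python sketch, assuming the extension is already stripped and that re.search is a fair stand-in for calibre's matching:

```python
import re

LAZY = r"(?P<author>.+?)\. (?P<title>.+)"       # JustForFun's pattern
GREEDY = r"(?P<author>[^_]+)\. (?P<title>.+)"   # the earlier "Authorname. Booktitle" pattern

# File name without its extension (assumption: calibre matches the name only).
stem = "Dionysius of Halicarnassus. Roman Antiquities, IV. Books VI.49-VII"

lazy_fields = re.search(LAZY, stem).groupdict()
greedy_fields = re.search(GREEDY, stem).groupdict()

print(lazy_fields)    # author stops at the FIRST dot followed by a space
print(greedy_fields)  # author runs on to the LAST dot followed by a space
```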


Authorname - seriesname series_indexnumber - Booktitle.txt
(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9-]*)(?P<series_index>[0-9]*)\s*-\s*(?P<title>[^_].+) ?
note: "Smith J.S." works as author name, but "Smith-Jones J.S." does not. See domax's post of 11-04-2015, 07:51 PM for a regex that works with double names. "Nine Moons" works as a series name, but "9 Moons" doesn't. "45.5" works as series index, "Book IV" doesn't.
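
A minimal Python sketch of two of the failures in the note. The file names are invented, the helper name is made up, and stripping stray spaces from each field is my assumption about the tidy-up calibre does. The author group excludes hyphens, so a double name loses its first half, and the series group excludes digits, so "9 Moons" never lands in the series field.

```python
import re

AUTHOR_SERIES_TITLE = (
    r"(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9-]*)"
    r"(?P<series_index>[0-9]*)\s*-\s*(?P<title>[^_].+) ?"
)

def parse_stem(stem):
    """Hypothetical helper: search, then strip stray spaces from each field."""
    match = re.search(AUTHOR_SERIES_TITLE, stem)
    return {k: v.strip() for k, v in match.groupdict().items()} if match else None

ok = parse_stem("Smith J.S. - Nine Moons 3 - The Title")
double_name = parse_stem("Smith-Jones J.S. - Nine Moons 3 - The Title")
digit_series = parse_stem("Smith J.S. - 9 Moons 3 - The Title")

print(ok)            # all four fields filled as intended
print(double_name)   # author comes out as "Jones J.S."
print(digit_series)  # series is empty, the title swallows the rest
```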


Author - Seriesname series_indexnumber - Title.txt
or
Author - Title.txt
Code:
^(?P<author>((?!\s-\s).)+)\s-\s(?:(?:\[\s*)?(?P<series>.+)\s(?P<series_index>[\d\.]+)(?:\s*\])?\s-\s)?(?P<title>[^(]+)(?:\(.*\))?
works with:
Bloggs, Joe - My title
Bloggs, Joe - Some Series 1 - My title.txt
Bloggs, Joe - Some Series 1.5 - My title.txt
Bloggs, Joe - Some Series 1.5 - My title with sub-title hyphen.txt
note: "9 Moons" works as seriesname, but "IV" does not work as series_index number.
(by kiwidude https://www.mobileread.com/forums/sho...d.php?t=108792)
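
To see how the optional series part behaves, here is a minimal Python sketch run against some of the names listed above. I've left the extensions off and used plain re.search, on the assumption that calibre matches against the name without its extension.

```python
import re

# kiwidude's pattern, split over two lines only for readability.
KIWIDUDE = (
    r"^(?P<author>((?!\s-\s).)+)\s-\s(?:(?:\[\s*)?(?P<series>.+)\s"
    r"(?P<series_index>[\d\.]+)(?:\s*\])?\s-\s)?(?P<title>[^(]+)(?:\(.*\))?"
)

def parse_stem(stem):
    """Hypothetical helper: return only the named groups, or None."""
    match = re.search(KIWIDUDE, stem)
    return match.groupdict() if match else None

no_series = parse_stem("Bloggs, Joe - My title")
with_series = parse_stem("Bloggs, Joe - Some Series 1.5 - My title with sub-title hyphen")

print(no_series)    # series and series_index are None when the middle part is absent
print(with_series)  # the hyphen in "sub-title" has no spaces around it, so it stays in the title
```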


Authorsurname, Authorfirstname - (Series Name - Book 01) Title of the book.txt
(?P<author>[^_-]+) - (\((?P<series>[^-]+) - Book (?P<series_index>\d\d?)\) )?(?P<title>[^-]+)
(by TheEldest https://www.mobileread.com/forums/showthread.php?t=89581)


isbn.publishernamea.publishernameb.publishernamec. title1.title2.title3.title4.Month.Year.txt
Code:
(?P<isbn>\d+\w)\.(?P<publisher>\w+\.\w+\.\w+)\.(?P<title>.*)\.(?P<published>\w\w\w\.\d+)
works with
012345678X.This.Is.Publisher.This.is.a.Title.Apr.2007.pdf
876543210x.This.Is.Publisher.A.Different.Title.That.is.Longer.Jan.1997.pdf
note: "It assumes three character month abbreviations. It doesn't remove periods, except between fields. It assumes three word publisher names."
Also, the year must be four digits and later than 1900.
(from Starson17 https://www.mobileread.com/forums/sho...metadata+regex)
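
A minimal Python sketch of how this pattern carves up a dotted file name. It only shows the raw text each group captures; how calibre then turns "Jan.1997" into a date is not covered, and matching against the name without its extension is my assumption.

```python
import re

ISBN_PUBLISHER_TITLE = (
    r"(?P<isbn>\d+\w)\.(?P<publisher>\w+\.\w+\.\w+)\."
    r"(?P<title>.*)\.(?P<published>\w\w\w\.\d+)"
)

def parse_filename(filename):
    """Hypothetical helper: drop the extension, then search with the pattern."""
    stem = filename.rpartition(".")[0]  # assumption: the extension is not matched
    match = re.search(ISBN_PUBLISHER_TITLE, stem)
    return match.groupdict() if match else None

fields = parse_filename(
    "876543210x.This.Is.Publisher.A.Different.Title.That.is.Longer.Jan.1997.pdf")
print(fields)
```

The periods inside the publisher and title are kept exactly as they are in the file name, as the quoted note says.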


isbn.jumbled dat.es t.i.t.l.e...rubbish data.txt
Code:
(?mi)^(?P<isbn>[\d\-x]{9,17})
This grabs the ISBN and puts it into the correct field, so that you can download the correct metadata for the book based on the ISBN.
The whole filename is used as a placeholder title until the correct title is downloaded.
works on
0313308316.Jumbled dates.title...ru6bish data Jaf 19m8.txt
0313306419.Greenwood.Press.Rudolfo.A.Anaya.A.Critical.Companion.Oct.1999.txt
01505798756X.Silly Press.The.Strange.Professional.Title.Jul.1985.epub
(from Serpentine on https://www.mobileread.com/forums/sho...a+regex&page=2)
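
A minimal Python sketch of the ISBN-only pattern, using plain re.search as a stand-in for calibre's matching (my assumption). It takes 9 to 17 digits, hyphens or x characters from the start of the name and ignores everything after them; the `i` flag is what lets a capital X through.

```python
import re

ISBN_ONLY = r"(?mi)^(?P<isbn>[\d\-x]{9,17})"

# File names from the list above.
filenames = [
    "0313308316.Jumbled dates.title...ru6bish data Jaf 19m8.txt",
    "0313306419.Greenwood.Press.Rudolfo.A.Anaya.A.Critical.Companion.Oct.1999.txt",
    "01505798756X.Silly Press.The.Strange.Professional.Title.Jul.1985.epub",
]

isbns = []
for name in filenames:
    match = re.search(ISBN_ONLY, name)
    isbns.append(match.group("isbn") if match else None)

print(isbns)
```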

Last edited by kite; 11-20-2015 at 12:16 AM. Reason: clearer search, correction for double names