01-21-2012, 01:23 PM   #1
kamanza
Zealot
kamanza began at the beginning.
 
Posts: 115
Karma: 10
Join Date: Jan 2011
Device: none
Help with regex needed.

I'm trying to write a regex for adding books that correctly assigns the author, series and title fields for both hyphenated and non-hyphenated author names, series names and book titles, where hyphens inside the names have no spaces around them, e.g.:


author name - series name index - book title;
author-name - series name index - book title;
author-name - series name index - book-title;
author-name - series-name index - book-title;
author-name - book-title;


and any other possible combination.
So far I've managed to handle the title and series parts:

Code:

(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9]*)(?P<series_index>[0-9]*)\s*-\s+(?P<title>[^_].+) ?


But if there is a hyphen in the author name, everything before the hyphen disappears.
Being a newbie, I think I've reached the limit of my abilities for the moment, so help would be very much appreciated.


N.B. I just noticed a mistake in the series part: the second hyphen in "[^_0-9-]". Corrected.
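One way to see why the hyphen breaks this pattern is to run it outside calibre. The Python sketch below assumes calibre applies the pattern with `re.search` (so a failed match at the start of the name is retried further along); the two file names are illustrative, with 01 standing in for the index, and the `groups` helper is hypothetical. Because `[^_-]+` cannot cross a hyphen and must be followed by a space, no match can begin inside "author-", so the search restarts after the hyphen.

```python
import re

# kamanza's first pattern (post #1, after the N.B. correction), copied verbatim.
FIRST = re.compile(
    r"(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9]*)(?P<series_index>[0-9]*)"
    r"\s*-\s+(?P<title>[^_].+) ?"
)


def groups(pattern, name):
    """Return the named groups with surrounding spaces stripped, or None.

    Assumption: re.search on the bare name approximates what calibre does.
    """
    m = pattern.search(name)
    if m is None:
        return None
    return {k: (v.strip() if v is not None else None)
            for k, v in m.groupdict().items()}


for name in ("author name - series name 01 - book title",
             "author-name - series name 01 - book title"):
    print(name, "->", groups(FIRST, name))
```

With the plain name the author comes out as "author name"; with the hyphenated one the match starts at "name", so only "name" lands in the author field.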

Last edited by kamanza; 01-21-2012 at 07:20 PM.
01-23-2012, 08:12 PM   #2
puterdude
Junior Member
puterdude began at the beginning.
 
Posts: 2
Karma: 10
Join Date: Dec 2011
Device: kindle
Problem with Author - Series # - Title Regex

Hello,

Any help solving this problem would be greatly appreciated; I've been pulling out my gray hair for the last few days.


When I used the regex above and others I have found on this forum:

1. (?P<author>.+?) - (?:\[(?P<series>.+?) ?(?P<series_index>[\d\.]{1,4})?\]) - (?P<title>.+)

2. ^((?P<author>([^\_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?((?P<series>[^0-9\-]+) ([-#] ?)?(?P<series_index>[0-9.]+)?\s*-\s*)?(?P<title>.+)

3. (?P<author>[a-zA-Z'. ]+?) - \[?((?P<series>[a-zA-Z' ]+) (?P<series_index>[0-9\.]+)\]? - )?(?P<title>[^\.]+).*
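The three patterns can be compared outside calibre with a short Python sketch. It feeds them the example file name from this post plus a hypothetical variant with the series in square brackets; dropping the extension and using `re.search` are assumptions made to approximate calibre, not a description of how calibre does it, and the `parse` helper is hypothetical.

```python
import os
import re

# The three patterns quoted in post #2, copied verbatim.
PATTERNS = [
    r"(?P<author>.+?) - (?:\[(?P<series>.+?) ?(?P<series_index>[\d\.]{1,4})?\]) - (?P<title>.+)",
    r"^((?P<author>([^\_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?((?P<series>[^0-9\-]+) ([-#] ?)?(?P<series_index>[0-9.]+)?\s*-\s*)?(?P<title>.+)",
    r"(?P<author>[a-zA-Z'. ]+?) - \[?((?P<series>[a-zA-Z' ]+) (?P<series_index>[0-9\.]+)\]? - )?(?P<title>[^\.]+).*",
]


def parse(pattern, filename):
    """Named groups (spaces stripped) for one file name, or None.

    Assumptions, not calibre internals: the extension is dropped first
    and the pattern is applied with re.search.
    """
    name = os.path.splitext(filename)[0]
    m = re.search(pattern, name)
    if m is None:
        return None
    return {k: (v.strip() if v is not None else None)
            for k, v in m.groupdict().items()}


EXAMPLE = "Piers Anthony - Incarnations of Immortality 05 - Being a Green Mother.epub"
# Hypothetical variant with the series in square brackets, for comparison.
BRACKETED = "Piers Anthony - [Incarnations of Immortality 05] - Being a Green Mother.epub"

for i, pattern in enumerate(PATTERNS, start=1):
    print(i, parse(pattern, EXAMPLE), parse(pattern, BRACKETED))
```

Under those assumptions, patterns 2 and 3 both split the example into author, series, index and title, which is consistent with the correct parsing in the Add dialog test, while pattern 1 only matches when the series is wrapped in square brackets.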


I get the following results (using 'Piers Anthony - Incarnations of Immortality 05 - Being a Green Mother.epub' as an EXAMPLE):

Attachment 1: the Add dialog (with DEFAULT settings) with the regex test; the results show correct parsing.

Attachment 2: any series e-book added to calibre gives INCORRECT results, i.e. the series precedes the title and is NOT inserted into the series field.


I've tried this with several different regexes using the latest version of calibre (0.8.36) and get the same incorrect results.

Could I have an incorrect setting in calibre?

TIA!!
Attached Thumbnails
Add_Dialog.png (18.5 KB)
1-23-2012 6-02-50 PM.jpg (18.0 KB)

Last edited by puterdude; 01-23-2012 at 10:07 PM. Reason: Emoticons do not display correctly
01-24-2012, 07:11 AM   #3
kamanza
Zealot
kamanza began at the beginning.
 
Posts: 115
Karma: 10
Join Date: Jan 2011
Device: none
I was annoyed at having to make manual corrections every time there was a hyphen in the author's name or in the title, so I continued tinkering.
I've figured it out a bit more:

Code:

(?P<author>.*?)( -\s*(?P<series>[^_0-9]*)(?P<series_index>[0-9]*))? -\s*(?P<title>[^_].+) ?

It handles cases with or without hyphens in the title, series and author names, and with or without a series.
The remaining problem is that if there is no author, the series goes into the author field.
But that is a rare occurrence, so I consider the problem mostly solved.
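To check those claims, the improved pattern can be run over the name shapes from post #1 (with 01 as the index), the example from post #2, and a hypothetical name with no author. As before, this Python sketch assumes calibre applies the pattern with `re.search`, and the `groups` helper is hypothetical. The lazy `.*?` author stops at the first " - " that lets the rest of the pattern match, so hyphens without surrounding spaces stay inside the author name.

```python
import re

# kamanza's improved pattern from post #3, copied verbatim.
IMPROVED = re.compile(
    r"(?P<author>.*?)( -\s*(?P<series>[^_0-9]*)(?P<series_index>[0-9]*))?"
    r" -\s*(?P<title>[^_].+) ?"
)


def groups(name):
    """Named groups with surrounding spaces stripped, or None if no match.

    Assumption: re.search on the bare name approximates what calibre does.
    """
    m = IMPROVED.search(name)
    if m is None:
        return None
    return {k: (v.strip() if v is not None else None)
            for k, v in m.groupdict().items()}


# Illustrative names: shapes from post #1 with 01 as the index,
# the example from post #2, and a name with no author at all.
for name in (
    "author name - series name 01 - book title",
    "author-name - series name 01 - book title",
    "author-name - series-name 01 - book-title",
    "author-name - book-title",
    "Piers Anthony - Incarnations of Immortality 05 - Being a Green Mother",
    "series name 01 - book title",
):
    print(groups(name))
```

The hyphenated names keep their full author, a name without a series leaves `series` as None, and the last name shows the remaining problem: with no author, "series name 01" lands in the author field.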
01-24-2012, 07:27 AM   #4
kamanza
Zealot
kamanza began at the beginning.
 
Posts: 115
Karma: 10
Join Date: Jan 2011
Device: none
Problem with Author - Series # - Title Regex

Hey, puterdude, what are you trying to achieve?
If it is a simple "author" - "series" "index" - "title" pattern, you can try the original regex included in calibre (that was my starting point) or use mine, which is slightly improved (I hope).

Tags
add books, regex



Similar Threads
Thread Thread Starter Forum Replies Last Post
RegEx Help needed ghostyjack Sigil 14 11-02-2011 10:22 AM
Creating Proper TOC in Kindle - regex help needed lyric Conversion 1 10-17-2011 06:19 PM
Chapter detection when only digits - regex needed Perkin Calibre 15 09-20-2010 06:25 PM
RegEx REPLACEMENT: Help needed! LARdT Sigil 12 01-04-2010 07:25 PM
Regex help needed gandor62 Calibre 2 11-04-2009 10:27 AM


All times are GMT -4.


MobileRead.com is a privately owned, operated and funded community.