01-21-2012, 01:23 PM | #1 |
Zealot
Posts: 115
Karma: 10
Join Date: Jan 2011
Device: none
|
Help with regex needed.
I'm trying to write a regex for adding books that correctly assigns the fields for hyphenated and non-hyphenated author names, series and book titles, where hyphens inside a name have no spaces around them, e.g.:
author name - series name index - book title
author-name - series name index - book title
author-name - series name index - book-title
author-name - series-name index - book-title
author-name - book-title
and any other possible combinations.
So far I've managed to deal with the title and series part:
Code:
(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9]*)(?P<series_index>[0-9]*)\s*-\s+(?P<title>[^_].+) ?
But if there is a hyphen in the author name, everything before the hyphen disappears. Being a newbie, I think I've reached the limit of my abilities for the moment, so help would be very much appreciated.
N.B. Just noticed a mistake in the series part: the second hyphen in "[^_0-9-]". Corrected.
Last edited by kamanza; 01-21-2012 at 07:20 PM. |
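The symptom described in this post can be reproduced outside calibre with Python's re module. This is only a sketch: it assumes the pattern is applied as a search against the file name without its extension, and the sample names and the parse helper are made up for illustration. The author class [^_-]+ cannot cross a hyphen, so a search appears to pick up the match after the hyphen instead.

```python
import re

# Pattern quoted from the post above (after the poster's correction).
PATTERN = re.compile(
    r"(?P<author>[^_-]+) -?\s*(?P<series>[^_0-9]*)"
    r"(?P<series_index>[0-9]*)\s*-\s+(?P<title>[^_].+) ?"
)


def parse(name):
    """Hypothetical helper: search `name` and return the named groups, stripped."""
    match = PATTERN.search(name)  # assumption: a search, not a full match
    if match is None:
        return None
    return {key: (value or "").strip() for key, value in match.groupdict().items()}


# Made-up file names (extension already removed).
plain = parse("Anne Smith - Saga 02 - Some Title")
hyphenated = parse("Anne-Marie Smith - Saga 02 - Some Title")

print(plain["author"])       # Anne Smith
print(hyphenated["author"])  # Marie Smith  ("Anne-" is lost)
```

Because [^_-] excludes the hyphen, no match can start at "Anne"; the first position where the whole pattern fits is "Marie Smith".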
01-23-2012, 08:12 PM | #2 |
Junior Member
Posts: 2
Karma: 10
Join Date: Dec 2011
Device: kindle
|
Problem with Author - Series # - Title Regex
Hello,
Any help solving this problem would be greatly appreciated; I've been pulling out my gray hair for the last few days.
When I used the above regex and others I have found on this forum:
1. (?P<author>.+?) - (?:\[(?P<series>.+?) ?(?P<series_index>[\d\.]{1,4})?\]) - (?P<title>.+)
2. ^((?P<author>([^\_0-9]+)(?=\s*-\s*)(?!\s*-\s*[0-9.]+)|\b))(\s*-\s*)?((?P<series>[^0-9\-]+) ([-#] ?)?(?P<series_index>[0-9.]+)?\s*-\s*)?(?P<title>.+)
3. (?P<author>[a-zA-Z'. ]+?) - \[?((?P<series>[a-zA-Z' ]+) (?P<series_index>[0-9\.]+)\]? - )?(?P<title>[^\.]+).*
I have the following results (using 'Piers Anthony - Incarnations of Immortality 05 - Being a Green Mother.epub' as an EXAMPLE):
Attachment 1: Add dialog (with DEFAULT settings) with regex test. The results display correct parsing.
Attachment 2: any series ebook added into calibre gives INCORRECT results, i.e. the series precedes the title and the series is NOT inserted into the series field.
I've tried this with several different regexes using the latest version of calibre (0.8.36) and get the same incorrect results. Could I have some incorrect setting in calibre? TIA!!
Last edited by puterdude; 01-23-2012 at 10:07 PM. Reason: Emoticons do not display correctly |
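For anyone wanting to check these patterns outside calibre, here is a rough sketch with Python's re module on the example file name from this post. It assumes the pattern is applied as a search; the fields helper is a made-up name. It only shows what the patterns themselves capture and says nothing about calibre's add settings.

```python
import re

FILENAME = "Piers Anthony - Incarnations of Immortality 05 - Being a Green Mother.epub"

# Patterns 1 and 3 as quoted in the post above.
PATTERN_1 = re.compile(
    r"(?P<author>.+?) - (?:\[(?P<series>.+?) ?(?P<series_index>[\d\.]{1,4})?\])"
    r" - (?P<title>.+)"
)
PATTERN_3 = re.compile(
    r"(?P<author>[a-zA-Z'. ]+?) - \[?((?P<series>[a-zA-Z' ]+) "
    r"(?P<series_index>[0-9\.]+)\]? - )?(?P<title>[^\.]+).*"
)


def fields(pattern, name):
    """Hypothetical helper: return the named groups, or None if nothing matches."""
    match = pattern.search(name)  # assumption: a search, not a full match
    return None if match is None else match.groupdict()


# Pattern 1 requires literal square brackets around the series,
# so it finds nothing in this file name.
print(fields(PATTERN_1, FILENAME))  # None

# Pattern 3 treats the brackets as optional and splits the name into four fields.
print(fields(PATTERN_3, FILENAME))
```

With pattern 3 the captured fields are author "Piers Anthony", series "Incarnations of Immortality", series_index "05" and title "Being a Green Mother", which agrees with what the regex test in Attachment 1 is reported to show.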
01-24-2012, 07:11 AM | #3 |
Zealot
Posts: 115
Karma: 10
Join Date: Jan 2011
Device: none
|
I was annoyed by having to correct things manually every time there was a hyphen in the author's name or in the title, so I continued tinkering.
I've figured it out a bit more:
Code:
(?P<author>.*?)( -\s*(?P<series>[^_0-9]*)(?P<series_index>[0-9]*))? -\s*(?P<title>[^_].+) ?
It takes care of cases with or without hyphens in the title, series and author, with or without a series. The remaining problem is that if there is no author, the series goes into the author slot. But that is a rare occurrence, so I consider the problem mainly solved. |
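A quick way to try this pattern outside calibre is a few lines of Python. This is a sketch only: it assumes the pattern is applied as a search against the file name without its extension, and the sample names and the parse helper are invented for illustration.

```python
import re

# Pattern quoted from the post above.
PATTERN = re.compile(
    r"(?P<author>.*?)( -\s*(?P<series>[^_0-9]*)(?P<series_index>[0-9]*))?"
    r" -\s*(?P<title>[^_].+) ?"
)


def parse(name):
    """Hypothetical helper: named groups as stripped strings ('' if not matched)."""
    match = PATTERN.search(name)  # assumption: a search, not a full match
    if match is None:
        return None
    return {key: (value or "").strip() for key, value in match.groupdict().items()}


# Made-up file names (extension already removed).
samples = [
    "Anne-Marie Smith - Sci-Fi Saga 02 - Some-Title",  # hyphens everywhere
    "Anne-Marie Smith - Some-Title",                   # no series
    "Saga 02 - Some Title",                            # no author
]
for name in samples:
    print(parse(name))
```

The first sample comes out as author "Anne-Marie Smith", series "Sci-Fi Saga", index "02", title "Some-Title". The second leaves the series fields empty. The third shows the remaining problem mentioned in the post: "Saga 02" ends up as the author. The lazy author group stops at the first space followed by a hyphen, which is why hyphens without a leading space are left alone.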
01-24-2012, 07:27 AM | #4 |
Zealot
Posts: 115
Karma: 10
Join Date: Jan 2011
Device: none
|
Problem with Author - Series # - Title Regex
Hey, puterdude, what are you trying to achieve?
If it is a simple "authors" - "series" "index" - "title", you can try the original regex included in calibre (that was my point of departure) or use my slightly improved one (I hope). |