#1 |
Groupie
Posts: 196
Karma: 126824
Join Date: Dec 2008
Location: Out There
Device: K3 W/3G (Fixed screen!) & Paperwhite Wifi
|
Regex help on reading Metadata from file name.
90% of my files are in the format: (The other 10% do not include the pub date)
Format: Series-series number title (author) Pub date.txt
(Series: 2-4 letters) (Series number: 2-4 digits) (Pub date: year only)

ex.
BA-123 How it works (John Smith) 1989.txt
ROT-4089 Make it this way (Jane Smith) 2009.txt

Playing around with the regex, I was able to separate the series and number, but I could not work out the title and author. Typically I ended up with
Title: How it works (John Smith
Author: )
and the pub date did not work at all.

Unfortunately, I kept changing it around, and now it does not work at all and I can't remember what I had that almost worked.

One thing I have had trouble with is the "(" and ")" and trying to search for them in the title. I CAN search and replace the titles to swap them for another character, to make it easier to run a regex if necessary (just not "-", as some titles have a "-" in them).

Anyone have any clue how to do this?

edit: This is as close as I can come to what I had
Code:
(?P<series>[^_0-9-]*)-(?P<series_index>[0-9]*)(?P<title>[^_-]+) \(?(?P<author>[^_].+) -?(?P<date>[^_].+) ? |
#2 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Code:
(?P<series>[A-Za-z ]+)-(?P<series_index>[0-9]+) (?P<title>.+) \((?P<author>[A-Za-z. ]+)\) (?P<published>[0-9]+)
Last edited by eschwartz; 03-01-2015 at 09:47 PM.
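For anyone who wants to check the pattern outside calibre's test box, here is a minimal sketch in plain Python, whose `re` module accepts the same `(?P<name>...)` named-group syntax. The helper name `parse_name`, the use of `re.search`, and the step that strips the extension before matching are assumptions made for illustration; they are not taken from calibre's own code.

```python
import re

# Pattern from the post above, copied unchanged.
PATTERN = re.compile(
    r"(?P<series>[A-Za-z ]+)-(?P<series_index>[0-9]+) (?P<title>.+) "
    r"\((?P<author>[A-Za-z. ]+)\) (?P<published>[0-9]+)"
)

def parse_name(file_name):
    """Return the named groups as a dict, or None if the name does not match.

    Hypothetical helper: dropping the extension before matching is an
    assumption of this sketch.
    """
    stem = file_name.rsplit(".", 1)[0]
    match = PATTERN.search(stem)
    return match.groupdict() if match else None

# The two example file names from the first post.
for name in ("BA-123 How it works (John Smith) 1989.txt",
             "ROT-4089 Make it this way (Jane Smith) 2009.txt"):
    print(name, "->", parse_name(name))
```

The escaped `\(` and `\)` are what anchor the author group: the greedy `.+` for the title backs off until a literal " (" is followed by an author, a ")", a space, and the year.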
#3 |
Groupie
Posts: 196
Karma: 126824
Join Date: Dec 2008
Location: Out There
Device: K3 W/3G (Fixed screen!) & Paperwhite Wifi
|
Great, that seems to have done it. (At least in the test box.) I will run some books through and see if importing works... but I am sure it will.
Thanks, |
#4 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
You're welcome.
#5 |
Groupie
Posts: 196
Karma: 126824
Join Date: Dec 2008
Location: Out There
Device: K3 W/3G (Fixed screen!) & Paperwhite Wifi
|
OK, in the test box it works perfectly. (And, as you noted about the strange bug, all the "published" years have Mar-15 appended for month and day.)

Once the files are processed, the data in Calibre is correct except... the titles are the original file name, not the book title from the add-books regex.

ex. ROT-4089 Make it this way (Jane Smith) 2009.txt

In the regex test box it is:
Series: ROT [4089]
Author: Jane Smith
Title: Make it this way
Published: 2009-03-15

But once imported in Calibre:
Title: ROT-4089 Make it this way (Jane Smith) 2009

Why is it ignoring the regex for the title?

EDIT: Never mind. I forgot to uncheck the box for read from metadata instead of file name.
#6 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Glad you figured it out!
Also, regarding your PM re: covers -- no, there is no good solution for getting covers for a TXT file. You can import using Alternatively, do a metadata download (shortcut key is CTRL+D) which redownloads covers from various sources... which also takes time, although you can set it running automatically.
#7 |
Groupie
Posts: 196
Karma: 126824
Join Date: Dec 2008
Location: Out There
Device: K3 W/3G (Fixed screen!) & Paperwhite Wifi
|
Yes, that is how I import them. I may have to use the method posted in that thread I linked in my PM to generate covers and OPFs, then do a cover substitution and re-import.
Thanks for your assistance. |
#8 | |
Groupie
Posts: 196
Karma: 126824
Join Date: Dec 2008
Location: Out There
Device: K3 W/3G (Fixed screen!) & Paperwhite Wifi
|
Quote:
Is there a way to make the published date part optional?
Last edited by JohnnyBook; 03-04-2015 at 08:35 PM.
|
#9 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Yes, follow that group with a question mark.
Code:
(?P<series>[A-Za-z ]+)-(?P<series_index>[0-9]+) (?P<title>.+) \((?P<author>[A-Za-z. ]+)\) (?P<published>[0-9]+)?
If I didn't have short-term memory loss and forget part of the OP...
Last edited by eschwartz; 03-05-2015 at 04:44 AM.
#10 | |
Groupie
Posts: 196
Karma: 126824
Join Date: Dec 2008
Location: Out There
Device: K3 W/3G (Fixed screen!) & Paperwhite Wifi
|
Quote:
"90% of my files are in the format: (The other 10% do not include the pub date)"
In any case, thanks bunches for all your help.

EDIT: Nope still did not do it... Maybe since it does not find the space after the author, it still fails?
ROT-4089 Make it this way (Jane Smith) 2009.txt
ROT-4089 Make it this way (Jane Smith).txt
The first works, the second is not parsed.
Last edited by JohnnyBook; 03-05-2015 at 12:01 AM.
|
#11 |
Ex-Helpdesk Junkie
Posts: 19,421
Karma: 85400180
Join Date: Nov 2012
Location: The Beaten Path, USA, Roundworld, This Side of Infinity
Device: Kindle Touch fw5.3.7 (Wifi only)
|
Heh, overlooked that, sorry.
Dur -- because it was still trying to find a space at the end.
Code:
(?P<series>[A-Za-z ]+)-(?P<series_index>[0-9]+) (?P<title>.+) \((?P<author>[A-Za-z. ]+)\) ?(?P<published>[0-9]+)? |
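To see why the extra `?` after the space matters, here is a rough side-by-side check in plain Python. It is only a sketch: the names `HEAD`, `SPACE_REQUIRED`, `SPACE_OPTIONAL` and `parse_stem` are made up for this example, and matching with `re.search` against the file name without its extension is an assumption, not a statement about how calibre applies the pattern internally.

```python
import re

# Shared front part of both patterns, copied from the posts above.
HEAD = (
    r"(?P<series>[A-Za-z ]+)-(?P<series_index>[0-9]+) (?P<title>.+) "
    r"\((?P<author>[A-Za-z. ]+)\)"
)
SPACE_REQUIRED = re.compile(HEAD + r" (?P<published>[0-9]+)?")   # post #9
SPACE_OPTIONAL = re.compile(HEAD + r" ?(?P<published>[0-9]+)?")  # post #11

def parse_stem(pattern, stem):
    """Return the named groups as a dict, or None when nothing matches."""
    match = pattern.search(stem)
    return match.groupdict() if match else None

# Hypothetical inputs: the two names from post #10 with ".txt" removed.
WITH_DATE = "ROT-4089 Make it this way (Jane Smith) 2009"
WITHOUT_DATE = "ROT-4089 Make it this way (Jane Smith)"

for label, pattern in (("space required", SPACE_REQUIRED),
                       ("space optional", SPACE_OPTIONAL)):
    for stem in (WITH_DATE, WITHOUT_DATE):
        print(label, "|", stem, "->", parse_stem(pattern, stem))
```

With the space still mandatory, the name without a year has nothing after the ")" to satisfy it, so the whole match fails even though the year group itself is optional. Making the space optional as well lets the match succeed, and the `published` group simply comes back empty.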
#12 |
Groupie
Posts: 196
Karma: 126824
Join Date: Dec 2008
Location: Out There
Device: K3 W/3G (Fixed screen!) & Paperwhite Wifi
|
That did it. It works great now.
And it looks like the method from that other thread (generate covers and OPFs, then do a cover substitution and re-import) is actually pretty fast and easy to do. https://www.mobileread.com/forums/sho...d+cover&page=2
Thread | Thread Starter | Forum | Replies | Last Post |
Missing cover when reading metadata from file name | emphyrion | Library Management | 4 | 01-31-2014 09:49 AM |
Reading some fields from filename and others from file metadata | Daniel_321 | Calibre | 1 | 11-25-2012 07:14 AM |
Metadata/regex help | lathom | Library Management | 3 | 11-10-2011 01:52 PM |
RegEx - filename metadata help | ejjenkins | Calibre | 4 | 12-28-2010 05:47 PM |
Recognition of author and title from html files/reading metadata from a seperate file | Lethe | Calibre | 5 | 04-03-2010 08:35 AM |