07-09-2011, 12:22 AM | #1 |
Enthusiast
Posts: 27
Karma: 30
Join Date: Jul 2011
Device: none
|
Seriously, how to parse metadata from filenames
Hi!
I know what a regular expression is, and GENERALLY how to use them. I don't know Python, but I read the link. What I can't figure out is how to parse a filename into Calibre metadata. I read the tutorial, but it was not too helpful. I clicked the checkbox that made me hope that Calibre would use the filename. I am trying to parse filenames like:
Code:
tb-2099 California microbial life (john adams) 1999
Code:
(.*\d\s)(.*)\s\((j.*)\)\s(\d*).*
Code:
(.*\d\s)(.*)\s\((?P<author>.*)\)\s(\d*).*
How do I really extract the metadata from a filename? Thanks so much! |
07-09-2011, 03:24 AM | #2 |
Grand Sorcerer
Posts: 11,741
Karma: 6997045
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
|
Does your test file name have an extension? Won't work without it.
|
07-09-2011, 10:47 PM | #3 |
Enthusiast
Posts: 27
Karma: 30
Join Date: Jul 2011
Device: none
|
Yes, I am working with plain text files with a ".txt" extension. So it would be
tb-2099 California microbial life (john adams) 1999.txt
Is there documentation somewhere for the symbolic names that can be used in expressions? For example, is it "(?P<author>.*)" or "(?P<authors>.*)"? Does case matter? Does import fail if there is whitespace? |
07-09-2011, 11:16 PM | #4 |
Enthusiast
Posts: 27
Karma: 30
Join Date: Jul 2011
Device: none
|
Cool, I just discovered the mouse-over feature.
|
07-09-2011, 11:37 PM | #5 |
US Navy, Retired
Posts: 9,864
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Nexus 7
|
He was just pointing out that if you don't include the file extension in the test window then you won't get any results when you press the test button.
|
07-12-2011, 02:15 AM | #6 |
Wizard
Posts: 3,130
Karma: 91256
Join Date: Feb 2008
Location: Germany
Device: Cybook Gen3
|
If you're talking about one of the tutorials/guides in the stickies here, suggestions for improvements would be much appreciated.
|
07-17-2011, 09:32 PM | #7 |
Enthusiast
Posts: 27
Karma: 30
Join Date: Jul 2011
Device: none
|
I settled on a solution.
As Calibre is great software, I will try to respond with some good, usable suggestions. In the meantime, here is what I eventually settled on.
Code:
((?P<series>\w+)?\W(?P<series_index>\d+).+?)?(?P<title>.*)\s+\((?P<author>.*)\)\s?(?P<published>\d+)?.*
My first suggestion is that the test functionality do as-you-type validation and matching of the expression, so that the user knows when Calibre is not going to find any data given the expression and sample.
For the tutorial, it should explicitly state that the Calibre regular expressions are an extension of other regular expression ... um, grammars. It should also detail how symbolic grouping works, and how general parenthetical grouping works. A table of recipes for pulling data out of some sample strings would be great. Maybe I can help with the first of those. |
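For anyone who wants to check this expression outside Calibre, here is a minimal sketch using Python's standard `re` module. It only shows what each named group captures from the sample filename used earlier in the thread; it is not Calibre's own import code, and the variable names (`PATTERN`, `sample`, `fields`) are just for illustration.

```python
import re

# The expression from the post above, unchanged, split over two lines
# only for readability (adjacent string literals are concatenated).
PATTERN = re.compile(
    r"((?P<series>\w+)?\W(?P<series_index>\d+).+?)?"
    r"(?P<title>.*)\s+\((?P<author>.*)\)\s?(?P<published>\d+)?.*"
)

# Sample filename from earlier in the thread, including the extension.
sample = "tb-2099 California microbial life (john adams) 1999.txt"

match = PATTERN.search(sample)

# Named groups, written (?P<name>...), are returned by name.
fields = match.groupdict() if match else {}
for name, value in fields.items():
    print(f"{name}: {value!r}")

# Plain parentheses still capture, but only by position: group 1 here is
# the optional outer group wrapping the series part.
if match:
    print("group 1:", repr(match.group(1)))
```

With this sample, series comes out as "tb", series_index as "2099", title as "California microbial life", author as "john adams" and published as "1999". The plain outer parentheses capture "tb-2099 " as group 1 without giving it a name, which is the difference between symbolic and general parenthetical grouping.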
07-18-2011, 01:58 PM | #8 | |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
Quote:
Calibre must have a title for every book. If the regex you wrote for the title field doesn't match something (or you've omitted it), Calibre gives up on your regex and reverts to using the entire filename as the title and Unknown as the author. |
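The fallback described above can be sketched in plain Python. This is only an illustration of the behaviour as stated in this post, not Calibre's actual code: `parse_filename` is a made-up helper, and defaulting a missing author to "Unknown" when a title was found is an assumption.

```python
import re


def parse_filename(name, pattern):
    """Hypothetical helper that mimics the fallback described above."""
    match = re.search(pattern, name)
    title = None
    author = None
    if match:
        groups = match.groupdict()
        # .get() returns None when the pattern has no such named group
        # or when an optional group did not take part in the match.
        title = groups.get("title")
        author = groups.get("author")
    if not title:
        # No usable title: give up on the regex entirely and use the
        # whole filename as the title, with Unknown as the author.
        return {"title": name, "author": "Unknown"}
    # Assumption: a missing author alone does not trigger the fallback.
    return {"title": title, "author": author or "Unknown"}


pattern = r"(?P<title>.+)\s+\((?P<author>.+)\)"
print(parse_filename("California microbial life (john adams).txt", pattern))
print(parse_filename("notes.txt", pattern))
# A pattern with no title group falls back even though it matches.
print(parse_filename("California microbial life (john adams).txt",
                     r"\((?P<author>.+)\)"))
```

The last call is the case that tends to surprise people: the author group matches, but because nothing was captured as the title, the author is thrown away too.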
|