|  07-06-2009, 06:14 PM | #1 | 
| Fanatic            Posts: 547 Karma: 27509 Join Date: Dec 2007 Location: Greater Vancouver Area, BC, Canada Device: Nexus 7, Sony Xperia z3 tablet, Kobo Glo, Boyue T63 | 
				
				Help with "Guessing metadata from file names"
			 
			
			I have over 800 eBooks which I really, really want to import into Calibre. However, I'm scared to: I previously attempted to load my library into Calibre using one of the 0.5 releases and was dismayed at how many of my books (over half) had no author or title information at all. These eBooks all show authors and titles on my Cybook and BeBook, so I'm confused. The vast majority of my eBooks are in Mobipocket .prc format, and I've been told this could be part of the problem. I would love to use the 'Guess metadata from file name' option, but I have two distinct naming conventions, one with series information and one without. Since the GUI only has one option, I'm not sure how to handle this. 
 Should I start adding an extra "_" to all the titles without a series, or is there an easy way to do this with a script that runs on the command line? I'm not afraid of downloading new programs and having a go at it if someone can get me started. I would also love to be able to keep the genre information as a tag during import, if that is at all possible. | 
|  07-06-2009, 07:24 PM | #2 | 
| creator of calibre            Posts: 45,604 Karma: 28548974 Join Date: Oct 2006 Location: Mumbai, India Device: Various | 
			
			You can actually write a regex to handle both those cases. I don't have the time to do it for you, but hopefully someone who does will come along. There's also a thread on this forum about developing file name import regexes that you may find useful.
		 | 
|  07-06-2009, 08:32 PM | #3 | 
| Sigil & calibre developer            Posts: 2,487 Karma: 1063785 Join Date: Jan 2009 Location: Florida, USA Device: Nook STR | 
			
			The following regex will match both cases, provided there are no underscores (_) within each section. Code:
[^_]+_(?P<author>[^_]+)_((?P<series>[^_]+)_)?(?P<title>[^_]+)
 | 
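To check this regex outside calibre, Python's `re` module can be used, since it understands the same `(?P<name>...)` group syntax. The sketch below is a hypothetical test harness: the two file names are made up (one per naming convention, extension already removed) and `parse` is just a helper for this example.

```python
import re

# Regex from the post: first field (not stored), author, optional series, title.
PATTERN = re.compile(
    r"[^_]+_(?P<author>[^_]+)_((?P<series>[^_]+)_)?(?P<title>[^_]+)"
)

# Hypothetical file names, extension already stripped.
NAMES = [
    "SciFi_Jane Doe_Star Saga_The First Voyage",  # with series
    "SciFi_Jane Doe_The First Voyage",            # without series
]

def parse(name):
    """Return the named groups as a dict, or None if the regex does not match."""
    m = PATTERN.match(name)
    return m.groupdict() if m else None

for name in NAMES:
    print(name, "->", parse(name))
```

The unnamed leading `[^_]+` consumes the first field (the genre, in the original poster's case) without storing it, and the optional `(...)_)?` group is simply skipped when there is no series field, leaving `series` as `None`.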
|  07-07-2009, 10:13 PM | #4 | 
| Fanatic            Posts: 547 Karma: 27509 Join Date: Dec 2007 Location: Greater Vancouver Area, BC, Canada Device: Nexus 7, Sony Xperia z3 tablet, Kobo Glo, Boyue T63 | |
|  07-27-2009, 01:52 AM | #5 | |
| Fanatic            Posts: 547 Karma: 27509 Join Date: Dec 2007 Location: Greater Vancouver Area, BC, Canada Device: Nexus 7, Sony Xperia z3 tablet, Kobo Glo, Boyue T63 | Quote: 
 
 Any ideas? | |
|  07-27-2009, 06:45 AM | #6 | 
| Sigil & calibre developer            Posts: 2,487 Karma: 1063785 Join Date: Jan 2009 Location: Florida, USA Device: Nook STR | 
			
			This will work provided there are no numbers in the series name. Code:
[^_]+_(?P<author>[^_]+)_((?P<series>[^_\d]+)(?P<series_index>\d+)?_)?(?P<title>[^_]+)
 | 
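The same kind of check works for the series-index version. Again, the file names and the `parse` helper are made up for illustration; the point is to see the digits at the end of the series field being split off into `series_index`.

```python
import re

# Regex from the post: the series name may not contain digits, so trailing
# digits in the series field are captured separately as series_index.
PATTERN = re.compile(
    r"[^_]+_(?P<author>[^_]+)_((?P<series>[^_\d]+)(?P<series_index>\d+)?_)?(?P<title>[^_]+)"
)

def parse(name):
    """Return the named groups as a dict, or None if the regex does not match."""
    m = PATTERN.match(name)
    return m.groupdict() if m else None

# Hypothetical file names (extension already stripped).
print(parse("Fantasy_Jane Doe_Dragons3_Fire Rising"))  # series with index
print(parse("Fantasy_Jane Doe_Fire Rising"))           # no series at all
```

If the number is preceded by a space, as in `Dragon Saga 3`, the space stays in the series name (`'Dragon Saga '`), because `[^_\d]+` matches spaces too.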
|  07-29-2009, 04:18 PM | #7 | |
| Fanatic            Posts: 547 Karma: 27509 Join Date: Dec 2007 Location: Greater Vancouver Area, BC, Canada Device: Nexus 7, Sony Xperia z3 tablet, Kobo Glo, Boyue T63 | Quote: 
 I don't see a Genre or Tag field in the regex creation screen. Does this mean that I can't use <tag> or <tags> at the beginning of the above expression to keep my genre tags, or would they come in anyway if I had the correct field identifier? | |
|  07-30-2009, 07:13 AM | #8 | 
| Sigil & calibre developer            Posts: 2,487 Karma: 1063785 Join Date: Jan 2009 Location: Florida, USA Device: Nook STR | 
			
			Tags are not supported. The only fields that can be pulled out of the file name are the ones shown on that screen: Title, Authors, Series, Series Index, and ISBN.
		 | 
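Since the Add-files screen cannot fill Tags, one workaround is to parse the names yourself, which is roughly what the original question about a command-line script was after. The sketch below is a hypothetical helper, not part of calibre: it captures the first field as a `genre` group (a group name calibre's screen does not use) and returns it as a one-item tag list next to the other fields. Getting those values into calibre afterwards is not covered here.

```python
import os
import re

# Same layout as the series-index regex earlier in the thread, but the first
# field is captured as "genre" so it can be reused as a tag (hypothetical helper).
PATTERN = re.compile(
    r"(?P<genre>[^_]+)_(?P<author>[^_]+)_((?P<series>[^_\d]+)(?P<series_index>\d+)?_)?(?P<title>[^_]+)"
)

def metadata_with_tag(filename):
    """Split a file name into metadata fields; the genre becomes a tag."""
    stem, _ext = os.path.splitext(os.path.basename(filename))
    m = PATTERN.fullmatch(stem)
    if m is None:
        return None
    fields = m.groupdict()
    fields["tags"] = [fields.pop("genre")]  # move genre into a tag list
    return fields

print(metadata_with_tag("books/Mystery_Jane Doe_Case Files2_Cold Trail.prc"))
```

`fullmatch` is used so that a name with extra, unexpected fields is rejected (returns `None`) instead of being partially parsed.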
|  07-30-2009, 04:26 PM | #9 | 
| Fanatic            Posts: 547 Karma: 27509 Join Date: Dec 2007 Location: Greater Vancouver Area, BC, Canada Device: Nexus 7, Sony Xperia z3 tablet, Kobo Glo, Boyue T63 | |
|  08-03-2010, 01:46 AM | #10 | 
| Fanatic            Posts: 547 Karma: 27509 Join Date: Dec 2007 Location: Greater Vancouver Area, BC, Canada Device: Nexus 7, Sony Xperia z3 tablet, Kobo Glo, Boyue T63 | 
			
			This regex worked great last year, but now I've had to completely reload my netbook. I am now running Calibre 0.7.12. When I plugged the above regex into the 'Add files' screen and tested it on one of my file names, it put the entire file name into the Title field instead of splitting out the components. Has something changed in how file names are handled?
		 | 
|  08-03-2010, 11:30 AM | #11 | 
| creator of calibre            Posts: 45,604 Karma: 28548974 Join Date: Oct 2006 Location: Mumbai, India Device: Various | 
			
			Make sure you include the file extension as well as the name.
		 | 
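One possible explanation for the extension mattering, assuming (not verified against calibre's source) that everything from the last dot onward is treated as the extension and dropped before the regex runs: a name typed without an extension would then be cut down to nothing. The hypothetical `strip_extension` and `try_name` below mimic that assumed behaviour.

```python
import re

PATTERN = re.compile(
    r"[^_]+_(?P<author>[^_]+)_((?P<series>[^_\d]+)(?P<series_index>\d+)?_)?(?P<title>[^_]+)"
)

def strip_extension(name):
    # Hypothetical: keep only the text before the last dot.
    # str.rpartition returns ('', '', name) when there is no dot,
    # so a name without an extension becomes ''.
    return name.rpartition(".")[0]

def try_name(name):
    m = PATTERN.match(strip_extension(name))
    return m.groupdict() if m else None

print(try_name("Fantasy_Jane Doe_Dragons3_Fire Rising.prc"))  # fields split out
print(try_name("Fantasy_Jane Doe_Dragons3_Fire Rising"))      # nothing left to match
```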
|  08-03-2010, 04:05 PM | #12 | |
| Fanatic            Posts: 547 Karma: 27509 Join Date: Dec 2007 Location: Greater Vancouver Area, BC, Canada Device: Nexus 7, Sony Xperia z3 tablet, Kobo Glo, Boyue T63 | 
			
			I entered this as the Regular Expression: Quote: 
 
 
 It's been a while since I played with Calibre, but since I'm not working right now I thought I'd finally do that first mass import of all my eBooks. Unfortunately for me, the coding I learned in school for C+ didn't look anything like this.   | |
|  08-03-2010, 04:07 PM | #13 | 
| creator of calibre            Posts: 45,604 Karma: 28548974 Join Date: Oct 2006 Location: Mumbai, India Device: Various | 
			
			Have you checked the option to read metadata only from file names?
		 | 
|  08-03-2010, 04:31 PM | #14 | 
| Wizard            Posts: 4,004 Karma: 177841 Join Date: Dec 2009 Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T | |
|  08-03-2010, 06:55 PM | #15 | 
| Member  Posts: 23 Karma: 10 Join Date: Aug 2010 Device: none | 
			
			Hi, I have a very similar situation to the one above, but without a genre. The file names are 'Author FirstName LastName_SeriesnameSeriesindex_Title'. I slightly modified your regex to: Code:
(?P<author>[A-Za-z ]+) ((?P<series>[^_\d]+)(?P<series_index>\d+)? )?(?P<title>[A-Za-z ][^_]+)
1. The series index isn't correct, e.g. 0051 is shown as 51.0.
2. Spaces in the series name aren't handled correctly, e.g. series name 'A B' is shown as 'B', and the author as 'author A'.
Can you help correct this? Best regards, mumdigau
 | 
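Both problems can be reproduced with Python's `re` module. The regex in the post uses spaces where the file names have underscores, so the sketch follows the regex and assumes the name reaches it with spaces; the sample name `Jane Doe A B0051 Some Title` is made up. Because the author group `[A-Za-z ]+` may contain spaces and is greedy, it keeps as much as it can while still letting the rest match, so the first word of a two-word series ends up in the author. One possible fix, assuming every author name is exactly two words, is to pin the author to two words and let the series match lazily up to its number. The 51.0 is a separate matter: the series index is a number in calibre (the 51.0 suggests a floating-point value), so leading zeros cannot survive, just as `float("0051")` gives 51.0.

```python
import re

# Regex from the post: the author group can absorb spaces, so it grabs "A".
ORIGINAL = re.compile(
    r"(?P<author>[A-Za-z ]+) ((?P<series>[^_\d]+)(?P<series_index>\d+)? )?(?P<title>[A-Za-z ][^_]+)"
)

# Possible fix (assumes a two-word author): pin the author to two words and
# let the series match as little as possible before its number.
FIXED = re.compile(
    r"(?P<author>[A-Za-z]+ [A-Za-z]+) ((?P<series>[^_\d]+?) ?(?P<series_index>\d+) )?(?P<title>[A-Za-z ][^_]+)"
)

name = "Jane Doe A B0051 Some Title"  # hypothetical, spaces instead of underscores

for label, pattern in (("original", ORIGINAL), ("fixed", FIXED)):
    print(label, pattern.match(name).groupdict())

# A numeric series index cannot keep leading zeros.
print(float("0051"))
```

This is still a heuristic: a series-less book titled `Chapter 2 Ends` would be read as series `Chapter`, index `2`, title `Ends`, and three-word author names would need a different author pattern.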
|  Similar Threads | ||||
| Thread | Thread Starter | Forum | Replies | Last Post | 
| File names with "(" and ")" can cause screen freezes | greenapple | Ectaco jetBook | 5 | 02-04-2010 08:25 PM | 
| Get "Tag" metadata from file name | dosyoyas | Calibre | 2 | 01-13-2010 01:09 PM | 
| Fiction Writers as "Brand Names" | kilohertz53 | Lounge | 53 | 11-02-2007 05:00 PM | 
| Help! the "Make Sony Reader File" under "Options" is different | Dr. Drib | Sony Reader | 6 | 04-23-2007 02:56 AM | 
| New ".mobi" domain names are coming in May 2006 | Bob Russell | Lounge | 3 | 04-25-2006 05:38 PM |