#1 |
Fanatic
Posts: 547
Karma: 27509
Join Date: Dec 2007
Location: Greater Vancouver Area, BC, Canada
Device: Nexus 7, Sony Xperia z3 tablet, Kobo Glo, Boyue T63
|
Help with "Guessing metadata from file names"
I have over 800 eBooks that I really want to import into Calibre, but I'm scared to. I previously tried to load my library into Calibre using one of the 0.5 releases and was dismayed that over half of my collection came in with no author or title information at all. These eBooks all show authors and titles on my Cybook and BeBook, so I'm confused. The vast majority of my eBooks are in Mobipocket .prc format, and I've been told this could be part of the problem.
I would love to use the 'Guess metadata from file name' option, but I have two distinct naming conventions, one with series information and one without. Since the GUI only has one option, I'm not sure how to handle this.
Should I start adding an extra "_" to all the titles without series, or is there an easy way to program this with a script that runs on the command line? I'm not afraid of downloading new programs and having a go at it if someone can get me started. I would also love to retain the Genre information as a tag during import, if that is at all possible.
#2 |
creator of calibre
Posts: 45,169
Karma: 27110894
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
You can actually write a regex to handle both those cases. I don't have the time to do it for you, but hopefully someone who does will come along. There's also a thread on this forum about developing file name import regexes that you may find useful.
|
#3 |
Sigil & calibre developer
Posts: 2,487
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
|
The following regex will match both cases, provided there are no _ characters within each section.
Code:
[^_]+_(?P<author>[^_]+)_((?P<series>[^_]+)_)?(?P<title>[^_]+)
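For anyone who wants to try the pattern outside calibre first, here is a minimal Python sketch. The file names are made up for illustration, the helper name `guess_metadata` is hypothetical, and dropping the extension before matching is an assumption of this sketch rather than a statement about what calibre does internally.

```python
import os
import re

# Pattern quoted from the post above.
PATTERN = re.compile(
    r"[^_]+_(?P<author>[^_]+)_((?P<series>[^_]+)_)?(?P<title>[^_]+)"
)


def guess_metadata(filename):
    """Return author/series/title guessed from a file name, or None."""
    # Assumption: the extension is removed before the pattern is applied.
    stem, _ext = os.path.splitext(filename)
    match = PATTERN.match(stem)
    if match is None:
        return None
    return {key: match.group(key) for key in ("author", "series", "title")}


# Hypothetical file names, one for each naming convention.
with_series = guess_metadata("Fantasy_Jane Doe_Example Saga_First Book.prc")
without_series = guess_metadata("Fantasy_Jane Doe_Standalone Book.prc")
print(with_series)
print(without_series)
```

The optional group `((?P<series>[^_]+)_)?` is what lets one expression cover both conventions: when there is no third underscore-separated section, the group is skipped and `series` comes back as `None`.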
#4 |
Fanatic
|
|
#5 | |
Fanatic
|
Quote:
Any ideas?
|
#6 |
Sigil & calibre developer
|
This will work provided there are no numbers in the series name:
Code:
[^_]+_(?P<author>[^_]+)_((?P<series>[^_\d]+)(?P<series_index>\d+)?_)?(?P<title>[^_]+)
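A quick way to see what this version captures, and what the caveat about numbers means in practice, is to run it in plain Python. This is only a sketch: the file name stems are invented, `split_name` is a hypothetical helper, and the extension is assumed to have been removed already.

```python
import re

# Pattern quoted from the post above, split across lines for readability.
INDEXED = re.compile(
    r"[^_]+_(?P<author>[^_]+)_"
    r"((?P<series>[^_\d]+)(?P<series_index>\d+)?_)?"
    r"(?P<title>[^_]+)"
)


def split_name(stem):
    """Return the named groups for a file name stem, or None if no match."""
    match = INDEXED.match(stem)
    return match.groupdict() if match else None


# Hypothetical stems: series with index, no series, digits inside the series.
indexed = split_name("Fantasy_Jane Doe_Example Saga03_Third Book")
plain = split_name("Fantasy_Jane Doe_Standalone Book")
digits = split_name("Fantasy_Jane Doe_Area 51 Files02_Some Book")

for result in (indexed, plain, digits):
    print(result)
```

In the third case the series section contains digits, so the optional series group cannot match and is skipped. The series text then lands in `title` and the real title is left unmatched, which is the failure the caveat warns about.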
#7 | |
Fanatic
|
Quote:
I don't see a test field for Genre or Tag in the regex creation screen. Does this mean that I can't use a <tag> or <tags> group at the beginning of the above expression to retain my Genre tags, or would they come in anyway if I had the correct field identifier?
|
#8 |
Sigil & calibre developer
|
Tag is not supported. The only identifiers that can be pulled out of the file name are what are on that screen (Title, Authors, Series, Series Index, and ISBN).
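If the genre still needs to be kept, one option is to pull it out with a separate script before or after importing. The sketch below runs in plain Python, outside calibre. The group name `genre`, the helper `parse_with_genre` and the file name stem are all hypothetical, and how the resulting tag gets attached to the book in calibre is left open.

```python
import re

# "genre" is a group name chosen for this sketch only; it is not one of the
# identifiers calibre reads from file names.
WITH_GENRE = re.compile(
    r"(?P<genre>[^_]+)_(?P<author>[^_]+)_"
    r"((?P<series>[^_\d]+)(?P<series_index>\d+)?_)?"
    r"(?P<title>[^_]+)"
)


def parse_with_genre(stem):
    """Split a file name stem into fields, keeping the genre as a tag list."""
    match = WITH_GENRE.fullmatch(stem)
    if match is None:
        return None
    fields = match.groupdict()
    # Move the leading genre section into a one-element "tags" list.
    fields["tags"] = [fields.pop("genre")]
    return fields


# Hypothetical file name stem (extension already removed).
book = parse_with_genre("Fantasy_Jane Doe_Example Saga03_Third Book")
print(book)
```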
|
#9 |
Fanatic
|
|
#10 |
Fanatic
|
This regex worked great last year, but I've since had to completely reload my netbook and am now running Calibre 0.7.12. When I plugged the above regex into the 'Add files' screen and tested it on one of my file names, it put the entire file name into the Title field instead of splitting out the components. Has something changed in how the file names are handled?
|
#11 |
creator of calibre
|
Make sure you enter the file extension as well as the name when testing.
|
#12 | |
Fanatic
|
I entered this as the Regular Expression:
Quote:
It's been a while since I played with Calibre, but since I'm not working right now I thought I'd finally do that first mass import of all my eBooks. Unfortunately for me, the coding I learned in school for C+ didn't look anything like this.
|
#13 |
creator of calibre
|
Have you checked the option to read metadata only from file names?
|
#14 |
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
|
|
#15 |
Member
Posts: 23
Karma: 10
Join Date: Aug 2010
Device: none
|
Hi,
I have a very similar situation to the one above, but without genre. The filenames are 'Author FirstName LastName_SeriesnameSeriesindex_Title'. I slightly modified your regex to:
Code:
(?P<author>[A-Za-z ]+) ((?P<series>[^_\d]+)(?P<series_index>\d+)? )?(?P<title>[A-Za-z ][^_]+)
1. The series index isn't correct, e.g. 0051 is shown as 51.0.
2. Spaces in the series name aren't handled correctly, e.g. series name 'A B' is shown as 'B', and the author as 'author A'.
Can you help correct this?
Best regards
mumdigau
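Both symptoms can be reproduced in plain Python, which may help narrow down the cause. This is a sketch under assumptions: the sample names are invented, the underscore-delimited pattern is only an illustration of an unambiguous separator and has not been checked against calibre, and the `float` conversion assumes the series index is stored as a floating-point number.

```python
import re

# Pattern quoted from the post above: a space separates every section.
SPACE_DELIMITED = re.compile(
    r"(?P<author>[A-Za-z ]+) "
    r"((?P<series>[^_\d]+)(?P<series_index>\d+)? )?"
    r"(?P<title>[A-Za-z ][^_]+)"
)

# Illustrative alternative: underscores as the only separator.
UNDERSCORE_DELIMITED = re.compile(
    r"(?P<author>[^_]+)_"
    r"((?P<series>[^_\d]+)(?P<series_index>\d+)?_)?"
    r"(?P<title>[^_]+)"
)

# Hypothetical names: author "John Smith", series "A B", index 0051.
spaced = SPACE_DELIMITED.match("John Smith A B0051 Some Title").groupdict()
underscored = UNDERSCORE_DELIMITED.match(
    "John Smith_A B0051_Some Title"
).groupdict()

print(spaced)
print(underscored)

# Leading zeros are lost once the index is converted to a number.
index_as_number = float(spaced["series_index"])
print(index_as_number)
```

With spaces as both the separator and a legal character inside names, the greedy author group takes everything up to the last space before the digits. That is why the 'A' of the series ends up in the author. A separator that cannot occur inside a section removes the ambiguity.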