Old 07-06-2009, 06:14 PM   #1
Justy
Fanatic
Posts: 547
Karma: 27509
Join Date: Dec 2007
Location: Greater Vancouver Area, BC, Canada
Device: Nexus 7, Sony Xperia z3 tablet, Kobo Glo, Boyue T63
Help with "Guessing metadata from file names"

I have over 800 eBooks which I really, really want to import into Calibre. However, I'm scared to: I previously attempted to load my library into Calibre using one of the 0.5 releases and was dismayed that over half of my collection came in with no author or title information at all. These eBooks all show authors and titles on my Cybook and BeBook, so I'm confused. The vast majority of my eBooks are in Mobipocket .prc format; I've been told this could be part of the problem.

I would love to use the 'Guess metadata from file name' option, but I have 2 distinct naming conventions, one with series information and one without. Since the GUI has only 1 option, I'm not sure how to handle this.
  • with series: Genre_AuthorLastName_Series_Title
  • w/out series: Genre_AuthorLastName_Title

Should I start adding an extra "_" to all the titles without series, or is there an easy way to program this using a script that runs on the command line? I'm not afraid of downloading new programs and having a go at it if someone can get me started. I would also love to be able to retain the Genre information as a tag during import, if that is at all possible.
Old 07-06-2009, 07:24 PM   #2
kovidgoyal
creator of calibre
Posts: 43,778
Karma: 22666666
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
You can actually write a regex to handle both those cases. I don't have the time to do it for you, but hopefully someone who does will come along. There's also a thread on this forum about developing file name import regexes that you may find useful.
Old 07-06-2009, 08:32 PM   #3
user_none
Sigil & calibre developer
Posts: 2,488
Karma: 1063785
Join Date: Jan 2009
Location: Florida, USA
Device: Nook STR
The following regex will match both cases, provided there is no _ within each section.

Code:
[^_]+_(?P<author>[^_]+)_((?P<series>[^_]+)_)?(?P<title>[^_]+)
At least it matches your Genre_AuthorLastName_Series_Title and Genre_AuthorLastName_Title examples.
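A quick way to check a pattern like this outside calibre is Python's `re` module, which uses the same `(?P<name>...)` named-group syntax. The sketch below is only an illustration: the `parse` helper is made up for the test, and calibre's own import code may preprocess the file name before matching.

```python
import re

# The pattern from the post above, unchanged.
PATTERN = re.compile(
    r"[^_]+_(?P<author>[^_]+)_((?P<series>[^_]+)_)?(?P<title>[^_]+)"
)


def parse(name):
    """Hypothetical helper: return the named groups for one file name
    (given without its extension), or None if the pattern does not match."""
    match = PATTERN.search(name)
    return match.groupdict() if match else None


with_series = parse("Genre_AuthorLastName_Series_Title")
without_series = parse("Genre_AuthorLastName_Title")

print(with_series)
print(without_series)
```

When the optional series group does not match, `series` comes back as `None` and the third section is treated as the title.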
Old 07-07-2009, 10:13 PM   #4
Justy
Fanatic
Quote:
Originally Posted by user_none View Post
The following regex will match both cases, provided there is no _ within each section.
Thank you! I'll give it a try tonight.
Old 07-27-2009, 01:52 AM   #5
Justy
Fanatic
Quote:
Originally Posted by user_none View Post
The following regex will match both cases, provided there is no _ within each section.

Code:
[^_]+_(?P<author>[^_]+)_((?P<series>[^_]+)_)?(?P<title>[^_]+)
Thanks for the code, I finally got a chance to test it out today. It works great except for the fact that I forgot to mention that I put in the series index after the series name.
  • with series:
      • Genre_AuthorLastName_Series1_Title
      • Genre_AuthorLastName_Series2_Title
    where 1 and 2 are the positions in the series.
  • w/out series: Genre_AuthorLastName_Title

Any ideas?
Old 07-27-2009, 06:45 AM   #6
user_none
Sigil & calibre developer
This will work provided there are no numbers in the series name:

Code:
[^_]+_(?P<author>[^_]+)_((?P<series>[^_\d]+)(?P<series_index>\d+)?_)?(?P<title>[^_]+)
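The effect of the added `series_index` group, and the stated limitation about digits in the series name, can be checked with Python's `re` module. This is a sketch only: the `split_name` helper and the "Catch22Saga3" file name are invented for the test and are not taken from anyone's library.

```python
import re

# The pattern from the post above, unchanged.
PATTERN = re.compile(
    r"[^_]+_(?P<author>[^_]+)_"
    r"((?P<series>[^_\d]+)(?P<series_index>\d+)?_)?"
    r"(?P<title>[^_]+)"
)


def split_name(name):
    """Hypothetical helper: named groups for one file name, or None."""
    match = PATTERN.search(name)
    return match.groupdict() if match else None


indexed = split_name("Genre_AuthorLastName_Series2_Title")
plain = split_name("Genre_AuthorLastName_Title")

# A series name that itself contains digits defeats the optional group:
# the whole third section ends up as the title instead.
digits_in_series = split_name("Genre_Smith_Catch22Saga3_Title")

print(indexed)
print(plain)
print(digits_in_series)
```

In the last case the pattern still "matches", which is worth knowing: a bad split will not necessarily show up as an import failure.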
Old 07-29-2009, 04:18 PM   #7
Justy
Fanatic
Quote:
Originally Posted by user_none View Post
This will work provided there are no numbers in the series name:

Code:
[^_]+_(?P<author>[^_]+)_((?P<series>[^_\d]+)(?P<series_index>\d+)?_)?(?P<title>[^_]+)
You are amazing, thanks! That worked wonderfully!

I don't see a test field for Genre or Tag in the regex creation screen. Does this mean that I can't use a <tag> or <tags> group at the beginning of the above expression to retain my Genre tags, or would they come in anyway if I had the correct field identifier?
Old 07-30-2009, 07:13 AM   #8
user_none
Sigil & calibre developer
Tag is not supported. The only identifiers that can be pulled out of the file name are what are on that screen (Title, Authors, Series, Series Index, and ISBN).
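Since the import screen offers no tag field, one possible workaround is to pull the genre out of the file name in a separate script and apply it as a tag afterwards by whatever means is convenient. The sketch below covers only the extraction step. The `genre` group name and the `genre_of` helper are hypothetical; they are not identifiers that the import regex recognises.

```python
import re

# Everything before the first underscore is taken to be the genre.
GENRE_PATTERN = re.compile(r"(?P<genre>[^_]+)_")


def genre_of(filename):
    """Hypothetical helper: return the leading Genre section, or None
    if the file name contains no underscore-separated prefix."""
    match = GENRE_PATTERN.match(filename)
    return match.group("genre") if match else None


first = genre_of("Genre_AuthorLastName_Title.prc")
second = genre_of("Genre_AuthorLastName_Series01_Title.prc")
missing = genre_of("NoUnderscore.prc")

print(first, second, missing)
```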
Old 07-30-2009, 04:26 PM   #9
Justy
Fanatic
Quote:
Originally Posted by user_none View Post
Tag is not supported. The only identifiers that can be pulled out of the file name are what are on that screen (Title, Authors, Series, Series Index, and ISBN).
Thanks again! You have been very very helpful!
Old 08-03-2010, 01:46 AM   #10
Justy
Fanatic
Quote:
Originally Posted by user_none View Post
This will work provided there are no numbers in the series name:

Code:
[^_]+_(?P<author>[^_]+)_((?P<series>[^_\d]+)(?P<series_index>\d+)?_)?(?P<title>[^_]+)
This regex worked great last year, but now I've had to completely reload my netbook. I am now running Calibre 0.7.12. When I plugged the above regex into the 'Add files' screen and tested it on one of my file names, it put the entire file name into the Title field instead of splitting out the components. Has something changed in how the filenames are handled?
Old 08-03-2010, 11:30 AM   #11
kovidgoyal
creator of calibre
Make sure you put the file extension as well as the name.
Old 08-03-2010, 04:05 PM   #12
Justy
Fanatic
I entered this as the Regular Expression:
Quote:
Originally Posted by user_none View Post
Code:
[^_]+_(?P<author>[^_]+)_((?P<series>[^_\d]+)(?P<series_index>\d+)?_)?(?P<title>[^_]+)
My test file names were:
  1. Genre_AuthorLastName_Title.prc
  2. Genre_AuthorLastName_Series01_Title.prc
which gave the following results respectively:
  1. Title = Genre AuthorLastName Title
  2. Title = Genre AuthorLastName Series01 Title

It's been a while since I played with Calibre but since I'm not working right now I thought I'd finally do that first mass import of all my eBooks. Unfortunately for me, the coding I learned in school for C+ didn't look anything like this.
Old 08-03-2010, 04:07 PM   #13
kovidgoyal
creator of calibre
Have you checked the option to read metadata only from filenames?
Old 08-03-2010, 04:31 PM   #14
Starson17
Wizard
Posts: 4,004
Karma: 177841
Join Date: Dec 2009
Device: WinMo: IPAQ; Android: HTC HD2, Archos 7o; Java:Gravity T
Quote:
Originally Posted by Justy View Post
My test file names were:
  1. Genre_AuthorLastName_Title.prc
  2. Genre_AuthorLastName_Series01_Title.prc
Try this:
Code:
([^_]+ )(?P<author>[A-Za-z]+) ((?P<series>[^_\d]+)(?P<series_index>\d+)? )?(?P<title>[^_]+)
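The test results reported earlier in the thread ("Genre AuthorLastName Series01 Title" landing in the Title field) suggest that this calibre version turns underscores into spaces before the regex runs. That would explain why a pattern looking for `_` no longer matches and why the pattern above looks for spaces instead. The sketch below imitates that behaviour with a made-up `normalise` helper. It is an assumption about the preprocessing, not calibre's actual code.

```python
import re

# Last year's underscore-based pattern and the space-based one from this post.
OLD = re.compile(
    r"[^_]+_(?P<author>[^_]+)_"
    r"((?P<series>[^_\d]+)(?P<series_index>\d+)?_)?(?P<title>[^_]+)"
)
NEW = re.compile(
    r"([^_]+ )(?P<author>[A-Za-z]+) "
    r"((?P<series>[^_\d]+)(?P<series_index>\d+)? )?(?P<title>[^_]+)"
)


def normalise(filename):
    """Hypothetical stand-in for the assumed preprocessing:
    drop the extension, then replace underscores with spaces."""
    stem = filename.rpartition(".")[0] or filename
    return stem.replace("_", " ")


name = normalise("Genre_AuthorLastName_Series01_Title.prc")
old_match = OLD.search(name)              # no underscores left to match
new_fields = NEW.search(name).groupdict()
no_series = NEW.search(normalise("Genre_AuthorLastName_Title.prc")).groupdict()

print(name)
print(old_match)
print(new_fields)
print(no_series)
```

Note that with spaces as the only separator, sections that themselves contain spaces (a multi-word title, for instance) become ambiguous, so results on real file names should be checked before a mass import.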
Old 08-03-2010, 06:55 PM   #15
mumdigau
Member
Posts: 23
Karma: 10
Join Date: Aug 2010
Device: none
Hi,

I've a very similar situation here, same as above, but without genre. The filenames are 'Author FirstName LastName_SeriesnameSeriesindex_Title'.

I slightly modified your regex to
Code:
(?P<author>[A-Za-z ]+) ((?P<series>[^_\d]+)(?P<series_index>\d+)? )?(?P<title>[A-Za-z ][^_]+)
which more or less works besides two pitfalls:

1. The seriesindex isn't correct, e.g. 0051 is shown as 51.0
2. Spaces in seriesname aren't handled correctly, e.g. seriesname 'A B' is shown as 'B', and author as 'author A'.

Can you help correct this?
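Both pitfalls can be reproduced with Python's `re` module. The sketch below rests on several assumptions: that underscores have already been turned into spaces before matching, that the index is shown as 51.0 because it is stored as a number rather than as text, and (for the second pattern only) that every author name is exactly two words. `TWO_WORD_AUTHOR` and the sample name are hypothetical, not a tested fix.

```python
import re

# The pattern from the post above, unchanged.
ORIGINAL = re.compile(
    r"(?P<author>[A-Za-z ]+) "
    r"((?P<series>[^_\d]+)(?P<series_index>\d+)? )?"
    r"(?P<title>[A-Za-z ][^_]+)"
)

# Hypothetical variant: pin the author to exactly two words so that a
# series name containing spaces is no longer absorbed into the author.
TWO_WORD_AUTHOR = re.compile(
    r"(?P<author>[A-Za-z]+ [A-Za-z]+) "
    r"((?P<series>[^_\d]+)(?P<series_index>\d+) )?"
    r"(?P<title>[A-Za-z ][^_]+)"
)

name = "First Last A B0051 Title"  # series name is "A B", index is 0051

loose = ORIGINAL.search(name).groupdict()
strict = TWO_WORD_AUTHOR.search(name).groupdict()

# The regex itself captures the text "0051"; the leading zeros vanish
# only once that text is converted to a number.
index_as_number = float(strict["series_index"])

print(loose)
print(strict)
print(index_as_number)
```

The first pattern is greedy about the author, so it keeps as many words as it can and leaves only the last word before the digits for the series. Fixing the number of author words is one way to remove that ambiguity, at the cost of mis-splitting any name that does not fit.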

Best regards

mumdigau