Dan23
04-04-2008, 03:36 PM
Two questions:
1. Is there any way to force libprs500 to generate metadata from the filename first (or exclusively) even if the file has metadata within it?
2. This is more of a feature request as I doubt there currently is a way to do this:
Some books in my collection are saved in the form <author> - <series> - <series_index> - <title>, and those without a series as <author> - <title>. Perhaps you could add something to the metadata-from-filename section that allows for two filename schemes (try to match the first, then the second).
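For illustration, the two-scheme fallback described above could be sketched in Python roughly like this. It is not libprs500 code: the patterns, the parse_filename function and the sample filenames are all hypothetical, and it assumes a plain hyphen as the separator.

```python
import re
from pathlib import PurePath

# Hypothetical patterns, tried in order; neither comes from libprs500.
SCHEMES = [
    # <author> - <series> - <series_index> - <title>
    re.compile(r'^(?P<author>.+?)\s*-\s*(?P<series>.+?)\s*-\s*'
               r'(?P<series_index>\d+(?:\.\d+)?)\s*-\s*(?P<title>.+)$'),
    # <author> - <title>
    re.compile(r'^(?P<author>.+?)\s*-\s*(?P<title>.+)$'),
]

def parse_filename(filename):
    """Return a dict of fields from the first scheme that matches, else None."""
    stem = PurePath(filename).stem  # drop the extension, e.g. ".pdf"
    for pattern in SCHEMES:
        match = pattern.match(stem)
        if match:
            # Strip stray whitespace left around the separators.
            return {name: value.strip() for name, value in match.groupdict().items()}
    return None

print(parse_filename("Frank Herbert - Dune - 2 - Dune Messiah.pdf"))
print(parse_filename("Frank Herbert - Dune.pdf"))
```

The more specific scheme has to come first, because the <author> - <title> pattern would also match a four-field name and put everything after the first separator into the title.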
kovidgoyal
04-04-2008, 04:44 PM
1. No
2. Yeah that's a feature request. Open a ticket.
I actually had the same issue as you. I did the following:
Use a program called PDFInfo to remove the metadata (you may need a scripting tool like AutoIT to automate it). I then use the following regular expression:
(?P<authors>[^-]+)\s*-+\s*((?P<series>[^-\d]+)\s*(?P<series_index>\d)*\s*-+\s*)*(?P<title>.+)
Edit: (?P<authors>[^-]+)\s*-+\s*((?P<series>[^-]+)\s*-+\s*)*(?P<title>.+) works without the error caused by the series_index not always being present
It supports the author's name, followed by the - separator, then an optional series name and optional series_index, followed by an optional separator (tied to the series option), and finally the title.
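As a quick check of the edited expression outside the program, here is a minimal sketch using Python's re module. The pattern is copied unchanged from the edit above; the parse helper and the sample names are made up for the example.

```python
import re

# The edited pattern from the post, unchanged.
PATTERN = re.compile(
    r'(?P<authors>[^-]+)\s*-+\s*((?P<series>[^-]+)\s*-+\s*)*(?P<title>.+)'
)

def parse(stem):
    """Hypothetical helper: match a filename (without extension) and tidy the fields."""
    match = PATTERN.match(stem)
    if match is None:
        return None
    # [^-]+ is greedy and also swallows the space before the separator,
    # so each captured field is stripped; unmatched groups stay None.
    return {name: (value.strip() if value is not None else None)
            for name, value in match.groupdict().items()}

print(parse("Isaac Asimov - Foundation - Second Foundation"))
print(parse("Isaac Asimov - I, Robot"))
print(parse("Isaac Asimov-Foundation-Second Foundation"))
```

One thing to watch: because the series group sits inside a repeated group, only its last repetition is kept. A four-field name such as "Author - Series - 3 - Title" therefore comes back with series set to "3" rather than "Series".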
I am trying to work out why it sometimes fails when there is no series_index, but I suspect that once you define a ?P<id> group it MUST be matched and cannot be made optional using a *. I could force it to pick up a space (\s), but unfortunately not all filenames contain a space next to the separator (-), and I much prefer allowing an optional space between any of the fields (\s*).
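On the suspicion above: in Python's re module, at least, a named group is allowed to be optional. If it takes no part in the match, group() simply returns None for it. The sketch below shows this with the first pattern, copied unchanged. Whether the failure really comes from the program then trying to use that None is only an assumption; the to_index helper and the sample names are hypothetical.

```python
import re

# The first pattern from the post, unchanged.
PATTERN = re.compile(
    r'(?P<authors>[^-]+)\s*-+\s*'
    r'((?P<series>[^-\d]+)\s*(?P<series_index>\d)*\s*-+\s*)*'
    r'(?P<title>.+)'
)

with_index = PATTERN.match("Terry Pratchett - Discworld 8 - Guards Guards")
without_index = PATTERN.match("Terry Pratchett - Discworld - Guards Guards")

# Both names match; the optional group is just unset in the second case.
print(repr(with_index.group('series_index')))
print(repr(without_index.group('series_index')))

def to_index(raw, default=None):
    """Hypothetical helper: convert a captured index, tolerating a missing one."""
    # float(None) would raise TypeError, so guard before converting.
    return float(raw) if raw is not None else default

print(to_index(with_index.group('series_index')))
print(to_index(without_index.group('series_index')))
```

Note also that (?P<series_index>\d)* captures a single digit per repetition, so a two-digit index would keep only its last digit; (?P<series_index>\d+)? would capture the whole number.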
If this doesn't end up working, I may end up using Python/Perl/VBScript to split the file name up and use PDFInfo to populate the metadata fields instead. Unfortunately there is no metadata field for Series or Series_Index (at least using PDFInfo), so I may need to find another tool that allows me to add fields and hope Calibre can map these fields to Series & Series_Index.
P.S. I realise you may not be using PDFs, but that's all that is applicable for me. The regular expression may still end up working with a bit of tweaking.