03-31-2008, 01:08 PM | #1 |
Connoisseur
Posts: 76
Karma: 22
Join Date: Mar 2008
Location: uk
Device: Sony PRS505
|
libprs500 - title/author matching regex
I've just started playing with libprs500 (0.4.46) in preparation for a Sony PRS505 I have on the way, and I'm having a spot of bother getting the standard regex to correctly identify the author and title from the filename.
The standard syntax, I believe, is:
(?P<author>.+) - (?P<title>[^_]+)
If I paste the string "H.P Lovecraft - At the Mountains of Madness.txt" into the test box, it correctly reports the following:
Title: "At the Mountains of Madness"
Author: "H.P. Lovecraft"
Series: "No Match"
Series Index: "No Match"
However, actually importing that same file into the library displays the following:
Title: "H.P. Lovecraft - At the Mountains of Madness"
Author: "H.P. Lovecraft"
(all other columns are blank, as expected)
Is this standard behaviour or a bug? |
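For reference, here is a minimal sketch of how the quoted pattern splits a filename using Python's `re` module. The `parse_filename` helper is hypothetical and written only for illustration; it is not libprs500's own code, and it assumes the extension is stripped before matching.

```python
import os
import re

# Pattern quoted in the post above; the group names come from that regex.
FILENAME_PATTERN = re.compile(r"(?P<author>.+) - (?P<title>[^_]+)")


def parse_filename(path):
    """Hypothetical helper: return (author, title), or None if no match."""
    # Assumption: the extension is dropped before the pattern is applied.
    stem = os.path.splitext(os.path.basename(path))[0]
    match = FILENAME_PATTERN.search(stem)
    if match is None:
        return None
    return match.group("author"), match.group("title")


print(parse_filename("H.P Lovecraft - At the Mountains of Madness.txt"))
print(parse_filename("no_separator_here.txt"))
```

A filename without the " - " separator produces no match at all, so a caller would need its own default for that case.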
03-31-2008, 02:07 PM | #2 |
Connoisseur
Posts: 76
Karma: 22
Join Date: Mar 2008
Location: uk
Device: Sony PRS505
|
Upon further investigation it only seems to do this with PDF documents; the author and title fields seem to map correctly against html, zip and text based files.
So if I rename a pdf, an html file, a text file and a zip all to the same name: wibble - wobble.[pdf|zip|txt|html] ...then the html, text and zip version of the file will all correctly display as title="wobble", author="wibble". However the pdf file will show as title="wibble - wobble" and author="wibble". |
03-31-2008, 02:29 PM | #3 |
creator of calibre
Posts: 44,327
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
libprs500 tries to read metadata from the file itself first. Only if that fails does it use the filename.
|
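The order described in the reply above (embedded metadata first, filename only if that fails) could be sketched roughly as follows. This is an illustration under stated assumptions, not libprs500's implementation: `resolve_metadata` and the `read_embedded` callable are hypothetical stand-ins for whatever format-specific reader is actually used.

```python
import os
import re

FILENAME_PATTERN = re.compile(r"(?P<author>.+) - (?P<title>[^_]+)")


def metadata_from_filename(path):
    """Hypothetical fallback: derive author and title from the file name."""
    stem = os.path.splitext(os.path.basename(path))[0]
    match = FILENAME_PATTERN.search(stem)
    if match is None:
        # Assumption: with no match, the whole stem is used as the title.
        return {"title": stem, "author": None}
    return {"title": match.group("title"), "author": match.group("author")}


def resolve_metadata(path, read_embedded):
    """Try the embedded metadata first; use the filename only if that fails.

    read_embedded is a hypothetical callable that returns a dict,
    or None when the file carries no readable metadata.
    """
    embedded = read_embedded(path)
    if embedded is not None:
        return embedded
    return metadata_from_filename(path)


# A reader that finds nothing, so the filename regex is used.
print(resolve_metadata("wibble - wobble.txt", lambda p: None))
```

Note that under this scheme the filename pattern never runs once a reader returns anything at all, even a placeholder value.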
03-31-2008, 02:38 PM | #4 |
Connoisseur
Posts: 76
Karma: 22
Join Date: Mar 2008
Location: uk
Device: Sony PRS505
|
Is this right though? I've attached an example of the difference in behaviour with the same filename for three different file types. There is no metadata set in the PDF file.
|
03-31-2008, 03:17 PM | #5 |
Connoisseur
Posts: 76
Karma: 22
Join Date: Mar 2008
Location: uk
Device: Sony PRS505
|
Ok, digging a bit and would I be correct in thinking that pdf-meta.exe is used to determine the author and title of PDF documents?
Running pdf-meta on my renamed document I get the following:
pdf-meta.exe author\ -\ title.pdf
Title : author - title
Author : Unknown
Publisher: None
Category : None
Comments : None
ISBN : None
It looks like libprs500 is taking the Title as shown by pdf-meta and not running the regex to split it based on the filename.
I have a whole load of PDF docs with varying states of correct/incorrect metadata, and I'd rather load them into libprs500 using the filenames to determine author and title. Other than using pdftk and writing a script to recurse through all of my files to insert metadata based on the filename, can we force libprs500 to use the filename instead, even for PDFs? |
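The symptom is consistent with a per-field merge in which a title equal to the bare filename counts as real metadata. The sketch below reproduces that hypothesis and shows one possible check; `merge_metadata` and its `distrust_stem_title` flag are hypothetical names invented for illustration, and nothing here is confirmed to match libprs500's code.

```python
import os
import re

FILENAME_PATTERN = re.compile(r"(?P<author>.+) - (?P<title>[^_]+)")


def merge_metadata(path, embedded, distrust_stem_title=False):
    """Hypothetical per-field merge of embedded and filename metadata."""
    stem = os.path.splitext(os.path.basename(path))[0]
    match = FILENAME_PATTERN.search(stem)
    from_name = match.groupdict() if match else {}

    title = embedded.get("title")
    author = embedded.get("author")
    # Assumption: a title identical to the file stem is a placeholder,
    # not genuine embedded metadata, so it may be discarded.
    if distrust_stem_title and title == stem:
        title = None
    return {
        "title": title or from_name.get("title") or stem,
        "author": author or from_name.get("author"),
    }


# Models the pdf-meta output above, with "Unknown" author treated as None.
embedded = {"title": "author - title", "author": None}
print(merge_metadata("author - title.pdf", embedded))
print(merge_metadata("author - title.pdf", embedded, distrust_stem_title=True))
```

With the flag off, the title stays "author - title" while the author is filled from the filename, which matches what the library view shows for PDFs; with it on, the title is split by the regex as well.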
03-31-2008, 03:43 PM | #6 |
creator of calibre
Posts: 44,327
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Open a ticket for a config option to customize this behavior.
|
03-31-2008, 04:27 PM | #7 | |
Connoisseur
Posts: 76
Karma: 22
Join Date: Mar 2008
Location: uk
Device: Sony PRS505
|
I've recursed through all of my PDF documents and run the following script:
Quote:
AUTHOR - SERIES - TITLE.pdf or AUTHOR - TITLE.pdf However... libprs500 is still displaying the PDF files that I have correctly set the metadata on in the form of "author - title". Almost as if it is ignoring both the metadata *and* the filename regex pattern matching altogether and simply using the filename, minus the pdf extension. |
|
03-31-2008, 04:28 PM | #8 |
creator of calibre
Posts: 44,327
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
What does pdf-meta give you on the corrected PDF files?
|
03-31-2008, 04:38 PM | #9 | |
Connoisseur
Posts: 76
Karma: 22
Join Date: Mar 2008
Location: uk
Device: Sony PRS505
|
pdf-meta now shows the correct author, but the title is still the filename minus the extension. e.g.
Quote:
|
|
03-31-2008, 05:18 PM | #10 |
creator of calibre
Posts: 44,327
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Attach one of these PDF files here
|
04-01-2008, 04:16 AM | #11 |
Connoisseur
Posts: 76
Karma: 22
Join Date: Mar 2008
Location: uk
Device: Sony PRS505
|
Ok, will do that when I get back in from work.
|
04-01-2008, 01:25 PM | #12 | ||
Connoisseur
Posts: 76
Karma: 22
Join Date: Mar 2008
Location: uk
Device: Sony PRS505
|
Ok, this is a version of Douglas Adams' HHGTTG. Not a great version, but that's not relevant.
Original version Quote:
Quote:
|
||
04-01-2008, 01:29 PM | #13 |
Resident Curmudgeon
Posts: 75,840
Karma: 134368292
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Kobo Libra 2, Kobo Aura H2O, PRS-650, PRS-T1, nook STR, PW3
|
I've had to remove the attachments as they are of a copyrighted book. Please use the Libprs500 website's ticket system to attach them there.
|
04-01-2008, 02:20 PM | #14 | |||
Connoisseur
Posts: 76
Karma: 22
Join Date: Mar 2008
Location: uk
Device: Sony PRS505
|
My apologies. Here's one that's now in the public domain. E.E Smith's 'Triplanetary'.
No metadata to start with. Metadata added with the following command: Quote:
Quote:
Quote:
|
|||
04-01-2008, 04:27 PM | #15 |
creator of calibre
Posts: 44,327
Karma: 23661992
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
|
Fixed in svn
|
|