Old 03-31-2008, 01:08 PM   #1
Megatron-UK
Connoisseur
Megatron-UK began at the beginning.
 
Posts: 76
Karma: 22
Join Date: Mar 2008
Location: uk
Device: Sony PRS505
libprs500 - title/author matching regex

I've just started playing with libprs500 (0.4.46) in preparation for a Sony PRS505 I have on the way, and I'm having a spot of bother getting the standard regex to correctly identify the author and title from the filename.

The standard syntax I believe is: (?P<author>.+) - (?P<title>[^_]+)

If I paste the string "H.P. Lovecraft - At the Mountains of Madness.txt" into the test box, it correctly reports the following:

Title: "At the Mountains of Madness"
Author: "H.P. Lovecraft"
Series: "No Match"
Series Index: "No Match"
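
For reference, the test-box result can be reproduced outside libprs500 with Python's re module, whose named-group syntax the pattern uses. This is only a sketch: the helper name match_filename is hypothetical, and stripping the extension before matching is an assumption about what libprs500 does, not something confirmed here.

```python
import os
import re

# The default pattern quoted above
PATTERN = re.compile(r"(?P<author>.+) - (?P<title>[^_]+)")

def match_filename(filename):
    """Return (title, author) parsed from a filename, or None if no match.

    Hypothetical helper; the extension is stripped first (an assumption).
    """
    stem = os.path.splitext(os.path.basename(filename))[0]
    m = PATTERN.search(stem)
    if m is None:
        return None
    return m.group("title").strip(), m.group("author").strip()

print(match_filename("H.P. Lovecraft - At the Mountains of Madness.txt"))
```

Because the author group's ".+" is greedy, a name containing two " - " separators puts everything before the last separator into the author.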

However, actually importing that same file into the library displays the following:

Title: "H.P. Lovecraft - At the Mountains of Madness"
Author: "H.P. Lovecraft"
(all other columns are blank as expected)

Is this standard behaviour or a bug?
Old 03-31-2008, 02:07 PM   #2
Megatron-UK
Upon further investigation it only seems to do this with PDF documents; the author and title fields seem to map correctly against html, zip and text based files.

So if I rename a pdf, an html file, a text file and a zip all to the same name:

wibble - wobble.[pdf|zip|txt|html]

...then the html, text and zip versions of the file will all correctly display as title="wobble", author="wibble".

However the pdf file will show as title="wibble - wobble" and author="wibble".
Old 03-31-2008, 02:29 PM   #3
kovidgoyal
creator of calibre
kovidgoyal ought to be getting tired of karma fortunes by now.
Posts: 25,416
Karma: 4961459
Join Date: Oct 2006
Location: Mumbai, India
Device: Various
libprs500 tries to read metadata from the file itself first. Only if that fails does it use the filename.
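
In other words, the lookup order is roughly as sketched below. This is an illustration only, not libprs500's actual code: get_metadata and metadata_from_filename are hypothetical names, and the embedded-metadata reader is passed in as a plain function so the example runs on its own.

```python
import os
import re

FILENAME_PATTERN = re.compile(r"(?P<author>.+) - (?P<title>[^_]+)")

def metadata_from_filename(path):
    """Hypothetical fallback: parse author and title from the filename."""
    stem = os.path.splitext(os.path.basename(path))[0]
    m = FILENAME_PATTERN.search(stem)
    if m is None:
        return {"title": stem, "author": "Unknown"}
    return {"title": m.group("title").strip(),
            "author": m.group("author").strip()}

def get_metadata(path, read_embedded_metadata):
    """Try the file's own metadata first; use the filename only if that fails.

    read_embedded_metadata is a hypothetical callable that returns a dict,
    or None when the file carries no usable metadata.
    """
    embedded = read_embedded_metadata(path)
    if embedded is not None:
        return embedded
    return metadata_from_filename(path)

# A reader that finds nothing, so the filename pattern is used
print(get_metadata("wibble - wobble.txt", lambda path: None))
```

Under this model, a reader that reports the bare filename as the title would count as a success, so the filename pattern would never be consulted for that file.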
Old 03-31-2008, 02:38 PM   #4
Megatron-UK
Is this right though? I've attached an example of the difference in behaviour with the same filename for three different file types. There is no metadata set in the PDF file.
Attached Thumbnails: Clipboard01.jpg (55.8 KB), Clipboard02.jpg (40.0 KB)
Old 03-31-2008, 03:17 PM   #5
Megatron-UK
OK, after digging a bit: would I be correct in thinking that pdf-meta.exe is used to determine the author and title of PDF documents?

Running pdf-meta on my renamed document I get the following:

pdf-meta.exe author\ -\ title.pdf
Title : author - title
Author : Unknown
Publisher: None
Category : None
Comments : None
ISBN : None

It looks like libprs500 is taking the Title as shown by pdf-meta and not running the regex to split it based on the filename. I have a whole load of PDF docs with metadata in varying states of correctness, and I'd rather load them into libprs500 using the filenames to determine author and title.

Other than using pdftk and writing a script to recurse through all of my files and insert metadata based on the filename, can we force libprs500 to use the filename instead, even for PDFs?
Old 03-31-2008, 03:43 PM   #6
kovidgoyal
Open a ticket for a config option to customize this behavior.
Old 03-31-2008, 04:27 PM   #7
Megatron-UK
I've recursed through all of my PDF documents and run the following script:

Quote:
#!/bin/bash
# Set the Author and Title metadata of each PDF from its filename.
# Expects names of the form "AUTHOR - SERIES - TITLE.pdf" or
# "AUTHOR - TITLE.pdf". Requires pdftk.

find . -name "*.pdf" -print | while IFS= read -r PDFPATH
do
    BASE=$(basename "$PDFPATH" .pdf)
    # Split on " - " (with spaces) so hyphens inside a name are left alone
    AUTHOR=$(echo "$BASE" | awk -F' - ' '{print $1}')
    VAR2=$(echo "$BASE" | awk -F' - ' '{print $2}')
    VAR3=$(echo "$BASE" | awk -F' - ' '{print $3}')
    if [ -z "$VAR3" ]
    then
        TITLE=$VAR2
        SERIES=""
    else
        TITLE=$VAR3
        SERIES=$VAR2
    fi

    # SERIES is parsed but not written; only Author and Title are set
    echo "InfoKey: Author
InfoValue: $AUTHOR
InfoKey: Title
InfoValue: $TITLE" > ./metadata

    # Write the updated copy next to the original as FILE.pdf.new
    pdftk "$PDFPATH" update_info metadata output "$PDFPATH".new
done
This correctly sets the PDF metadata, based on my known-good filename format of:

AUTHOR - SERIES - TITLE.pdf

or

AUTHOR - TITLE.pdf
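
The filename convention the script relies on can be sketched in Python as follows. This mirrors the script's logic rather than anything in libprs500; split_name and pdftk_info are hypothetical names, and splitting on " - " (with spaces) is an assumption chosen so that hyphens inside a name survive.

```python
import os

def split_name(filename):
    """Split "AUTHOR - SERIES - TITLE.pdf" or "AUTHOR - TITLE.pdf".

    Returns (author, series, title); series is "" for the two-part form.
    """
    stem = os.path.splitext(os.path.basename(filename))[0]
    parts = [p.strip() for p in stem.split(" - ")]
    if len(parts) == 2:
        author, title = parts
        return author, "", title
    if len(parts) == 3:
        author, series, title = parts
        return author, series, title
    raise ValueError("unexpected filename format: %r" % filename)

def pdftk_info(author, title):
    """Build the InfoKey/InfoValue text in the format the script writes."""
    return (
        "InfoKey: Author\n"
        "InfoValue: %s\n"
        "InfoKey: Title\n"
        "InfoValue: %s\n" % (author, title)
    )

author, series, title = split_name("AUTHOR - SERIES - TITLE.pdf")
print(pdftk_info(author, title))
```

Raising an error on any other number of parts keeps badly named files from being silently tagged with the wrong fields.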

However, libprs500 still displays the title of the PDF files whose metadata I have corrected in the form "author - title". It is almost as if it is ignoring both the metadata *and* the filename regex pattern matching altogether and simply using the filename, minus the .pdf extension.
Old 03-31-2008, 04:28 PM   #8
kovidgoyal
What does pdf-meta give you on the corrected PDF files?
Old 03-31-2008, 04:38 PM   #9
Megatron-UK
pdf-meta now shows the correct author, but the title is still the filename minus the extension, e.g.:

Quote:
megatron@elderthing:/cygdrive/y/resources/Books/pdf books $ pdf-meta.exe author\ -\ title.pdf
Title : author - title
Author : Unknown
Publisher: None
Category : None
Comments : None
ISBN : None

megatron@elderthing:/cygdrive/y/resources/Books/pdf books $ pdf-meta.exe author\ -\ title.pdf.new
Title : author - title.pdf
Author : author
Publisher: None
Category : None
Comments : None
ISBN : None
On the corrected PDF file, it looks suspiciously like pdf-meta is silently dropping the extension and treating the basename as the title - the metadata certainly doesn't show title as being "author - title.pdf" when I view it in Acrobat.
Old 03-31-2008, 05:18 PM   #10
kovidgoyal
Attach one of these PDF files here.
Old 04-01-2008, 04:16 AM   #11
Megatron-UK
Ok, will do that when I get back in from work.
Old 04-01-2008, 01:25 PM   #12
Megatron-UK
OK, this is a version of Douglas Adams' HHGTTG. Not a great version, but that's not relevant.

Original version

Quote:
megatron@elderthing:/cygdrive/y/resources/Books/pdf books/Douglas Adams $ pdf-meta.exe Douglas\ Adams\ -\ The\ Hitch\ Hikers\ Guide\ To\ The\ Galaxy.pdf
Title : Douglas Adams - The Hitch Hikers Guide To The Galaxy
Author : Unknown
Publisher: None
Category : None
Comments : None
ISBN : None
Corrected metadata version

Quote:
megatron@elderthing:/cygdrive/y/resources/Books/pdf books/Douglas Adams $ pdf-meta.exe Douglas\ Adams\ -\ The\ Hitch\ Hikers\ Guide\ To\ The\ Galaxy.pdf.new
Title : Douglas Adams - The Hitch Hikers Guide To The Galaxy.pdf
Author : Douglas Adams
Publisher: None
Category : None
Comments : None
ISBN : None
I had to put an extra ".pdf" on the end of the corrected version in order to upload it.
Old 04-01-2008, 01:29 PM   #13
JSWolf
Resident Curmudgeon
JSWolf ought to be getting tired of karma fortunes by now.
Posts: 36,203
Karma: 17169472
Join Date: Nov 2006
Location: Roslindale, Massachusetts
Device: Sony Reader PRS-650, iPad, nook STR
I've had to remove the attachments as they are of a copyrighted book. Please use the libprs500 website's ticket system to attach them there.
Old 04-01-2008, 02:20 PM   #14
Megatron-UK
My apologies. Here's one that's now in the public domain: E. E. Smith's 'Triplanetary'.

No metadata to start with. Metadata added with the following command:

Quote:
pdftk E.\ E.\ Doc\ Smith\ -\ Lensman\ 1\ -\ Triplanetary.pdf update_info metadata output E.\ E.\ Doc\ Smith\ -\ Lensman\ 1\ -\ Triplanetary_new.pdf
The metadata input file is trivial:

Quote:
megatron@curse:/export/Apps and Resources/resources/Books $ cat metadata
InfoKey: Author
InfoValue: Mr NotaRealName
InfoKey: Title
InfoValue: This is a test document for libprs500
Then check the metadata with pdf-meta:

Quote:
megatron@elderthing:/cygdrive/y/resources/Books $ pdf-meta.exe E.\ E.\ Doc\ Smith\ -\ Lensman\ 1\ -\ Triplanetary_new.pdf
Title : E. E. Doc Smith - Lensman 1 - Triplanetary_new
Author : Mr NotaRealName
Publisher: None
Category : None
Comments : None
ISBN : None
The Author is displayed correctly, but the Title should be "This is a test document for libprs500"... (as shown in the screengrab of Acrobat below). libprs500 therefore still displays the incorrect Title.
Attached Thumbnails: Clipboard03.jpg (27.5 KB)
Attached Files: E. E. Doc Smith - Lensman 1 - Triplanetary_new.pdf (538.8 KB)
Old 04-01-2008, 04:27 PM   #15
kovidgoyal
Fixed in svn


All times are GMT -4.