09-26-2008, 11:47 AM #1
Azhad
Member
Posts: 23
Karma: 10
Join Date: Dec 2007
Location: Rome, Italy
Device: PRS-500, PRS-505, Milestone, Galaxy Tab
Regular Expression Help

Hi there

Here's my problem: I have a bunch of PDF files named like these examples:

Name Surname - Name of the Series 01 - Title of the Book.pdf

or

Name Surname - Title of the Book.pdf

For the first one I use this:

(?P<author>[^_]+) - (?P<series>[^_]+) (?P<series_index>[0-9]+) - (?P<title>.+)

And for the second example I use:

(?P<author>[^_]+) - (?P<title>.+)
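Since these patterns use Python-style named groups (`(?P<name>...)`), one way to see what each one captures is to run them through Python's own `re` module outside calibre. This is a minimal sketch: stripping the `.pdf` extension before matching is an assumption about how the filename gets fed in, and `parse` is a hypothetical helper, not part of calibre.

```python
import os
import re

# The two patterns from the post, unchanged.
SERIES_PATTERN = r"(?P<author>[^_]+) - (?P<series>[^_]+) (?P<series_index>[0-9]+) - (?P<title>.+)"
SIMPLE_PATTERN = r"(?P<author>[^_]+) - (?P<title>.+)"


def parse(pattern, filename):
    """Match `pattern` against `filename` with its extension removed.

    Stripping the extension first is an assumption of this sketch.
    Returns the dict of named groups, or None if the pattern fails.
    """
    stem, _ext = os.path.splitext(filename)
    m = re.match(pattern, stem)
    return m.groupdict() if m else None


for name in (
    "Name Surname - Name of the Series 01 - Title of the Book.pdf",
    "Name Surname - Title of the Book.pdf",
):
    print(name)
    print("  series pattern:", parse(SERIES_PATTERN, name))
    print("  simple pattern:", parse(SIMPLE_PATTERN, name))
```

In this standalone test the series pattern captures the whole title "Title of the Book" and returns None for the file without a series. The simple pattern applied to the series file puts "Name Surname - Name of the Series 01" into author, because the greedy `[^_]+` runs to the last " - " it can find.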

The problem is that the parsing cuts off the last word, so the title comes out as "Title of the".

Also, is it possible to merge those two expressions into one, so the parser recognizes whether or not the filename contains a series part (xxx - xxx instead of xxx - xxx 3 - xxx)?
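A sketch of one possible merged pattern, assuming Python `re` semantics: the series part goes into an optional non-capturing group `(?:...)?`, and the author and series groups are made lazy (`+?`) so the author stops at the first " - " instead of the last. The pattern and the `parse_combined` helper are hypothetical illustrations, not taken from calibre, and they assume an author name never contains " - " itself.

```python
import os
import re

# Hypothetical merged pattern (an illustration, not calibre's own):
# - (?P<author>[^_]+?) is lazy, so the author ends at the FIRST " - ".
# - (?: ... )? makes the whole "Series NN - " part optional.
# - (?P<series>[^_]+?) is lazy, so it stops at the first " <digits> - ".
COMBINED_PATTERN = (
    r"(?P<author>[^_]+?) - "
    r"(?:(?P<series>[^_]+?) (?P<series_index>[0-9]+) - )?"
    r"(?P<title>.+)"
)


def parse_combined(filename):
    """Return the named groups for `filename`, or None if it doesn't match.

    Assumes the extension should be stripped before matching.
    Groups that did not take part in the match come back as None.
    """
    stem, _ext = os.path.splitext(filename)
    m = re.match(COMBINED_PATTERN, stem)
    return m.groupdict() if m else None


for name in (
    "Name Surname - Name of the Series 01 - Title of the Book.pdf",
    "Name Surname - Title of the Book.pdf",
):
    print(name, "->", parse_combined(name))
```

When the series part is absent, the optional group is skipped and `series`/`series_index` come back as None in plain Python. One limitation of this approach: a title that itself contains " <digits> - " would be read as a series.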

The other problem I have is that calibre looks inside the PDF for the title and author fields, and sometimes this results in garbled text. Is there a way to override this and use only the data parsed from the filename?

Thanks in advance for any advice.

P.S.
Sorry for my subpar English.