Old 07-28-2011, 06:05 PM   #1
penguinaka
Quack! Quack!
Posts: 92
Karma: 9176
Join Date: Apr 2011
Location: Florida
Device: kindle 3 & sony daily prs950sc
Question Regex - Library Order Fix?

I've hit a wall with this regex and I'm hoping someone might have the answer.

Goal:
1. Restore Library Order titles to Normal Title Order, where the article has been moved to the end of the title and separated by a comma and a space.
2. Ignore titles that are groups of words separated by a comma but are not Library Order titles.
3. Leave any bracketed info in place.
4. Articles can be upper or lower case.

Definitions:
Library Order = Titles that begin with an article have been rearranged so that the article is at the tail end, separated by a comma.
Article = "A", "An", or "The", used at the beginning of a title.

Examples:

To Fix:
Before:

George Martin - Ice & Fire 01 - A Night of the Living Dead, An Unabridged Account(v1.0).epub <==ignored
George Martin - Ice & Fire 01 - Three Books Of Dread, a Westeros Novel.epub <==ignored
George Martin - Ice & Fire 01 - See Spot Run, Run Spot Run (v2).epub <==ignored
George Martin - Ice & Fire 01 - Game of Thrones, The (v5.0).epub
George Martin - Ice & Fire 01 - Game of Thrones, An (v3).epub
George Martin - Ice & Fire 01 - Game of Thrones, A (v1.0).epub
George Martin - Ice & Fire 01 - Game of Thrones, The.epub
George Martin - Ice & Fire 01 - Game of Thrones, An.epub
George Martin - Ice & Fire 01 - Game of Thrones, A.epub
George Martin - Game of Thrones, The (v5.0).epub
George Martin - Game of Thrones, An (v3).epub
George Martin - Game of Thrones, A (v1.0).epub
George Martin - Game of Thrones, The.epub
George Martin - Game of Thrones, An.epub
George Martin - Game of Thrones, A.epub

Fixed:
After:

George Martin - Ice & Fire 01 - A Night of the Living Dead, An Unabridged Account(v1.0).epub <==ignored
George Martin - Ice & Fire 01 - Three Books Of Dread, a Westeros Novel.epub <==ignored
George Martin - Ice & Fire 01 - See Spot Run, Run Spot Run (v2).epub <==ignored
George Martin - Ice & Fire 01 - The Game of Thrones (v5.0).epub
George Martin - Ice & Fire 01 - An Game of Thrones (v3).epub
George Martin - Ice & Fire 01 - A Game of Thrones (v1.0).epub
George Martin - Ice & Fire 01 - The Game of Thrones.epub
George Martin - Ice & Fire 01 - An Game of Thrones.epub
George Martin - Ice & Fire 01 - A Game of Thrones.epub
George Martin - The Game of Thrones (v5.0).epub
George Martin - An Game of Thrones (v3).epub
George Martin - A Game of Thrones (v1.0).epub
George Martin - The Game of Thrones.epub
George Martin - An Game of Thrones.epub
George Martin - A Game of Thrones.epub

This is what I have been able to come up with, but it's not quite enough: it can't differentiate between articles and sentences.

*Note: the renamer I'm using doesn't ignore extensions, so the regex also accounts for the extension.

This swaps two groups in the title section; bracketed info gets swapped as well.
It doesn't differentiate between an article that needs to be swapped and a sentence with a comma that does not.
When it swaps, it swaps everything on either side of the comma.
Code:
(.*-)(.*),(.*)(\..*)
\1\3\2\4
This swaps two groups in the title section; bracketed info gets swapped as well.
It doesn't differentiate between an article that needs to be swapped and a sentence with a comma that does not.
When it swaps, it swaps everything on either side of the comma.
Code:
(.*) - ([^,]*)(?:, (.*))\.(.*)
\1 - \3 \2.\4
This swaps two groups in the title section and leaves bracketed info in place.
It doesn't differentiate between an article that needs to be swapped and a sentence with a comma that does not.
If there is a sentence after the comma, it grabs the first word, swaps it, and leaves the rest in place.
Code:
(.* - ){1,2}([\w ]+), (\w*)(.*)\.(\w+)
\1\3 \2\4.\5
Any help would be appreciated. Thank you.
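To make the problem with the third attempt concrete, here is a small sketch in Python. Using Python's `re` module is an assumption (the renamer's own regex engine may differ in details), and the names `ATTEMPT_3` and `swap` are made up for illustration. It shows that a real library-order title is fixed, while a title that merely contains a comma gets its first word after the comma moved to the front.

```python
import re

# Third attempt from the post: the word after the comma is captured with \w*,
# so ANY word qualifies, not only an article.
ATTEMPT_3 = re.compile(r"(.* - ){1,2}([\w ]+), (\w*)(.*)\.(\w+)")

def swap(filename: str) -> str:
    """Apply the third attempt's replacement to one file name."""
    return ATTEMPT_3.sub(r"\1\3 \2\4.\5", filename)

good = "George Martin - Ice & Fire 01 - Game of Thrones, The (v5.0).epub"
bad = "George Martin - Ice & Fire 01 - See Spot Run, Run Spot Run (v2).epub"

print(swap(good))  # article moved to the front, bracketed info left in place
print(swap(bad))   # "Run" is wrongly treated as an article and moved
```

The fix this points to is restricting the third group to the articles themselves and requiring that only bracketed info (or nothing) sits between the article and the extension.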
Old 07-29-2011, 12:20 AM   #2
speakingtohe
Wizard
Posts: 4,812
Karma: 26912940
Join Date: Apr 2010
Device: sony PRS-T1 and T3, Kobo Mini and Aura HD, Tablet
Probably you know about the tweak that controls the formatting of titles and series, but just in case.

Control how title and series names are formatted when saving to disk/sending
to device. The behavior depends on the field being processed. If processing
title, then if this tweak is set to 'library_order', the title will be
replaced with title_sort. If it is set to 'strictly_alphabetic', then the
title will not be changed. If processing series, then if set to
'library_order', articles such as 'The' and 'An' will be moved to the end. If
set to 'strictly_alphabetic', the series will be sent without change.
For example, if the tweak is set to library_order, "The Lord of the Rings"
will become "Lord of the Rings, The". If the tweak is set to
strictly_alphabetic, it would remain "The Lord of the Rings".
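As a rough illustration of the transformation the tweak text describes, here is a minimal Python sketch. It is not calibre's code: the function name `to_library_order` and the article list ("A", "An", "The") are assumptions made for the example.

```python
import re

# Assumed article list; calibre's own handling may differ.
_LEADING_ARTICLE = re.compile(r"(the|an|a)\s+(.+)", re.IGNORECASE)

def to_library_order(title: str) -> str:
    """Move a leading article to the end: 'The X' -> 'X, The'."""
    m = _LEADING_ARTICLE.fullmatch(title)
    if m is None:
        return title  # no leading article, leave the title unchanged
    article, rest = m.groups()
    return f"{rest}, {article}"

print(to_library_order("The Lord of the Rings"))
```

The regex in this thread does the reverse: it takes a title already in this form and puts the article back at the front.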

Some other tweaks may help as well
Helen
Old 07-29-2011, 12:29 AM   #3
penguinaka
Quote:
Originally Posted by speakingtohe
Probably you know about the tweak that controls the formatting of titles and series, but just in case.

Control how title and series names are formatted when saving to disk/sending
to device. The behavior depends on the field being processed. If processing
title, then if this tweak is set to 'library_order', the title will be
replaced with title_sort. If it is set to 'strictly_alphabetic', then the
title will not be changed. If processing series, then if set to
'library_order', articles such as 'The' and 'An' will be moved to the end. If
set to 'strictly_alphabetic', the series will be sent without change.
For example, if the tweak is set to library_order, "The Lord of the Rings"
will become "Lord of the Rings, The". If the tweak is set to
strictly_alphabetic, it would remain "The Lord of the Rings".

Some other tweaks may help as well
Helen
This is not a tweak question; this is a straight-out regex problem. I already know how that operates in calibre. Thank you.
Old 07-29-2011, 03:59 AM   #4
penguinaka
Finally figured out the solution...

Code:
(.* - )([^-\n]*), *(The|An?)((?: *\([^)\n]*\))*?\.)(\w+)
\1\3 \2\4\5
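For anyone who wants to try this outside the renamer, here is a hedged sketch of the same pattern in Python. Python's `re` module is an assumption (it is not the tool used in this thread), and the function name `fix_library_order` is made up for illustration. The pattern has five capture groups, so the replacement here refers to `\1` through `\5` only; Python's `re` rejects references to groups that do not exist.

```python
import re

# Pattern from the post above. Groups:
#   1: everything up to the last " - " before the title
#   2: the title without its article
#   3: the article (The, An, or A)
#   4: optional bracketed info plus the dot before the extension
#   5: the extension
LIBRARY_ORDER = re.compile(
    r"(.* - )([^-\n]*), *(The|An?)((?: *\([^)\n]*\))*?\.)(\w+)"
)

def fix_library_order(filename: str) -> str:
    """Move a trailing ', The' / ', An' / ', A' back to the front of the title."""
    return LIBRARY_ORDER.sub(r"\1\3 \2\4\5", filename)

samples = [
    "George Martin - Ice & Fire 01 - Game of Thrones, The (v5.0).epub",
    "George Martin - Game of Thrones, A.epub",
    "George Martin - Ice & Fire 01 - See Spot Run, Run Spot Run (v2).epub",
]
for name in samples:
    print(fix_library_order(name))
```

As written the pattern is case sensitive. To also catch lower-case articles (goal 4), compiling it with `re.IGNORECASE` should work in Python, since a lower-case "a" followed by more words still fails the "only brackets, then the extension" requirement.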
All times are GMT -4.