Old 06-06-2012, 12:04 PM   #16
wydchr
Member
wydchr began at the beginning.
 
Posts: 13
Karma: 10
Join Date: Jun 2012
Location: UK
Device: Kindle
Thanks to both you guys for your help, maybe I ought to read the manual...

David.
Old 06-06-2012, 06:45 PM   #17
JustForFun
Enthusiast
JustForFun has learned how to read e-books
 
Posts: 30
Karma: 752
Join Date: Nov 2010
Device: PB360
Quote:
Originally Posted by wydchr View Post
Oh and one other thing, what exactly do the regular expression symbols mean and how would I have known to do that if I hadn't asked on the forum... not really important now I know but I am curious as to how it works.
Regular expressions can be a little intimidating at first but they are really useful. For more information about them see the tutorial in the Calibre manual.
Old 06-06-2012, 07:26 PM   #18
Jozawun
Fanatic
Jozawun ought to be getting tired of karma fortunes by now.
 
Posts: 519
Karma: 2693434
Join Date: Dec 2009
Location: Australia
Device: Cybook Gen 3, Pocketbook 902, Sony 650
You don't need to use regular expressions at all, there is a much simpler way

1. For example, to change "of" to "Of"
2. Select the books you want to alter
3. Select Edit Metadata
4. At the top of the Edit Metadata box, select Search and Replace
5. Tick the Case Sensitive box
6. In the Search Field box, tick "title" (not "title search")
7. In the Search for box, type (blank)of(blank) - don't forget the leading and trailing blank spaces
8. In the Replace with box, type (blank)Of(blank) - don't forget the leading and trailing blank spaces
9. Check the Test result box to see it's OK
10. Click OK
11. Then do the same with each relevant word, eg a, the, an, of, and, etc
12. It should take 5 minutes to do the lot, no matter how big your library is
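
For anyone who wants to see what the steps above amount to, here is a minimal Python sketch. It assumes the search is a plain, case-sensitive text match with the surrounding spaces included, as in steps 5, 7 and 8; the function name and sample titles are made up for illustration and are not part of calibre.

```python
def capitalize_words_literally(title, words):
    """Run one case-sensitive literal replacement per word, spaces included."""
    for word in words:
        # Steps 7 and 8: search for " word " and replace it with " Word ".
        title = title.replace(f" {word} ", f" {word.capitalize()} ")
    return title


print(capitalize_words_literally("Lord of the Rings", ["of", "the"]))
print(capitalize_words_literally("The Stuff Dreams are made of", ["of"]))
```

Because the search text includes a space on both sides, a word at the very start or very end of a title is left alone, which is what the second call shows.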
Old 06-07-2012, 12:06 AM   #19
DoctorOhh
US Navy, Retired
DoctorOhh ought to be getting tired of karma fortunes by now.
 
Posts: 9,897
Karma: 13806776
Join Date: Feb 2009
Location: North Carolina
Device: Icarus Illumina XL HD, Kindle PaperWhite SE 11th Gen
Quote:
Originally Posted by Jozawun View Post
You don't need to use regular expressions at all, there is a much simpler way
Since chaley provided a regular expression that capitalizes all words as requested how is this multiple step method a "much simpler way?"
Old 06-07-2012, 12:55 AM   #20
Jozawun
Fanatic
Jozawun ought to be getting tired of karma fortunes by now.
 
Posts: 519
Karma: 2693434
Join Date: Dec 2009
Location: Australia
Device: Cybook Gen 3, Pocketbook 902, Sony 650
Quote:
Originally Posted by dwanthny View Post
Since chaley provided a regular expression that capitalizes all words as requested how is this multiple step method a "much simpler way?"
It has the same number of steps, is much easier to understand (and type), targets the changes more precisely, and doesn't require checking each book for the undesired u/c to l/c changes that chaley adverted to.
Old 06-07-2012, 03:23 AM   #21
wydchr
Member
wydchr began at the beginning.
 
Posts: 13
Karma: 10
Join Date: Jun 2012
Location: UK
Device: Kindle
I appreciate all the help provided to me with the question I asked, and both methods work well. I am only just getting to grips with Calibre, which is a fine program; hopefully I will be able to sort out my own queries soon... as usual the answer is always RTFM stupid LOL.

Thanks again, David.
Old 06-07-2012, 04:46 AM   #22
chaley
Grand Sorcerer
chaley ought to be getting tired of karma fortunes by now.
 
Posts: 12,525
Karma: 8065948
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Quote:
Originally Posted by Jozawun View Post
It has the same number of steps, is much easier to understand (and type), targets the changes more precisely, and doesn't require checking each book for the undesired u/c to l/c changes that chaley adverted to.
Actually, your process has many more steps, as you must repeat them all for each word. You must manually scan to ensure that you found all the words. The time required will be worse because the same title might be changed many times, requiring an update of the calibre database and the file system (at minimum a rename of a folder) for each change. Finally, it won't work for leading or trailing words.

However, there are some advantages to your approach, specifically the avoidance of unintended changes. Using it in a variation of the regexp method will eliminate the time penalty, the multi-step problem, and the leading/trailing word problem. One is still required to manually scan the titles to build a correct list of words.

For example, you could use the following
Code:
((?<= )|^)(a|an|the|in|is|by)(?= |$)
for the search expression, and use \2 for the replacement expression.

The components of the regular expression are:

* ((?<= )|^) - This is the most complicated part of the expression. It says that whatever follows must be preceded by either a space or the beginning of the title. The part "(?<= )" means look backwards for a space but don't include it in the matched text. The "|" is an "or", so "((?<= )|^)" means "check for a space or beginning of line".
* (a|an|the|in|is|by) - this is the list of words to be changed, separated from each other by "or". Add as many words as you wish.
* (?= |$) - Check that the word is followed by a space or the end of the title, but do not include the space (if any) in the matched text. Not including the space in this match permits it to be matched again when checking the next word.
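
A rough way to experiment with this expression outside calibre is a few lines of Python, whose `re` module accepts the pattern as written. This is only a sketch: the capitalization is done here by a replacement function applied to group 2, which stands in for whatever case change is applied to the `\2` replacement in calibre, and the function name and sample titles are invented for the illustration.

```python
import re

# Pattern quoted from the post: the listed words, each bounded by a space
# or by the start/end of the title.
WORD_LIST_PATTERN = re.compile(r"((?<= )|^)(a|an|the|in|is|by)(?= |$)")


def capitalize_listed_words(title):
    """Capitalize only the words on the list; group 2 holds the matched word."""
    return WORD_LIST_PATTERN.sub(lambda m: m.group(2).capitalize(), title)


print(capitalize_listed_words("the cat in the hat is by a river"))
print(capitalize_listed_words("an ode by IBM"))
```

Words that are not on the list, such as "cat" or "IBM", are left exactly as they were, while listed words at the start or end of the title are still caught.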
Old 06-07-2012, 04:41 PM   #23
Jozawun
Fanatic
Jozawun ought to be getting tired of karma fortunes by now.
 
Posts: 519
Karma: 2693434
Join Date: Dec 2009
Location: Australia
Device: Cybook Gen 3, Pocketbook 902, Sony 650
Quote:
Originally Posted by chaley View Post
Actually, your process has many more steps, as you must repeat them all for each word. You must manually scan to ensure that you found all the words. The time required will be worse because the same title might be changed many times, requiring an update of the calibre database and the file system (at minimum a rename of a folder) for each change. Finally, it won't work for leading or trailing words.

However, there are some advantages to your approach, specifically the avoidance of unintended changes. Using it in a variation of the regexp method will eliminate the time penalty, the multi-step problem, and the leading/trailing word problem. One is still required to manually scan the titles to build a correct list of words.

For example, you could use the following
Code:
((?<= )|^)(a|an|the|in|is|by)(?= |$)
for the search expression, and use \2 for the replacement expression.

The components of the regular expression are:

* ((?<= )|^) - This is the most complicated part of the expression. It says that whatever follows must be preceded by either a space or the beginning of the title. The part "(?<= )" means look backwards for a space but don't include it in the matched text. The "|" is an "or", so "((?<= )|^)" means "check for a space or beginning of line".
* (a|an|the|in|is|by) - this is the list of words to be changed, separated from each other by "or". Add as many words as you wish.
* (?= |$) - Check that the word is followed by a space or the end of the title, but do not include the space (if any) in the matched text. Not including the space in this match permits it to be matched again when checking the next word.
Actually, in my system, you only had to repeat steps 7, 8, 10; and there was no practical time penalty, because the whole process for all the listed words would take 5-10 minutes max for the 1500 books. Of course, leading words are not relevant (they have already been capitalized); and trailing uncapitalized words in book titles would be extremely rare.

I note your changes; but do you still have the major time penalty of having to check each of the 1500 book titles afterwards to catch the unintended "unCapitalizations"? If you are able to fix this, then your proposal would become more practical.

PS I'm sorry if the above sounds a bit grumpy, it wasn't meant to be. I personally have been impressed by and grateful for the work you've done on these forums - especially getting the books on to my 650 in a comprehensible order! I just think this proposal is not practical if you still have to check every book afterwards to find and manually correct the unintended changes.

Last edited by Jozawun; 06-07-2012 at 04:58 PM. Reason: Adding PS
Old 06-07-2012, 05:30 PM   #24
chaley
Grand Sorcerer
chaley ought to be getting tired of karma fortunes by now.
 
Posts: 12,525
Karma: 8065948
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Quote:
Originally Posted by Jozawun View Post
I note your changes; but do you still have the major time penalty of having to check each of the 1500 book titles afterwards to catch the unintended "unCapitalizations"? If you are able to fix this, then your proposal would become more practical.

PS I'm sorry if the above sounds a bit grumpy, it wasn't meant to be. I personally have been impressed by and grateful for the work you've done on these forums - especially getting the books on to my 650 in a comprehensible order! I just think this proposal is not practical if you still have to check every book afterwards to find and manually correct the unintended changes.
My proposal has no more "unintended unCapitalizations" than yours does; it simply does all the changes at once instead of one at a time. And in any event, one must still check that the list of words to fix is complete, that is, that all the necessary words were included in the list.
Old 06-07-2012, 05:41 PM   #25
Jozawun
Fanatic
Jozawun ought to be getting tired of karma fortunes by now.
 
Posts: 519
Karma: 2693434
Join Date: Dec 2009
Location: Australia
Device: Cybook Gen 3, Pocketbook 902, Sony 650
Quote:
Originally Posted by chaley View Post
My proposal has no more "unintended unCapitalizations" than yours does; it simply does all the changes at once instead of one at a time. And in any event, one must still check that the list of words to fix is complete, that is, that all the necessary words were included in the list.
Mine doesn't have any unintended unCapitalizations.

But I thought you said of yours "- Uppercase letters in the middle of strings will be changed to lowercase. For example, IBM will become Ibm and iTunes will become Itunes. There are ways around this problem but the regular expression will become more complex than I want to deal with."
But if you fixed this major problem, I apologise.
Old 06-08-2012, 02:18 AM   #26
wydchr
Member
wydchr began at the beginning.
 
Posts: 13
Karma: 10
Join Date: Jun 2012
Location: UK
Device: Kindle
Oooer, I didn't want to make this a fight but I will say that using the first regular expression that Charles gave me, i.e. "(.*?)( |$)" replace with "\1\2", did change Agatha Christie's 'The ABC Murders' to 'The Abc Murders'. I haven't yet tried the latest version you sent, "((?<= )|^)(a|an|the|in|is|by)(?= |$)" replace with "\2", so I don't know if that does the same... I shall try it and see.

Thanks again guys, David.
Old 06-08-2012, 02:49 AM   #27
chaley
Grand Sorcerer
chaley ought to be getting tired of karma fortunes by now.
 
Posts: 12,525
Karma: 8065948
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Quote:
Originally Posted by wydchr View Post
I haven't yet tried the latest version you sent, "((?<= )|^)(a|an|the|in|is|by)(?= |$)" replace with "\2", so I don't know if that does the same... I shall try it and see.
It doesn't. It will change only the words that are on the list, in this case "a", "an", "the", "in", "is", and "by".

You might want to try the regexp "((?<= )|^)([a-z]+)(?= |$)" (without the quotes) instead of the longer one containing the list of words. This new one will capitalize words containing only lower-case letters, eliminating the manual scan for words that must be capitalized. However, the words must contain only the letters a through z, so it won't capitalize words containing non-English letters such as é or ñ.
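
The behaviour described above can be tried out in Python, whose `re` module accepts this pattern unchanged. This is a sketch rather than what calibre runs internally: the replacement function stands in for the case change applied to `\2`, and the function name and sample titles are made up.

```python
import re

# Any run of lower-case a-z bounded by a space or the start/end of the title.
LOWERCASE_WORD_PATTERN = re.compile(r"((?<= )|^)([a-z]+)(?= |$)")


def capitalize_lowercase_words(title):
    """Capitalize words made up only of the letters a-z; leave the rest alone."""
    return LOWERCASE_WORD_PATTERN.sub(lambda m: m.group(2).capitalize(), title)


print(capitalize_lowercase_words("the ABC murders on iTunes"))
print(capitalize_lowercase_words("el niño"))
```

"ABC" and "iTunes" are skipped because they contain upper-case letters, and "niño" is skipped because ñ is outside a-z.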
Old 06-08-2012, 03:57 AM   #28
wydchr
Member
wydchr began at the beginning.
 
Posts: 13
Karma: 10
Join Date: Jun 2012
Location: UK
Device: Kindle
Thanks Charles, my request is certainly turning into a saga but I am grateful for everyone's help. I don't want to labour the point but getting back to 'The A.B.C. Murders', is there a way of capturing a string like A.b.c. and capitalising that. I am now only asking out of curiosity (not necessity) as I am keen to get to grips with regular expression searches so please don't feel compelled to drag this thread on any longer as my initial request is essentially solved.

David
Old 06-08-2012, 04:17 AM   #29
chaley
Grand Sorcerer
chaley ought to be getting tired of karma fortunes by now.
 
Posts: 12,525
Karma: 8065948
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Quote:
Originally Posted by wydchr View Post
getting back to 'The A.B.C. Murders', is there a way of capturing a string like A.b.c. and capitalising that.
The regular expression "((?<= |\.|-)|^)([a-z]+)(?= |\.|-|$)" will capitalize any "word" separated by spaces, periods, or hyphens (actually dashes), in any combination. For example, using that regexp the string
iTunes is not built by IBM or by Alcatel-lucent but by apple with a.b.c
results in
iTunes Is Not Built By IBM Or By Alcatel-Lucent But By Apple With A.B.C
The change to the regexp is to enlarge the set of characters that must precede and follow a word to permit it to be a candidate for capitalization.

And you are welcome.

Edit: the following regexp is better, changing the alternation ("or") to a character class. I think it is easier to read. "((?<=[ \.-])|^)([a-z]+)(?=[ \.-]|$)"
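
A small Python check of the two variants, for anyone who wants to experiment on sample titles before touching a library. It assumes the capitalization is done by a replacement function on group 2, standing in for the case change applied to the replacement in calibre; the names below are invented for the illustration.

```python
import re

# The two variants from this post: alternation, then the character class.
ALTERNATION = re.compile(r"((?<= |\.|-)|^)([a-z]+)(?= |\.|-|$)")
CHAR_CLASS = re.compile(r"((?<=[ \.-])|^)([a-z]+)(?=[ \.-]|$)")


def capitalize_with(pattern, title):
    """Capitalize each all-lower-case word bounded by a space, period, or hyphen."""
    return pattern.sub(lambda m: m.group(2).capitalize(), title)


SAMPLE = "iTunes is not built by IBM or by Alcatel-lucent but by apple with a.b.c"

print(capitalize_with(ALTERNATION, SAMPLE))
print(capitalize_with(CHAR_CLASS, SAMPLE))
print(capitalize_with(CHAR_CLASS, "The A.b.c. Murders"))
```

Both patterns give the same result on the sample string, and the last call shows the "A.b.c." case from the question.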

Last edited by chaley; 06-08-2012 at 04:20 AM. Reason: Add another variant
Old 06-08-2012, 04:47 AM   #30
wydchr
Member
wydchr began at the beginning.
 
Posts: 13
Karma: 10
Join Date: Jun 2012
Location: UK
Device: Kindle
Mmmmm, I've read the section in the Calibre manual 'All about using regular expressions' and I think it's going to take a while for me to get a handle on it, I suppose experimentation is the order of the day but only on a copy of my library LOL.

David.