MobileRead Forums

Jozawun · 06-07-2012, 05:41 PM

Quote:

Originally Posted by chaley

Actually, your process has many more steps, as you must repeat them all for each word. You must manually scan to ensure that you found all the words. The time required will be worse because the same title might be changed many times, requiring an update of the calibre database and the file system (at minimum a rename of a folder) for each change. Finally, it won't work for leading or trailing words.

However, there are some advantages to your approach, specifically the avoidance of unintended changes. Using it in a variation of the regexp method will eliminate the time penalty, the multi-step problem, and the leading/trailing word problem. One is still required to manually scan the titles to build a correct list of words.

For example, you could use the following

Code:

((?<= )|^)(a|an|the|in|is|by)(?= |$)

for the search expression, and use \2 for the replacement expression.

The components of the regular expression are:

* ((?<= )|^) - This is the most complicated part of the expression. It says that whatever follows must be preceded by either a space or the beginning of the title. The part "(?<= )" means look backwards for a space but don't include it in the matched text. The "|" is an "or", so "((?<= )|^)" means "check for a space or beginning of line".
* (a|an|the|in|is|by) - this is the list of words to be changed, separated from each other by "or". Add as many words as you wish.
* (?= |$) - Check that the word is followed by a space or the end of the title, but do not include the space (if any) in the matched text. Not including the space in this match permits it to be matched again when checking the next word.
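
To see what the three components do together, here is a minimal sketch using Python's `re` module. It is not calibre's own code: the function name `lowercase_listed_words` and the sample titles are made up for illustration, and the case-insensitive flag plus the lowercasing callback are assumptions standing in for whatever case settings are chosen in the search and replace dialog.

```python
import re

# The search expression quoted above. Group 1 is the zero-width
# "space or start of title" check; group 2 is the matched word.
# re.IGNORECASE is an assumption so that "The" and "the" both match.
PATTERN = re.compile(r"((?<= )|^)(a|an|the|in|is|by)(?= |$)", re.IGNORECASE)


def lowercase_listed_words(title: str) -> str:
    """Lowercase each listed word that stands alone between spaces or title edges."""
    # Equivalent to replacing with \2 and then applying lower case to it.
    return PATTERN.sub(lambda m: m.group(2).lower(), title)


if __name__ == "__main__":
    # Hypothetical titles, chosen to show whole-word matching and adjacent words.
    for title in ["Anthem In The Dark", "Death By The River", "A Tale Of The Sea"]:
        print(title, "->", lowercase_listed_words(title))
```

In this sketch "Anthem" is left alone even though it starts with "An", because the word must be followed by a space or the end of the title. Adjacent words such as "In The" are both caught, since the space between them is never consumed. The leading "A" in the last title is lowercased too, because the expression accepts the beginning of the title as well as a space.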

Actually, in my system, you only had to repeat steps 7, 8, 10; and there was no practical time penalty, because the whole process for all the listed words would take 5-10 minutes max for the 1500 books. Of course, leading words are not relevant (they have already been capitalized); and trailing uncapitalized words in book titles would be extremely rare.

I note your changes; but do you still have the major time penalty of having to check each of the 1500 book titles afterwards to catch the unintended "unCapitalizations"? If you are able to fix this, then your proposal would become more practical.

PS I'm sorry if the above sounds a bit grumpy; it wasn't meant to be. I personally have been impressed by and grateful for the work you've done on these forums - especially getting the books onto my 650 in a comprehensible order! I just think this proposal is not practical if you still have to check every book afterwards to find and manually correct the unintended changes.