08-21-2013, 12:00 AM   #1622
FaceDeer
Connoisseur
Posts: 89
Karma: 706
Join Date: Nov 2012
Device: Kobo Touch
Quote:
Originally Posted by shesgottaread
@FaceDeer, would you mind sharing the regexes needed to do this correcting of the author_sort? I'm trying to learn python, and have done some other programming, but regexes are something that have always given me problems.
This one's about the simplest possible one.

The goal of this regex is to match the entire string that's in the "author" field and output it unmodified into the "author_sort" field. The regex to match the source "author" field is:

(.*)

The dot means "match any character (that's not a line-end)" and the asterisk after it modifies it to mean "match zero or more of this character". So it'll match a string of any length, including an empty one, consisting of any characters. The brackets (parentheses) around them tell the regex engine that this group of matched characters is going to be referred to later, so it should keep track of it.
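Since you mentioned you're learning python, here's a rough sketch of the same match done with Python's built-in re module. The author name is made up for illustration, and I'm assuming the search box behaves like plain Python regexes here.

```python
import re

# Hypothetical author value, just for illustration.
author = "Jane Q. Example"

# (.*) captures zero or more non-newline characters as group 1.
match = re.match(r"(.*)", author)
captured = match.group(1)
print(captured)

# The dot stops at a line-end, so only the first line is captured here.
first_line = re.match(r"(.*)", "line one\nline two").group(1)
print(first_line)
```

The group numbering starts at 1 because group 0 always means "the whole match".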

The regex to put in the destination "author_sort" field is:

\1

The backslash-one tells the regex engine "insert the group of characters that you matched with the first bracketed section of the source regex". Fancier regexes could have multiple bracketed sections, for example if you wanted to reverse the order of two chunks of something you could have bracketed sections match the first chunk and the second chunk and then put "\2 \1" in the destination. I suppose in theory one could come up with a fancy regex to parse the author_sort and reverse-engineer it back into its original form, but since we *have* its original form already sitting in the author column that's not necessary.
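If it helps to see the replacement side in python too, here's a minimal sketch using re.sub. The author value and the two-word swap pattern are my own made-up examples, not anything taken from the actual library, and it assumes the destination field treats \1 and \2 the way Python's re module does.

```python
import re

# Hypothetical author value, just for illustration.
author = "Jane Example"

# Source regex (.*) with destination \1 copies the text unchanged.
author_sort = re.sub(r"(.*)", r"\1", author)
print(author_sort)

# Two bracketed sections with destination "\2 \1" swap the two chunks.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", author)
print(swapped)
```

The first call is the "copy author into author_sort" case from this post; the second is only there to show what multiple groups buy you.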

I'm still in the process of making a backup of my main fanfic archive so I haven't run this "for real" yet, but I did some test runs on a small test archive so I'm 99% sure this is correct. Nevertheless, make a backup of your own before you run this. Always a good idea when dealing with regexes, even ones this simple. They can be arcane.