MobileRead Forums - View Single Post

garcle · 01-25-2011, 01:04 AM

Quote:

Originally Posted by user_none

... Nope. Having dealt with the header and footer regex stuff, there isn't even anything close to one regex to rule them all. As far as content goes, ebooks are formatted so differently that even books by the same author and in the same series often need different regexes.

The same goes for extracting metadata from file names: there are so many different ways people name their books that there is no one regex that works most of the time. Each variation needs a different regex. While it might be nice to include defaults that work against a number of cases, I don't think throwing 20 different regexes at a user and saying pick one would help much. Especially when there is a good chance none of them will work.

@user_none If you are going to quote me, do me the courtesy of lifting my whole paragraph, or at least showing ellipses ... to indicate that there was more.
<geek alert, for geeks only>

Your remarks completely miss the point I was making. I was not talking about "...one regex to rule them all ...", and I didn't mention "... throwing 20 different regexes at a user and saying pick one...".
My point was completely misrepresented. I talked about classes of regex, and about hiding their ugliness completely from the user behind a simple plain-English menu or other GUI (wizards, for example), maybe with example patterns from real cases, so people can look at one and say "oh yeah, that's like what I have". Maybe an architecture that included "meta-expressions", which forward-scan the text and determine what kind of expression is required, could be used without even the need for an elaborate menu.
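
A minimal sketch of that idea in Python, assuming three made-up pattern classes for page headers and footers. The labels, example strings, regexes, and the function names `detect_class` and `strip_matches` are all hypothetical illustrations, not taken from any real tool. Each class carries a plain-English label and a real-looking example for a menu, and the "meta-expression" step is approximated by scanning the first lines of the text and counting which class matches most often.

```python
import re
from collections import Counter

# Hypothetical pattern classes: plain-English label -> (example, regex).
# A menu would show only the label and the example, never the regex.
PATTERN_CLASSES = {
    "Page N of M": ("Page 12 of 300",
                    re.compile(r"^\s*Page \d+ of \d+\s*$", re.IGNORECASE)),
    "Bare page number": ("12",
                         re.compile(r"^\s*\d{1,4}\s*$")),
    "Title | page number": ("My Book | 12",
                            re.compile(r"^\s*.+\s\|\s\d+\s*$")),
}


def detect_class(text, sample_lines=200):
    """Forward-scan the first lines of the text and return the label of the
    class that matches the most lines, or None if nothing matches."""
    counts = Counter()
    for line in text.splitlines()[:sample_lines]:
        for label, (_example, regex) in PATTERN_CLASSES.items():
            if regex.match(line):
                counts[label] += 1
                break  # count each line for at most one class
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def strip_matches(text, label):
    """Remove every line that matches the chosen class."""
    regex = PATTERN_CLASSES[label][1]
    return "\n".join(line for line in text.splitlines()
                     if not regex.match(line))


sample = ("It was a dark night.\nPage 1 of 3\n"
          "The rain fell.\nPage 2 of 3\n"
          "The end.\nPage 3 of 3")
chosen = detect_class(sample)
cleaned = strip_matches(sample, chosen) if chosen else sample
```

The user only ever sees the labels and examples; if the scan finds nothing, the tool can fall back to showing the menu rather than guessing.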