View Single Post
Old 09-23-2010, 11:10 AM   #26
chaley
Grand Sorcerer
chaley ought to be getting tired of karma fortunes by now.chaley ought to be getting tired of karma fortunes by now.chaley ought to be getting tired of karma fortunes by now.chaley ought to be getting tired of karma fortunes by now.chaley ought to be getting tired of karma fortunes by now.chaley ought to be getting tired of karma fortunes by now.chaley ought to be getting tired of karma fortunes by now.chaley ought to be getting tired of karma fortunes by now.chaley ought to be getting tired of karma fortunes by now.chaley ought to be getting tired of karma fortunes by now.chaley ought to be getting tired of karma fortunes by now.
 
Posts: 11,774
Karma: 7029857
Join Date: Jan 2010
Location: Notts, England
Device: Kobo Libra 2
Quote:
Originally Posted by Manichean View Post
So, if I understand you correctly, the regular expression would be something like the starting state, each matching string would be it's own (end?) state, and the matching operation would provide transitions between the end states, yes?
I don't think this is quite right.

First, some words on finite state machines (FSMs). An FSM consists of a set of states, connected by arcs that stand for events. Lets say that these events 'label' the arc. In the kind of machine we are talking about (deterministic), an event can appear on at most one arc out of a given state. Traversing an arc consumes the event. Events that are labels for an arc leaving a particular state cannot be accepted by that state, and their occurrence is an error. The machine is constructed in advance from the pattern, and cannot change depending upon input.

Thus the regular expression 'abc' would look something like

start __a__ state1 __b__ state2 __c__ accept

The regular expression for 'ab*c' would look something like

Code:

start __a__ state1 __b__ state2 __c__ accept
                         |_b_|
Quote:
And because the number of states a regular expression can describe is finite,
The number of states an FSM can *contain* is finite. The number of states an FSM can *describe* is infinite.
Quote:
and, in the general case, len(utterance) is unknown, presumed to be in the range [0,\infty], a palindrome could not be described using a regular expression?
Yes. But as we will see below, it is not the length that is the problem, it is the matching of the characters.
Quote:
If that's correct, I don't think I understand how the * and + quantifiers work, because they, as far as I understand, describe an utterance with len(utterance) \in [0,\infty] and len(utterance) \in [1,\infty], respectively.
Yes, they do. But they have no memory. There is no way to match in a general fashion (using one machine) the strings abba, abcba, and abbbba, all of which are palindromes. The problem is that if we don't know what to do until we get to the end and know what 'len' is. Once we know 'len', we can construct the right machine for that utterance. Alternatively, we can a stack, but that would most definitely not be an FSM, because the label on the arc would only be known during execution. In the first case, we can accept only 'a's after the second 'b', but in the second case we can accept a 'c' and in the third case another 'b'.

Of course, you can write a regexp if you know *all* the utterances that you must match. For example, the regexp 'abba|abcba|abbbba' works fine.

You might be interested in the paper at this link. It gets mathematical at times (lots of times), but it discusses how one translates REs used in searches to FSMs to really do the work.

Last edited by chaley; 09-23-2010 at 11:29 AM.
chaley is offline